Onboarding Is a State Machine, Not a Boolean

This is Part 6, the final part of the series Building Multi-Tenant Systems That Match the Real World.

Part 1: Designing Multi-Tenant Backends With Both Ownership and Team Access

Part 2: How to Model Teams Inside a Multi-Tenant Product

Part 3: When "Admin" Means Two Different Things

Part 4: Turning a Permission Model Into a Guard

Part 5: Two Gates — Authorization vs Entitlement

The Boolean That Lies
Onboarding Is the Tenant's Lifecycle
Status, Step, and Per-Step Flags
The Backend Owns Step Order, Not the Frontend
Some Steps Pause, and the State Has to Survive That
A Guard That Gates the Product Until You Are Ready
Letting Onboarding Routes Through Its Own Gate
Resumability Is the Whole Point
What I Would Avoid
Closing Thought

The Boolean That Lies

Most onboarding starts as one column:

interface Organization {
  id: string;
  isOnboarded: boolean;
}

It works for exactly one demo. Then the questions arrive, and every one of them is a question a boolean cannot answer:

The user filled in their profile but never added a product. Are they onboarded?
They got to the payment step, started checkout, and closed the tab. Now what?
They come back two days later. Which screen do you show them?
Support needs to know where a stuck tenant is stuck. The boolean says false. False where?

isOnboarded collapses a process into a single bit. The process has a shape — ordered steps, a current position, things that pause and resume — and a single bit throws all of that away. The first time someone abandons onboarding halfway and comes back, the boolean has no answer, and you end up bolting state on anyway, in a panic, in production.

This is the same lesson from Part 2, where membership needed a lifecycle instead of "in or out." Here it applies to the tenant itself. A team member has states. A tenant being set up has states too.

So model the states.

Onboarding Is the Tenant's Lifecycle

In Part 2, membership moved through PENDING → ACTIVE → SUSPENDED → REMOVED. Onboarding is the same idea pointed at the organization instead of its people.

At the coarse level, a tenant is in one of three states:

enum OnboardingStatus {
  PENDING     = 'pending',      // created, nothing done yet
  IN_PROGRESS = 'in_progress',  // moving through the steps
  COMPLETED   = 'completed',    // ready to use the full product
}

PENDING is the freshly created organization that has done nothing. IN_PROGRESS is the long middle where the real work happens. COMPLETED is the only state that unlocks the rest of the product.

That coarse status answers "can this tenant use the product yet?" But it does not answer "where are they?" — and you need both. The coarse status is for the guard. The fine-grained position is for the user experience. They are two different jobs, and they want two different fields.
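One way to keep the coarse status honest is to spell out which moves between states are legal. The sketch below is an assumption layered on top of the enum above: `ALLOWED_TRANSITIONS` and `canTransition` are hypothetical names, and the "forward only, one stage at a time" rule is my reading of the flow, not something the original code enforces.

```typescript
enum OnboardingStatus {
  PENDING = 'pending',
  IN_PROGRESS = 'in_progress',
  COMPLETED = 'completed',
}

// Hypothetical transition table: status only moves forward, one stage at a time.
const ALLOWED_TRANSITIONS: Record<OnboardingStatus, OnboardingStatus[]> = {
  [OnboardingStatus.PENDING]: [OnboardingStatus.IN_PROGRESS],
  [OnboardingStatus.IN_PROGRESS]: [OnboardingStatus.COMPLETED],
  [OnboardingStatus.COMPLETED]: [],
};

function canTransition(from: OnboardingStatus, to: OnboardingStatus): boolean {
  // Writing the same status again is a no-op, so it is always allowed.
  if (from === to) return true;
  return ALLOWED_TRANSITIONS[from].some((next) => next === to);
}
```

A check like this could sit in front of any write to onboardingStatus, so that a stray update cannot move a completed tenant back to pending.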

Status, Step, and Per-Step Flags

I carry three things, not one. Each answers a different question:

interface Organization {
  id: string;

  // Coarse status — drives the access guard.
  onboardingStatus: OnboardingStatus;

  // Current position — drives "where do I resume?"
  onboardingStep: number;

  // Per-step completion — drives the progress UI and lets steps be idempotent.
  onboardingProfileCompleted: boolean;
  onboardingBrandingCompleted: boolean;
  onboardingFirstItemCreated: boolean;
  onboardingPlanSelected: boolean;

  // Bookkeeping: set once, when the final step is confirmed.
  onboardingCompletedAt: Date | null;
}

Why all three, when they overlap?

onboardingStatus is what the guard reads. It is a single, cheap check: completed or not. The guard does not care about steps.
onboardingStep is the resume pointer. When the user returns, this is the screen you send them to. It is also what support reads to see where someone is stuck.
The per-step booleans are the truth of what is actually done. They make each step idempotent — submitting the profile step twice is safe because the flag, not the step number, records completion — and they drive a progress checklist that does not lie just because the user skipped around.

The redundancy is deliberate. A single integer step looks like enough until a user completes step 4, then edits step 2 again, and you need to know that step 4 is still done. The booleans hold that. The integer alone cannot.
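The flags can also be used to cross-check the pointer. The following is a minimal sketch, assuming the step numbering used by the handlers in this article (2 = profile, 3 = branding, 4 = first item, 5 = plan, 6 = done). `StepFlags`, `STEP_ORDER`, and `firstIncompleteStep` are hypothetical names introduced for illustration.

```typescript
interface StepFlags {
  profile: boolean;
  branding: boolean;
  firstItem: boolean;
  plan: boolean;
}

// Assumed numbering: 2 = profile, 3 = branding, 4 = first item, 5 = plan.
const STEP_ORDER: Array<[number, keyof StepFlags]> = [
  [2, 'profile'],
  [3, 'branding'],
  [4, 'firstItem'],
  [5, 'plan'],
];

// Lowest-numbered step whose flag is still false, or 6 when every step is done.
function firstIncompleteStep(flags: StepFlags): number {
  for (const [step, key] of STEP_ORDER) {
    if (!flags[key]) return step;
  }
  return 6;
}

// Scenario from the paragraph above: step 4 is done, then step 2 is edited again.
// The flags still record that steps 2 through 4 are complete.
const flags: StepFlags = { profile: true, branding: true, firstItem: true, plan: false };
const resumeStep = firstIncompleteStep(flags); // 5
```

In a strictly linear flow, a mismatch between this value and onboardingStep is a cheap signal that something wrote one field without the other, which is the kind of thing support tooling may want to surface.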

A status endpoint then assembles the full picture for the client:

async function getOnboardingStatus(organizationId: string) {
  const org = await db.organization.findUnique({
    where: { id: organizationId },
    select: {
      onboardingStatus: true,
      onboardingStep: true,
      onboardingProfileCompleted: true,
      onboardingBrandingCompleted: true,
      onboardingFirstItemCreated: true,
      onboardingPlanSelected: true,
    },
  });

  if (!org) throw new NotFoundError('Organization not found');

  return {
    status: org.onboardingStatus,
    currentStep: org.onboardingStep,
    steps: {
      profile: org.onboardingProfileCompleted,
      branding: org.onboardingBrandingCompleted,
      firstItem: org.onboardingFirstItemCreated,
      plan: org.onboardingPlanSelected,
    },
  };
}

The frontend renders a checklist from steps and routes to currentStep. It never has to infer progress from a boolean, because the backend hands it the real state.
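On the client, consuming that response can stay very small. This is a sketch under assumptions: the paths in `STEP_ROUTES` and the helper names `resumeRoute` and `completedCount` are hypothetical, and the step numbers follow the handlers shown later in this article.

```typescript
interface OnboardingStatusResponse {
  status: 'pending' | 'in_progress' | 'completed';
  currentStep: number;
  steps: { profile: boolean; branding: boolean; firstItem: boolean; plan: boolean };
}

// Hypothetical client-side paths, keyed by step number.
const STEP_ROUTES: Record<number, string> = {
  2: '/onboarding/profile',
  3: '/onboarding/branding',
  4: '/onboarding/first-item',
  5: '/onboarding/plan',
};

// Where to send a returning user.
function resumeRoute(res: OnboardingStatusResponse): string {
  if (res.status === 'completed') return '/dashboard';
  // Unknown step number: fall back to the start of the flow.
  return STEP_ROUTES[res.currentStep] || '/onboarding/profile';
}

// How many checklist items to tick.
function completedCount(res: OnboardingStatusResponse): number {
  const s = res.steps;
  return [s.profile, s.branding, s.firstItem, s.plan].filter(Boolean).length;
}
```

The client holds no ordering logic of its own here. It only maps the number the backend returned to a screen.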

The Backend Owns Step Order, Not the Frontend

Here is the rule that keeps onboarding honest: the order of steps is a backend invariant, not a frontend convention.

It is tempting to let the UI drive the sequence — show screen 2, then 3, then 4 — and have each endpoint just save what it is given. That breaks the moment someone hits an endpoint out of order: a replayed request, a deep link, a curious user with the network tab open, a half-finished mobile session. If the backend trusts the frontend's ordering, none of those are safe.

So each step validates its own position before doing anything:

async function assertStep(
  organizationId: string,
  expectedStep: number,
  allowPastSteps = false,
) {
  const org = await db.organization.findUnique({
    where: { id: organizationId },
    select: { onboardingStep: true, onboardingStatus: true },
  });

  if (!org) throw new NotFoundError('Organization not found');

  if (org.onboardingStatus === OnboardingStatus.COMPLETED) {
    throw new BadRequestError('Onboarding is already complete');
  }

  // Exact match for forward progress; allowPastSteps lets users edit
  // a step they have already moved past without breaking the sequence.
  const ok = allowPastSteps
    ? org.onboardingStep >= expectedStep
    : org.onboardingStep === expectedStep;

  if (!ok) {
    throw new BadRequestError(
      `Please complete step ${org.onboardingStep} first`,
    );
  }

  return org;
}

Each step handler opens with this and only then does its work:

async function completeProfileStep(organizationId: string, dto: ProfileDto) {
  const org = await assertStep(organizationId, 2, /* allowPastSteps */ true);

  // Never move backward: a tenant editing an old step keeps its position.
  const nextStep = Math.max(org.onboardingStep, 3);

  await db.organization.update({
    where: { id: organizationId },
    data: {
      // ...persist the profile fields...
      onboardingProfileCompleted: true,
      onboardingStep: nextStep,
      onboardingStatus: OnboardingStatus.IN_PROGRESS,
    },
  });

  // Report the real resume position; it is past 3 when this call was an edit.
  return {
    nextStep,
    nextStepName: nextStep === 3 ? 'Set your branding' : undefined,
  };
}

Two details earn their place. Math.max(org.onboardingStep, 3) means advancing the step never regresses it — re-submitting an earlier step cannot drag a further-along tenant backward. And flipping onboardingStatus to IN_PROGRESS on the first real step is what moves the tenant off PENDING the moment they actually start.

The allowPastSteps flag is the small mercy that makes editing work: a user who already passed step 2 can come back and fix their profile without the backend insisting they are "on the wrong step." Forward motion is strict; revisiting is allowed.
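The position rule is easier to reason about when it is pulled out as a pure function. The sketch below restates the comparison from assertStep without the database lookup; `isStepAllowed` is a hypothetical name, and the five cases are worked by hand from the two comparisons.

```typescript
// Pure version of the position check, so the rule can be tested without a database.
function isStepAllowed(
  currentStep: number,
  expectedStep: number,
  allowPastSteps = false,
): boolean {
  return allowPastSteps
    ? currentStep >= expectedStep
    : currentStep === expectedStep;
}

const onStep = isStepAllowed(2, 2);                 // true: tenant is on this step
const skipAhead = isStepAllowed(2, 4);              // false: step 4 requested while on step 2
const skipAheadLenient = isStepAllowed(2, 4, true); // false: lenient mode still blocks skipping ahead
const editPast = isStepAllowed(4, 2, true);         // true: revisiting a step already passed
const strictPast = isStepAllowed(4, 2);             // false: strict handlers reject past positions
```

The flag only ever loosens the check in one direction: it admits tenants who are further along, never tenants who have not arrived yet.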

Some Steps Pause, and the State Has to Survive That

Not every step finishes in one request. The clearest example is any step that hands off to an external system — a payment, a verification, a third-party connect flow — and waits for it to come back.

A boolean cannot model "started but not confirmed." A step number can.

The pattern: when a step kicks off an external action, you do not advance. You leave the tenant parked on that step and wait for confirmation to arrive separately.

async function startPlanStep(organizationId: string, dto: PlanDto) {
  await assertStep(organizationId, 5);

  const result = await billing.beginCheckout({ organizationId, plan: dto.plan });

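  // If no external payment is needed, the step completes right here.
  // completePlanStep is not shown; presumably it applies the same final
  // update that confirmPlanStep performs once payment is verified.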
  if (result.settledImmediately) {
    return completePlanStep(organizationId);
  }

  // Otherwise: do NOT advance. Park on step 5 and hand back the continue-URL.
  return {
    requiresExternalAction: true,
    continueUrl: result.url,
    nextStep: 5,                       // stay put
    nextStepName: 'Complete payment to continue',
  };
}

Confirmation comes back through a separate path — a verification call, a webhook, a return redirect — and that is what advances the step:

async function confirmPlanStep(organizationId: string, reference: string) {
  const settled = await billing.verify(reference);

  if (!settled) {
    // Still pending — leave them parked, tell them honestly.
    return { confirmed: false, nextStep: 5, nextStepName: 'Complete payment' };
  }

  await db.organization.update({
    where: { id: organizationId },
    data: {
      onboardingPlanSelected: true,
      onboardingStep: 6,
      onboardingStatus: OnboardingStatus.COMPLETED, // last step → done
      onboardingCompletedAt: new Date(),
    },
  });

  return { confirmed: true, status: OnboardingStatus.COMPLETED };
}

The state machine survives the gap. If the user closes the tab mid-payment, the tenant is still sitting on step 5 with onboardingPlanSelected still false. They come back, hit the status endpoint, and land exactly where they left off. Nothing was lost, because the pause was a state, not a runtime variable that died when the request ended.

This is the property a boolean can never give you: **a half-finished step is a place you can return to.**
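Confirmation paths deserve one more precaution. Many providers deliver webhooks at least once, and a user can reload a return URL, so the confirming update may run more than once. The sketch below models the transition as a pure function over an in-memory record instead of the database. `TenantState` and `applyConfirmation` are hypothetical names, and the "ignore unless parked on step 5" rule is an assumption rather than part of the handler above.

```typescript
interface TenantState {
  onboardingStep: number;
  onboardingPlanSelected: boolean;
  status: 'pending' | 'in_progress' | 'completed';
  completedAt: Date | null;
}

// Returns the next state and never mutates the input.
function applyConfirmation(state: TenantState, settled: boolean, now: Date): TenantState {
  // Already completed: a repeated confirmation changes nothing.
  if (state.status === 'completed') return state;
  // Not settled yet, or not parked on the plan step: stay put.
  if (!settled || state.onboardingStep !== 5) return state;
  return {
    onboardingStep: 6,
    onboardingPlanSelected: true,
    status: 'completed',
    completedAt: now,
  };
}

const parked: TenantState = {
  onboardingStep: 5,
  onboardingPlanSelected: false,
  status: 'in_progress',
  completedAt: null,
};

const stillParked = applyConfirmation(parked, false, new Date(0)); // payment pending
const done = applyConfirmation(parked, true, new Date(1000));      // first confirmation
const replayed = applyConfirmation(done, true, new Date(2000));    // duplicate delivery
```

The duplicate delivery returns the completed state untouched, so the completion timestamp keeps the value from the first confirmation.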

A Guard That Gates the Product Until You Are Ready

Now the payoff. Onboarding state is only useful if it actually controls access to the rest of the product. That is a guard — the same pattern from Part 4, pointed at a different question.

The permission guard asked "is this user allowed?" The onboarding guard asks "is this tenant ready?" They compose: a request can be perfectly authorized and still be blocked because the organization has not finished setup.

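import {
  CanActivate,
  ExecutionContext,
  ForbiddenException,
  Injectable,
} from '@nestjs/common';
import { Reflector } from '@nestjs/core';
// PrismaService and OnboardingStatus come from the application's own modules.

@Injectable()
export class OnboardingGuard implements CanActivate {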
  constructor(
    private prisma: PrismaService,
    private reflector: Reflector,
  ) {}

  async canActivate(context: ExecutionContext): Promise<boolean> {
    // Some routes are explicitly exempt (see next section).
    const skip = this.reflector.getAllAndOverride<boolean>(
      'skipOnboardingCheck',
      [context.getHandler(), context.getClass()],
    );
    if (skip) return true;

    const request = context.switchToHttp().getRequest();
    const organizationId = request.organizationId;

    // No tenant context here — let other guards decide.
    if (!organizationId) return true;

    const org = await this.prisma.organization.findUnique({
      where: { id: organizationId },
      select: { onboardingStatus: true, onboardingStep: true },
    });

    if (!org) throw new ForbiddenException('Organization not found');

    // The single gate: only COMPLETED tenants reach the rest of the product.
    if (org.onboardingStatus !== OnboardingStatus.COMPLETED) {
      throw new ForbiddenException({
        message: 'Please complete onboarding first',
        onboardingRequired: true,
        currentStep: org.onboardingStep, // tell the client WHERE to resume
      });
    }

    return true;
  }
}

Notice the guard reads only the coarse onboardingStatus for its decision — that is the cheap, single check it was designed for. But it returns currentStep in the error body. The decision needs the status; the client needs to know where to send the user. The guard gives both: a clean block, plus a breadcrumb back into the flow.

That error shape matters. A bare 403 tells the frontend "no." A 403 with onboardingRequired: true and currentStep tells it "no, and here is exactly where to take them instead." The difference between a dead end and a redirect is in that payload.
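On the client side, the payload can be turned into a redirect with a small type guard. This is a sketch, assuming the response body arrives in the shape the guard built. `isOnboardingBlocked`, `redirectFor`, and the `/onboarding?step=` path are hypothetical.

```typescript
interface OnboardingBlockedBody {
  message: string;
  onboardingRequired: true;
  currentStep: number;
}

// Narrow an unknown error body to the shape the onboarding guard returns.
function isOnboardingBlocked(body: unknown): body is OnboardingBlockedBody {
  if (typeof body !== 'object' || body === null) return false;
  const b = body as Record<string, unknown>;
  return b.onboardingRequired === true && typeof b.currentStep === 'number';
}

// Returns a path to redirect to, or null when the response is an ordinary denial.
function redirectFor(status: number, body: unknown): string | null {
  if (status !== 403 || !isOnboardingBlocked(body)) return null;
  return `/onboarding?step=${body.currentStep}`; // hypothetical client route
}
```

A plain permission denial has no onboardingRequired field, so it falls through to whatever generic 403 handling the client already has.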

Letting Onboarding Routes Through Its Own Gate

A guard that blocks everything until onboarding is complete has an obvious problem: it blocks the onboarding routes themselves. You cannot finish onboarding if the guard won't let you call the onboarding endpoints.

So the gate needs explicit exemptions, declared the same way Part 4 declared permission requirements — as metadata on the route:

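import { SetMetadata } from '@nestjs/common';

// SetMetadata stores the flag on the handler or class, where Reflector reads it.
export const SkipOnboardingCheck = () => SetMetadata('skipOnboardingCheck', true);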

Three categories of route get exempted, and the discipline is to keep the list small and intentional:

@Controller('onboarding')
@SkipOnboardingCheck()          // the whole onboarding flow is exempt
export class OnboardingController { /* ... */ }

@Controller('auth')
@SkipOnboardingCheck()          // you must be able to log in before onboarding
export class AuthController { /* ... */ }

@Post('items')
@SkipOnboardingCheck()          // creating the FIRST item IS an onboarding step
async createItem() { /* ... */ }

The rule of thumb: exempt a route only if it is needed to complete onboarding, or if it must work before a tenant exists at all. Auth qualifies (you log in first). The onboarding flow qualifies (it is the flow). The "create your first item" endpoint qualifies because that creation is step 4, and blocking it would deadlock the very step it belongs to. One caveat: the exemption covers the route, not just the first call, so the handler itself still has to decide whether a half-onboarded tenant may create a second item.

Everything else stays gated. Keeping the exemption list short is what keeps the guarantee meaningful: if a route is not on the list, you know a half-onboarded tenant cannot reach it.
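One way to keep that list from growing unnoticed is a check that compares the exempt routes against an explicit allowlist. The sketch below is hypothetical throughout: `RouteInfo`, `EXPECTED_EXEMPT`, `unexpectedExemptions`, and every path are invented for illustration, and how the route table is collected from the application is left out.

```typescript
interface RouteInfo {
  method: string;
  path: string;
  skipsOnboardingCheck: boolean;
}

// The only routes allowed to bypass the gate (hypothetical paths).
const EXPECTED_EXEMPT = [
  'POST /auth/login',
  'POST /onboarding/profile',
  'POST /items',
];

// Returns exempt routes that are not on the allowlist.
function unexpectedExemptions(routes: RouteInfo[], allowlist: string[]): string[] {
  return routes
    .filter((r) => r.skipsOnboardingCheck)
    .map((r) => `${r.method} ${r.path}`)
    .filter((key) => allowlist.indexOf(key) === -1);
}

const routes: RouteInfo[] = [
  { method: 'POST', path: '/auth/login', skipsOnboardingCheck: true },
  { method: 'POST', path: '/onboarding/profile', skipsOnboardingCheck: true },
  { method: 'POST', path: '/items', skipsOnboardingCheck: true },
  { method: 'GET', path: '/reports', skipsOnboardingCheck: true }, // exempted by mistake
  { method: 'GET', path: '/orders', skipsOnboardingCheck: false },
];

const surprises = unexpectedExemptions(routes, EXPECTED_EXEMPT);
```

Run as a test, a non-empty result would fail the build, which turns "keep the list small" from a convention into something that is checked.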

Resumability Is the Whole Point

Step back and the through-line is one property: a user can leave at any moment and come back to exactly where they were.

Every design choice in this article serves that:

Status + step + flags mean the current position is durable data, not session state that dies with the request.
Backend-owned ordering means resuming cannot land them in an inconsistent place, no matter which endpoint they hit first.
Pausable steps mean even a step that waits on an external system has a place to wait, not a hole to fall into.
The status endpoint means the client can always ask "where am I?" and get a real answer.
The guard's currentStep in its error means even an accidental hit on a gated route routes them back into the flow.

None of that is possible with isOnboarded: boolean. The boolean has no position to resume to. The state machine is nothing but positions to resume to.

That is why the shape matters more than the feature. Onboarding looks like a one-time setup wizard, but the data model has to assume it will be interrupted — because in production, it always is.

What I Would Avoid

1. A single `isOnboarded` boolean

It cannot say where a tenant is, cannot survive a paused step, and cannot resume. The first abandoned session breaks it.

2. Letting the frontend own step order

The order of steps is a backend invariant. If the backend trusts the UI's sequence, out-of-order requests corrupt the flow.

3. Advancing a step before it is truly done

A step that hands off to an external system is not complete until confirmation returns. Park on the step; advance only on confirmation.

4. Regressing the step pointer

Re-submitting an earlier step must not drag a further-along tenant backward. Use Math.max so progress only ever moves forward.

5. Gating the routes that complete onboarding

If the guard blocks the onboarding flow, auth, and the first-item step, the tenant deadlocks. Exempt exactly those, and nothing more.

6. A bare 403 from the onboarding gate

Return currentStep so the client can route the user back into the flow. A block without a breadcrumb is a dead end.

Closing Thought

The series began with a single shift: do not model tenants as buckets of data, model them as organizations with boundaries. Every part after that was the same move applied one level deeper.

Part 1 gave the boundary: ownership and team access, not a tenantId column.
Part 2 gave the people inside it a lifecycle, not an in-or-out flag.
Part 3 split authority into two planes, so "admin" stopped being ambiguous.
Part 4 made the permission model enforce itself through one guard.
Part 5 added a second, orthogonal gate: entitlement, separate from authorization.
Part 6 gave the tenant itself a lifecycle — onboarding as state, gated by a guard, resumable by design.

The pattern underneath all six is the same one sentence:

Real systems have states, not booleans — and the backend should own them.

A tenant is not a data bucket. A member is not in-or-out. An admin is not a level. A permission is not an inline check. A plan limit is not a role. And a tenant being set up is not a boolean.

Each of those is a small modeling decision. Made deliberately at the start, they cost a little more thought and save you the rewrite later — the one where you bolt state onto a boolean in production because a user did the obvious thing and came back tomorrow.

That is the whole series in one idea: model the real shape of the thing, and let the backend hold it.
