Skip to content

feat(deploy): added deploy wizard#260

Open
wyattjoh wants to merge 45 commits into
mainfrom
wyattjoh/deploy
Open

feat(deploy): added deploy wizard#260
wyattjoh wants to merge 45 commits into
mainfrom
wyattjoh/deploy

Conversation

@wyattjoh
Copy link
Copy Markdown
Contributor

@wyattjoh wyattjoh commented May 6, 2026

Summary

Adds clerk deploy, an interactive wizard that promotes a linked Clerk application from development to production. The wizard validates clone compatibility, creates the production instance and primary domain, walks the user through the returned CNAME records (with optional BIND zone export to ./clerk-<domain>.zone), verifies DNS/SSL/mail with a per-component spinner chain, and collects production OAuth credentials for Google, GitHub, Microsoft, Apple, and Linear.

State is derived from the Platform API on each run, not persisted locally. The wizard reads the live application, domains, deploy status, and instance config on entry and resumes from the next pending step. The only profile write is instances.production once the production instance exists. Pressing Ctrl-C or picking "Skip" at an OAuth provider exits with Deploy paused and exit code 1; rerunning resumes from whatever the API now reports.

Platform API errors specific to the deploy lifecycle are mapped to typed CliErrors in commands/deploy/errors.ts (unsupported plan features, provider/taken domains, production_instance_exists recovery, SSL/mail retry throttles). The wizard refuses to start in agent mode because production configuration is interactive.

Test plan

  • bun run format:check
  • bun run lint
  • bun run typecheck
  • bun run test
  • bun run test:e2e:op
  • Manual smoke: unlinked directory exits with error: No Clerk project linked; linked directory walks through create → DNS records → BIND export → verification → OAuth → production summary; Ctrl-C at OAuth exits with Deploy paused and exit code 1; rerun resumes at the skipped provider.

@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented May 6, 2026

🦋 Changeset detected

Latest commit: fa5821c

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package
Name Type
clerk Minor

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@wyattjoh
Copy link
Copy Markdown
Contributor Author

!snapshot

@github-actions
Copy link
Copy Markdown
Contributor

Snapshot published

npm install -g clerk@1.3.1-snapshot.0dd30be
Package Version
clerk 1.3.1-snapshot.0dd30be

Published from 0dd30be

@wyattjoh wyattjoh changed the base branch from main to wyattjoh/api-error-classification May 13, 2026 22:30
@wyattjoh
Copy link
Copy Markdown
Contributor Author

wyattjoh commented May 13, 2026

Stack: wyattjoh/deploy

Part of a stacked-prs chain. Do not merge manually.

@wyattjoh wyattjoh force-pushed the wyattjoh/api-error-classification branch from 93d7a24 to bf05f3e Compare May 19, 2026 13:55
@wyattjoh wyattjoh force-pushed the wyattjoh/api-error-classification branch from bf05f3e to ffb869f Compare May 21, 2026 06:40
@wyattjoh wyattjoh force-pushed the wyattjoh/deploy branch 2 times, most recently from a3d8b46 to 7e6819c Compare May 21, 2026 15:26
@wyattjoh wyattjoh force-pushed the wyattjoh/api-error-classification branch from ffb869f to 162f269 Compare May 21, 2026 15:26
Base automatically changed from wyattjoh/api-error-classification to main May 21, 2026 19:21
wyattjoh and others added 13 commits May 21, 2026 13:22
- Preserve completed providers when pausing OAuth setup mid-loop, so
  `clerk deploy --continue` can finish multi-provider stacks.
- Surface a warning for OAuth providers enabled in dev that the wizard
  does not yet support, instead of silently skipping them.
- Close the gutter as Paused (not Failed) when DNS verification times
  out, since the state is recoverable via --continue.
- Tighten the production-domain regex to reject malformed inputs like
  example..com or example-.com before they reach the API.
Move deploy lifecycle endpoint wrappers into the shared PLAPI client while routing the deploy wizard through a command-local adapter that defaults to mocked operations until the backend endpoints are ready.
Switch the deploy command from the in-process mock lifecycle to the live
PLAPI endpoints. Add a typed error mapper that translates known PLAPI
failures (plan_insufficient, home_url_taken, ssl_retry_throttled, etc.)
into CliError with stable codes, with a recovery path for the
production_instance_exists case so the wizard re-derives state instead
of surfacing the error.

Surface per-component progress (DNS / SSL / mail) during deploy_status
polling, and drop the four hidden --test-fail-* CLI options now that
failure injection is routed through the test's module mock.

Collapse the api/mock indirection layer to a thin re-export of the
plapi endpoints + a no-op configureMockDeployApi stub for the test
seam, and strip the dead mockDeployApi implementation that the
indirection used to back.
Swap the "Continue to OAuth setup?" yes/no prompt during initial
production setup for the same verify/skip select used when resuming.
Choosing skip records DNS as pending and continues to OAuth instead of
pausing the deploy, so the dashboard remains the single place to monitor
propagation.
- Tolerate getDeployStatus failures inside the reconcile-path snapshot
  read so the deploy continues to the verify-or-skip prompt instead of
  failing before the user can react.
- Split the snapshot fetch into separate "Reading development
  configuration" and "Reading production configuration" spinners so the
  gutter reflects what is actually being loaded.
- Add triggerDomainDnsCheck (POST .../dns_check) and call it best-effort
  when the user picks "Check DNS now" so an active check job is kicked
  rather than waiting on background reconciliation.
- Type the dns_ok/ssl_ok/mail_ok booleans on DeployStatusResponse and use
  them to name the specific pending component in the timeout warning.
Switch deploy command imports back to lib/plapi.ts directly and remove the
./mock.ts harness plus the --test-force-* / --test-fail-* CLI flags it
backed. The wrappers were added for an earlier mockable-deploy-API
experiment that is no longer needed.
@wyattjoh wyattjoh marked this pull request as draft May 21, 2026 19:59
wyattjoh added 2 commits May 21, 2026 14:48
The Domain Connect URL helper returned Cloudflare's template for every
domain regardless of the actual registrar — a misleading prompt for any
user not on Cloudflare. Remove it entirely until NS-based registrar
detection lands as its own change.

Other cleanups from the same review pass: runDeploy throws a NOT_LINKED
CliError on unlinked directories instead of warning and exiting zero;
startNewDeploy's 409 production_instance_exists recovery now persists
the recovered production instance id to the profile; OAuth skip routes
through deployPausedError so its exit code matches Ctrl-C (1 instead of
silent 0); runDnsVerification loops on timeout retry instead of
recursing; and runOAuthSetup drops the redundant startIndex slice since
the completed set already skips previously-saved providers.
@wyattjoh wyattjoh changed the title feat(deploy): implement resumable deploy wizard feat(deploy): added deploy wizard May 21, 2026
@wyattjoh wyattjoh marked this pull request as ready for review May 27, 2026 15:31
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
scripts/cleanup-test-users.test.ts (1)

11-28: ⚡ Quick win

Consider simplifying the mock pattern.

The current mock reads its own call count via mock.calls.length, which creates a circular type reference requiring the complex casting chain. A simpler approach uses a closure variable:

♻️ Simpler mock pattern
  test("retries transient Platform API failures before returning JSON", async () => {
+   let attempts = 0;
    globalThis.fetch = mock(async () => {
-     const attempts = (globalThis.fetch as unknown as ReturnType<typeof mock>).mock.calls.length;
+     attempts++;
      if (attempts === 1) {
        return new Response("temporary failure", { status: 500 });
      }

      return Response.json({ instances: [] });
-   }) as unknown as typeof fetch;
+   }) as typeof fetch;

    const result = await plapiGet("/v1/platform/applications/app_123", "ak_test", {
      delayMs: 0,
    });

    expect(result).toEqual({ instances: [] });
    expect(globalThis.fetch).toHaveBeenCalledTimes(2);
  });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/cleanup-test-users.test.ts` around lines 11 - 28, The mock for
globalThis.fetch is overcomplicated by reading its own mock.calls.length
(causing the cast chain); change the mock in the test "retries transient
Platform API failures before returning JSON" to use a closure-scoped counter
variable (e.g., let attempts = 0) that you increment each invocation inside the
mock instead of querying (globalThis.fetch as ...).mock.calls.length, and keep
the existing behavior of returning a 500 on the first call and Response.json({
instances: [] }) thereafter so plapiGet is exercised and the expectation on call
count still holds.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/cli-core/src/commands/deploy/prompts.ts`:
- Around line 64-72: The retry prompt in chooseDnsVerificationRetryAction
currently lists "Skip DNS verification for now" first so Enter defaults to skip;
update chooseDnsVerificationRetryAction (returning DnsVerificationAction) to
make "Check again" the safe default by either placing the { name: "Check again",
value: "check" } choice first in the choices array and/or explicitly setting the
select helper's default/initial value to "check" (use the select API's
initial/default property for this prompt if available) so pressing Enter
triggers "check" not "skip".

---

Nitpick comments:
In `@scripts/cleanup-test-users.test.ts`:
- Around line 11-28: The mock for globalThis.fetch is overcomplicated by reading
its own mock.calls.length (causing the cast chain); change the mock in the test
"retries transient Platform API failures before returning JSON" to use a
closure-scoped counter variable (e.g., let attempts = 0) that you increment each
invocation inside the mock instead of querying (globalThis.fetch as
...).mock.calls.length, and keep the existing behavior of returning a 500 on the
first call and Response.json({ instances: [] }) thereafter so plapiGet is
exercised and the expectation on call count still holds.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 20ba068c-6400-4d41-a9d3-8f51af568904

📥 Commits

Reviewing files that changed from the base of the PR and between f65cea4 and c735a0b.

📒 Files selected for processing (21)
  • .changeset/deploy-wizard.md
  • .claude/rules/logging.md
  • packages/cli-core/src/cli-program.test.ts
  • packages/cli-core/src/cli-program.ts
  • packages/cli-core/src/commands/auth/login.ts
  • packages/cli-core/src/commands/deploy/README.md
  • packages/cli-core/src/commands/deploy/copy.test.ts
  • packages/cli-core/src/commands/deploy/copy.ts
  • packages/cli-core/src/commands/deploy/errors.ts
  • packages/cli-core/src/commands/deploy/index.test.ts
  • packages/cli-core/src/commands/deploy/index.ts
  • packages/cli-core/src/commands/deploy/prompts.ts
  • packages/cli-core/src/commands/deploy/providers.test.ts
  • packages/cli-core/src/commands/deploy/providers.ts
  • packages/cli-core/src/lib/errors.ts
  • packages/cli-core/src/lib/plapi.test.ts
  • packages/cli-core/src/lib/plapi.ts
  • packages/cli-core/src/lib/spinner.test.ts
  • packages/cli-core/src/lib/spinner.ts
  • scripts/cleanup-test-users.test.ts
  • scripts/cleanup-test-users.ts
💤 Files with no reviewable changes (2)
  • packages/cli-core/src/lib/errors.ts
  • packages/cli-core/src/commands/deploy/errors.ts
✅ Files skipped from review due to trivial changes (3)
  • .claude/rules/logging.md
  • packages/cli-core/src/commands/auth/login.ts
  • .changeset/deploy-wizard.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/cli-core/src/commands/deploy/copy.ts

Comment thread packages/cli-core/src/commands/deploy/prompts.ts
@wyattjoh wyattjoh requested a review from rafa-thayto May 27, 2026 20:47
Comment on lines +652 to +655
let retriesRemaining = DEPLOY_STATUS_MAX_RETRIES;
let nextRetryDelay = DEPLOY_STATUS_INITIAL_RETRY_DELAY_MS;

for (const component of DEPLOY_COMPONENT_ORDER) {
Copy link
Copy Markdown
Contributor

@rafa-thayto rafa-thayto May 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retriesRemaining and nextRetryDelay live outside the component loop, so all three spinners share one budget. With the constants here (5 retries, 3s base, x2 backoff) that totals ~93s, and whichever component is checked first (mail) can spend all of it. Once it does, the dns and ssl spinners reach while (retriesRemaining > 0) with zero left and return false right away, reporting the domain as unverified even when one more poll would have flipped them. Give each component its own budget:

Suggested change
let retriesRemaining = DEPLOY_STATUS_MAX_RETRIES;
let nextRetryDelay = DEPLOY_STATUS_INITIAL_RETRY_DELAY_MS;
for (const component of DEPLOY_COMPONENT_ORDER) {
for (const component of DEPLOY_COMPONENT_ORDER) {
let retriesRemaining = DEPLOY_STATUS_MAX_RETRIES;
let nextRetryDelay = DEPLOY_STATUS_INITIAL_RETRY_DELAY_MS;

Comment on lines +719 to +721
dns: response.dns?.status === "complete",
ssl: response.ssl ? checkStatusComplete(response.ssl) : false,
mail: checkStatusComplete(response.mail),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These three components handle a missing field three different ways. dns reads as pending when absent and ignores required entirely, ssl reads as pending when absent, but mail goes through checkStatusComplete where !undefined?.required is true, so an absent mail field reports as verified without ever being checked. Worth making all three uniform so the absent case means the same thing everywhere:

    dns: checkStatusComplete(response.dns),
    ssl: checkStatusComplete(response.ssl),
    mail: checkStatusComplete(response.mail),

That needs dns?: DomainCheckStatus in plapi.ts and checkStatusComplete taking { status: string; required?: boolean } | undefined. Decide whether an absent component should count as complete or pending and apply it in the one helper rather than per field.

const productionInstanceId = ctx.productionInstanceId;
if (!productionInstanceId) return undefined;

const domain = await loadProductionDomain(ctx);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loadProductionDomain and loadDevelopmentOAuthProviders both take only ctx and don't depend on each other, yet they run one after the other, adding a round trip to every resume run. Only loadProductionState needs domain.id, so the first two can go together:

  const [domain, oauth] = await Promise.all([
    loadProductionDomain(ctx),
    loadDevelopmentOAuthProviders(ctx),
  ]);
  if (!domain) return undefined;
  const { descriptors: oauthProviderDescriptors, unsupported } = oauth;

/**
* Prompt fields for existing deploy callers, falling back to standard OAuth credentials.
*/
export function providerFields(provider: OAuthProvider): OAuthField[] {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These five lookups (providerFields, providerCredentialLabel, providerRedirectLabel, providerSetupCopy, providerGotcha) repeat the same if (hasProviderOverride(provider)) return MAP[provider]; return default shape. A small generic collapses the repetition:

function fromOverride<T>(
  provider: OAuthProvider,
  map: Record<ProviderWithOverride, T>,
  fallback: T,
): T {
  return hasProviderOverride(provider) ? map[provider] : fallback;
}

export function providerFields(provider: OAuthProvider): OAuthField[] {
  return fromOverride(provider, PROVIDER_FIELDS, [
    { key: "client_id", label: "Client ID" },
    { key: "client_secret", label: "Client Secret", secret: true },
  ]);
}
// providerCredentialLabel / providerRedirectLabel / providerSetupCopy / providerGotcha follow the same shape.

Comment thread packages/cli-core/src/commands/deploy/copy.ts
Co-authored-by: Rafael Thayto <rafa.thayto@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants