Skip to content

Boot-test runServeCmd end to end (serve.go ~7% to ~64% coverage)#48261

Open
raju249 wants to merge 4 commits into
fleetdm:mainfrom
raju249:33370-runservecmd-boot-test
Open

Boot-test runServeCmd end to end (serve.go ~7% to ~64% coverage)#48261
raju249 wants to merge 4 commits into
fleetdm:mainfrom
raju249:33370-runservecmd-boot-test

Conversation

@raju249

@raju249 raju249 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Adds an end-to-end boot test for runServeCmd, the main server entry point. This is the coverage milestone for #33370: serve.go goes from ~7% to ~64%, and runServeCmd itself from 0% to ~62%.

The earlier PRs on this issue (#44929, #45343, #45583, #46166, #46421, #46517, #46742, #46830, #46893, #47151, #47562, #47891) extracted testable pieces out of runServeCmd, but the function itself stayed at 0% — it blocks on an OS signal and wires the entire server together, so the only way to cover it is to actually boot it. This PR does that.

TestRunServeCmd (gated behind MYSQL_TEST + REDIS_TEST) boots the full server against a real migrated test MySQL and Redis, waits for /healthz, then cancels the command context to trigger a graceful shutdown. It covers two paths:

  • Full boot with Apple MDM enabled — a 32-byte server private key brings up the Apple MDM protocol services and the host-identity / conditional-access SCEP setup, so the boot exercises the MDM startup path as well as the core wiring, cron schedules, and HTTP server.
  • Fail-fast on bad config — an invalid Redis host-cache configuration (enabled with a non-positive TTL) aborts startup through initFatal and returns rather than serving, covering the Redis-init error path and the nil-pool guard.

Beyond coverage, this doubles as a regression net for the ongoing runServeCmd slicing: a future change that breaks startup now fails this test instead of reaching a release.

One production change, in runServeCmd's shutdown select: it now also watches cmd.Context().Done(). This is inert in production — the root command runs via Execute() (not ExecuteContext()), so cmd.Context() is context.Background() and never cancels. Only the test runs the command with a cancelable context, which is how it shuts the server down without sending a real signal (a SIGTERM would kill the test binary).

A couple of notes for reviewers:

  • The test uses os.Setenv (not t.Setenv) because the MySQL test helper marks the test parallel; the boot scenarios run as serial subtests so the process-global config env doesn't race.
  • runServeCmd registers metrics with the process-global Prometheus registry, which can only happen once per process, so there is a single full boot here; the error-path scenario fails before that registration.
  • The test DB is loaded from a schema dump that doesn't mark every data migration as applied, so the boot runs with FLEET_UPGRADES_ALLOW_MISSING_MIGRATIONS=1.

It adds ~2s to the cmd/fleet (main) test bundle, which is well off the CI critical path.

Related issue: Refs #33370

Checklist for submitter

  • Added/updated automated tests
  • QA'd all new/changed functionality manually (verified locally: boots to /healthz, graceful shutdown, ~64% serve.go coverage)
  • Changes file: not applicable — internal test coverage with no user-visible behavior change

Summary by CodeRabbit

  • Bug Fixes
    • Improved server shutdown handling to stop cleanly when the running command’s context is canceled, not only on OS signals.
    • Added stronger startup validation to fail fast for invalid Redis host-cache configuration (e.g., non-positive TTL).
  • Tests
    • Added an end-to-end test that boots the server against real MySQL/Redis, verifies graceful startup/shutdown, and confirms fast-fail behavior for misconfiguration.

runServeCmd was untested (serve.go was ~7% covered) because it blocks on
an OS signal and wires the whole server together. Add a gated integration
test that boots the full server against a real test MySQL and Redis, waits
for /healthz, and shuts it down gracefully, taking serve.go to ~64% and
runServeCmd from 0% to ~62%.

To let the test trigger a graceful shutdown without sending a process
signal, the shutdown select now also watches cmd.Context().Done(). This is
inert in production: the root command runs via Execute (not ExecuteContext),
so cmd.Context() is context.Background() and never cancels; only tests run
the command with a cancelable context.

The test covers two paths: a full boot with Apple MDM enabled (a server
private key brings up the MDM protocol services and SCEP setup), and a
fail-fast path where an invalid Redis host-cache config aborts startup
through initFatal. Both are gated behind MYSQL_TEST and REDIS_TEST.

Refs fleetdm#33370
@raju249 raju249 requested a review from a team as a code owner June 25, 2026 10:41

@claude claude Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ec92e8a8-801d-43ce-9ffc-89b45b4d26a5

📥 Commits

Reviewing files that changed from the base of the PR and between ca8e11b and c2c461d.

📒 Files selected for processing (1)
  • cmd/fleet/serve_boot_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • cmd/fleet/serve_boot_test.go

Walkthrough

The serve command now treats command-context cancellation as a shutdown trigger in addition to OS signals and fatal database errors. A new end-to-end test file boots the server with MySQL and Redis test infrastructure, configures boot-time environment variables, polls /healthz for readiness, waits for graceful shutdown after context cancellation, and verifies fail-fast handling for invalid Redis host-cache TTL.

Possibly related issues

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the new end-to-end boot test and the associated coverage increase.
Description check ✅ Passed The description covers the new test, related issue, testing, manual QA, and notes the non-user-visible change.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@cmd/fleet/serve_boot_test.go`:
- Around line 20-24: The boot test helpers and scenarios are still mutating
process-wide state after the parent test is parallelized, including env changes
in setBootEnv and swapping the package-global initFatal, which can leak across
other tests. Update the serve boot test flow to avoid global mutation by using a
non-parallel DB setup for this e2e test or by threading config through command
args/config instead of env, and eliminate the initFatal replacement in the
relevant boot scenario helpers.
- Around line 109-117: The health-check polling in waitHealthy uses the default
http.Get client, which can hang on a stalled connection and outlive
assert.Eventually’s timeout. Update waitHealthy to use an http.Client with a
bounded Timeout for the GET request to /healthz, and keep the retry loop logic
unchanged so the health check still polls until the server becomes ready or the
overall wait expires.
- Around line 155-163: The serve boot test currently cancels the context only
after the health/assertion checks, so a failure in those assertions can skip
shutdown and leave the server goroutine/listener running. In the `Test...` flow
around `runServe`, `waitHealthy`, and `waitShutdown`, defer the `cancel()`
immediately after creating the context (before `require.Emptyf`/`require.True`)
so cleanup always runs even if an assertion aborts the subtest.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 525277e2-02d6-4359-be2a-c51bab722efc

📥 Commits

Reviewing files that changed from the base of the PR and between a91d2a2 and 686fea7.

📒 Files selected for processing (2)
  • cmd/fleet/serve.go
  • cmd/fleet/serve_boot_test.go

Comment thread cmd/fleet/serve_boot_test.go Outdated
Comment thread cmd/fleet/serve_boot_test.go
Comment thread cmd/fleet/serve_boot_test.go Outdated
@codecov

codecov Bot commented Jun 25, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 67.79%. Comparing base (a91d2a2) to head (c2c461d).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #48261      +/-   ##
==========================================
+ Coverage   67.30%   67.79%   +0.48%     
==========================================
  Files        3660     3661       +1     
  Lines      231442   231509      +67     
  Branches    12155    12155              
==========================================
+ Hits       155762   156940    +1178     
+ Misses      61742    60480    -1262     
- Partials    13938    14089     +151     
Flag Coverage Δ
backend 69.45% <100.00%> (+0.58%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

raju249 added 2 commits June 25, 2026 16:30
- Load the schema directly instead of via CreateMySQLDS so the test is not
  marked parallel. It mutates process-global state (config env and the
  initFatal var), which must not race with other package tests; with the
  test serial these are safe (verified with go test -race), and env is set
  via t.Setenv so it auto-restores.
- Give the /healthz client a bounded timeout so a stalled connection can't
  outlive the poll window.
- Defer the context cancel and shutdown wait so the server is always torn
  down even if an assertion aborts the subtest.
- Use require.NoError for the shutdown error assertion (testifylint).
- Use t.Context() for the fail-fast error-path scenario, which needs no
  manual cancellation (modernize).
@raju249

raju249 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

Hello @getvictor @MagnusHJensen - This one MR adds tests for the runServeCmd. Improves the coverage with happy path and a few non happy path tests.

Would you mind taking a look, please? Let me know what do you think!

Thanks!

Fleet's custom gocritic rule disallows constructing http.Client directly;
use fleethttp.NewClient with a timeout option instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant