ci: get Hub-Client E2E Playwright Tests running #172
b30144c to a40b38e
Two additions after the initial PR:
The branch is now four commits, with all
PR #172's first pull_request-triggered run hit a worker-contention timeout in metadata/format-specific/doc-format-overrides.qmd — the preview iframe didn't render within 45s on any of the 3 attempts. Same shape as the failures we saw with 4 workers (cleared by dropping to 2), but now appearing at 2 workers because the runner happened to be slower this time. 8 other smoke-all tests passed only on retry, i.e. they were right at the edge. Bump waitForPreviewRender from 45s to 75s, and the per-test ceiling from 60s to 90s so the wider preview wait can actually fire. Fast tests still pass fast; slower fixtures under contention get the breathing room they need.
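The relationship between the two timeouts can be sketched as below. The constant and function names are hypothetical, not taken from the repo; only the numbers come from the commit message. The point is that the per-test ceiling has to stay above the preview wait, otherwise the test is killed before the preview wait can expire with its own, more specific error.

```typescript
// Hypothetical names; the values are the ones quoted in the commit message.
const PREVIEW_RENDER_TIMEOUT_MS = 75_000; // was 45_000
const PER_TEST_TIMEOUT_MS = 90_000; // was 60_000

/** Time left for setup and assertions once the inner wait has been used up. */
function headroomMs(perTestMs: number, innerWaitMs: number): number {
  return perTestMs - innerWaitMs;
}

/** Throws if the inner wait could never fire before the per-test ceiling. */
function assertTimeoutBudget(perTestMs: number, innerWaitMs: number): void {
  if (headroomMs(perTestMs, innerWaitMs) <= 0) {
    throw new Error(
      `inner wait of ${innerWaitMs}ms cannot fire under a ${perTestMs}ms test ceiling`,
    );
  }
}

// New values: 15s of headroom. A 75s wait under the old 60s ceiling would throw here.
assertTimeoutBudget(PER_TEST_TIMEOUT_MS, PREVIEW_RENDER_TIMEOUT_MS);
```

Keeping a check like this next to the constants is one way to stop the two values from drifting apart in a later change.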
5-phase plan to migrate hub-client e2e from `vite dev` to `vite preview` as the real fix for the worker-contention flakes the previous commit papered over with a 75s preview-render timeout.

Key motivating numbers (measured on the local dist/):

    32 MB    wasm_quarto_hub_client_bg.wasm
    1.8 MB   automerge_wasm_bg.wasm
    192 KB   web-tree-sitter.wasm
    ~5 MB    dart-sass dynamic-import bundle
    ~3 MB    Monaco editor chunks
    ----
    ~42 MB+  of binary assets served per fresh Playwright browser context

`vite dev` serves these uncompressed through a single-threaded plugin pipeline that also has to transform-on-demand the hundreds of TS/JSX modules in the hub-client source tree. With 2 Playwright workers contending for one dev server on a 2-core ubuntu-latest runner, the cold-context page-load tail dominates and randomly exceeds 45s.

`vite preview` serves the same content from a prebuilt, hash-named, gzip-compressible dist/ directory with no transform pipeline; this should drop ~50 MB of wire traffic by 3-4x and remove the serialization point entirely.

Target: e2e workflow under 12 min, Run E2E tests step under 5 min, flaky count ≤ 2, zero hard failures across 3 consecutive runs.

Plan is self-contained, scoped to this branch, validated by pushing back to PR #172.
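The arithmetic behind those numbers can be checked with a short sketch. The sizes are the ones listed in the plan (192 KB is taken as 0.192 MB, and the "~" entries are approximate); the `wireSizeMb` helper and the idea of a single overall compression ratio are simplifying assumptions for illustration.

```typescript
// Sizes as listed in the plan, in MB.
const assetSizesMb: Record<string, number> = {
  "wasm_quarto_hub_client_bg.wasm": 32,
  "automerge_wasm_bg.wasm": 1.8,
  "web-tree-sitter.wasm": 0.192,
  "dart-sass dynamic-import bundle": 5,
  "Monaco editor chunks": 3,
};

// Sum of the listed binary assets.
const totalMb = Object.values(assetSizesMb).reduce((sum, mb) => sum + mb, 0);

/** Wire size under an assumed overall compression ratio (3 means 3:1). */
function wireSizeMb(rawMb: number, ratio: number): number {
  return rawMb / ratio;
}

console.log(totalMb.toFixed(1)); // "42.0"
// The plan's predicted 3-4x reduction applied to ~50 MB of wire traffic:
console.log(wireSizeMb(50, 3).toFixed(1), wireSizeMb(50, 4).toFixed(1)); // "16.7" "12.5"
```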
b6fed4a to 685a9cb
The workflow had never actually run on a runner: every push to main since it was added queued for the 24-hour GitHub Actions maximum and was cancelled because `runs-on: ubuntu-latest-8x` isn't a runner label this repo can request. Switching to `ubuntu-latest` unblocked everything else; from there the workflow itself needed several layers of fixes to actually pass.

Workflow fixes (hub-client-e2e.yml, with parallel changes to ts-test-suite.yml where applicable):
- Runner label: ubuntu-latest-8x → ubuntu-latest.
- Build order: build the WASM module before the TypeScript packages, since hub-client's vite build imports the WASM JS glue.
- WASM tooling: install wasm-bindgen-cli pinned to the version in Cargo.lock, and add the rust-src component (required by -Zbuild-std=std,panic_unwind in the wasm-quarto-hub-client crate's .cargo/config.toml).
- Rust nightly: invoke dtolnay/rust-toolchain@master with no `toolchain:` input so the action reads the pin from rust-toolchain.toml (bd-at72, nightly-2026-04-28 from main). Single source of truth: no duplicate `RUSTUP_TOOLCHAIN` env var.
- dtolnay/rust-toolchain action: dated nightlies need @master with a toolchain input, not @nightly-YYYY-MM-DD as a ref.
- TypeScript build scoping: only build ts-packages + hub-client. The q2-demos workspaces fail their vite build (vite can't resolve an absolute import in the wasm-bindgen output) and trace-viewer is out of scope. Tracked as bd-1jnb.
- Pre-build the hub binary: globalSetup launches it with `cargo run --bin hub` and waits 120s for "Hub server listening"; on a cold runner the cargo compile exceeded that.
- Baseline-commit step: switch the broken `**` glob to `find ... -name '*-snapshots' -exec git add -f {} +` (bash globstar isn't enabled by default; visual specs sit directly under hub-client/e2e/, not one directory deeper) and grant `contents: write` so the default GITHUB_TOKEN can push the auto-generated baselines back to the branch.
- Add a `pull_request` trigger with the same path filter as `push`, so PRs that touch hub-client or the workflow file have to prove themselves green before merging.

Test-side bugs that surfaced once the workflow could actually run:
- TS2345 in client.test.ts: installMockRepo used `ReturnType<typeof createMockHandle>` without supplying T, so the parameter defaulted to unknown. Forwarding T through makes the helper generic and the test compiles again.
- Node-side IndexedDB shim: projectFactory.ts runs in the Playwright test process (Node, not browser) but createSyncClient instantiates an IndexedDBStorageAdapter unconditionally. Import fake-indexeddb/auto, matching how sync-test-harness already handles vitest.
- Vite proxy target: hub-client's vite.config.ts proxies /auth/* and the websocket to VITE_HUB_SERVER (default http://localhost:3000), but globalSetup starts the e2e hub on port 3030, so vite returned HTTP 500 from /auth/me on every test, blocking the in-browser hub-client from rendering the preview iframe. Pass VITE_HUB_SERVER=http://localhost:3030 to the webServer env.
- Functional config picks up visual specs: testDir './e2e' with no testIgnore was finding setup-screens.visual.spec.ts and running it through the functional config, which has no missing-baseline retry. Skip *.visual.spec.ts so those run only via playwright.visual.config.ts.
- CI workers: drop from 4 to 2. The 4-workers value was sized for the original ubuntu-latest-8x; under ubuntu-latest (2 cores) random tests stalled in the WASM render pipeline and missed the 45s preview-iframe deadline non-deterministically. Reproduced locally with 4 workers (a different test failed each run); 2 workers cleared all flakes.
- Hub-client WASM gap (bd-izfv): the project-render path (RenderToHtmlRenderer) drops the user_grammars provider on the floor, so the smoke-all fixture highlighting/03-user-grammar/03-user-grammar-toml.qmd renders a bare <code> block instead of highlighted TOML. Add a SKIP_WASM_UNSUPPORTED map in smokeAllDiscovery.ts pointing at bd-izfv (which has a complete TDD plan on branch beads/bd-izfv-thread-user-grammars).

Also adds the missing Playwright visual regression baselines under hub-client/e2e/ that the visual config expects on first run.
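A skip map of the kind described for smokeAllDiscovery.ts might look roughly like this. The shape of the map, the `DiscoveredFixture` type and the `annotateSkips` helper are assumptions for illustration; only the map name, the fixture path and the issue ID come from the commit message.

```typescript
// Hypothetical shape; the real map in smokeAllDiscovery.ts may differ.
// Key: fixture path as quoted in the commit message. Value: tracking issue.
const SKIP_WASM_UNSUPPORTED: ReadonlyMap<string, string> = new Map([
  ["highlighting/03-user-grammar/03-user-grammar-toml.qmd", "bd-izfv"],
]);

interface DiscoveredFixture {
  path: string;
  skipReason?: string;
}

/** Attaches a skip reason to every fixture that hits a known WASM gap. */
function annotateSkips(paths: string[]): DiscoveredFixture[] {
  return paths.map((path) => {
    const issue = SKIP_WASM_UNSUPPORTED.get(path);
    return issue ? { path, skipReason: `WASM gap tracked in ${issue}` } : { path };
  });
}

const fixtures = annotateSkips([
  "highlighting/03-user-grammar/03-user-grammar-toml.qmd",
  "metadata/format-specific/doc-format-overrides.qmd",
]);
console.log(fixtures.map((f) => f.skipReason ?? "run"));
```

Keying the skip on an issue ID rather than a bare boolean keeps the reason visible in the test report and makes the entry easy to find when the tracked fix lands.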
Cuts wall time and removes flakes by replacing the dev-mode hub-client
server with a prebuilt bundle served through `vite preview`. The win is
mostly mechanical: a cold Playwright context downloads ~50 MB of assets
(WASM dominates at 32 MB). Through `vite dev` that goes uncompressed and
serializes through a single-threaded transform pipeline; through
`vite preview` with a gzip middleware it's ~5.6 MB on the wire and
served as static files. Under 2 Playwright workers on a 2-core CI
runner this was the source of the "preview iframe didn't render in 45s"
flakes that the previous commit (75s timeout bump) papered over.
CI on chore/e2e-ci across 3 consecutive runs: 0 hard failures, Run E2E
tests step 5.3-8.1 min (was 14.1 min, so roughly 42-62% faster), flakes
3-7 (was 12). Local smoke-all: 1.9 min / 1 flake → 1.1 min / 0 flakes.
See `claude-notes/plans/2026-05-11-vite-preview-for-e2e-tests.md` for
the 5-phase plan and motivating numbers.
Major pieces
- hub-client/vite.config.ts: mirror `server.proxy` into `preview.proxy`
(preview ignores `server.proxy`). Add a small `configurePreviewServer`
plugin that runs `compression()` middleware; override its filter so
`application/wasm` is included (mime-db marks it non-compressible by
default — in practice it gzips ~6:1).
- hub-client/playwright.config.ts: `webServer.command` → `vite preview`;
comment why it's not `vite dev` so the next agent doesn't "helpfully"
revert it for HMR.
- hub-client/ast-renderer.html: moved from `public/` to project root.
When it lived in `public/` the build emitted two copies — the
transformed one at `dist/public/ast-renderer.html` and a raw copy at
`dist/ast-renderer.html` (with a dev-only `<script src="/src/...tsx">`
reference). The iframe `src="/ast-renderer.html"` hit the raw one in
preview mode and the q2-debug E2E test broke. This was a latent prod
bug — `vite dev` happened to mask it via source-path resolution.
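The filter override described for the compression middleware can be sketched without any dependencies. Everything here is illustrative: `HasHeaders`, `defaultCompressible` and `shouldCompress` are hypothetical names, and `defaultCompressible` is only a rough stand-in for the library's own default decision, not a copy of it.

```typescript
// Minimal structural type so the sketch is self-contained; in a real plugin
// `res` would be the Node response object handed to the middleware's filter.
interface HasHeaders {
  getHeader(name: string): string | number | string[] | undefined;
}

/** Rough stand-in for the default decision (assumption: text-like types only). */
function defaultCompressible(contentType: string): boolean {
  return /^text\/|\+json$|\+xml$|^application\/(json|javascript|xml)$/.test(contentType);
}

/** Filter that opts WebAssembly in and otherwise defers to the default. */
function shouldCompress(res: HasHeaders): boolean {
  const header = res.getHeader("Content-Type");
  // Drop any "; charset=..." parameter before comparing.
  const [rawType = ""] = String(header ?? "").split(";");
  const contentType = rawType.trim().toLowerCase();
  if (contentType === "application/wasm") return true;
  return defaultCompressible(contentType);
}

// Possible wiring inside a Vite plugin's configurePreviewServer hook (not run here):
//   server.middlewares.use(compression({ filter: (_req, res) => shouldCompress(res) }));
```

In a real config the fallback branch would more likely call the library's own default filter than reimplement it, so that only the WASM case is special.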
Test-hook plumbing
- src/test-hooks.ts (new): registers
`window.__quartoTest = { projectStorage, wasmRenderer }`. Tree-shaken
out of any build without `VITE_E2E=1`.
- src/main.tsx: kicks off `import('./test-hooks')` and stores the
promise on `window.__quartoTestReady`. Top-level `await` here doesn't
help (the `load` event fires before module top-level awaits resolve),
so tests `await window.__quartoTestReady` before reading the hooks.
- e2e/helpers/testHooks.ts (new): typed global augmentation.
- e2e/helpers/previewExtraction.ts, projectFactory.ts,
share-link-project-set.spec.ts: 5 `await import('/src/services/...ts')`
call sites replaced with `await window.__quartoTestReady` +
`window.__quartoTest.{projectStorage,wasmRenderer}`. The dev-only
source-path imports stopped working under `vite preview` (prod
bundles don't expose source paths).
- package.json: `test:e2e[:ui]` scripts now set `VITE_E2E=1` and build
before running playwright (since preview serves from `dist/`).
- .github/workflows/hub-client-e2e.yml: `VITE_E2E: '1'` on the Build
TypeScript packages step.
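The handshake between the app and the tests can be modelled in a few lines. This is a dependency-free sketch, not the repo's code: `TestWindow` stands in for the augmented `window`, `registerTestHooks` stands in for the dynamically imported `src/test-hooks.ts`, and the `e2eEnabled` flag stands in for the `VITE_E2E=1` build-time check.

```typescript
interface QuartoTestHooks {
  projectStorage: unknown;
  wasmRenderer: unknown;
}

// Stand-in for the global augmentation on `window`.
interface TestWindow {
  __quartoTest?: QuartoTestHooks;
  __quartoTestReady?: Promise<void>;
}

/** Models test-hooks.ts: registers the hooks once the module has loaded. */
async function registerTestHooks(win: TestWindow): Promise<void> {
  await Promise.resolve(); // simulates the asynchronous dynamic import
  win.__quartoTest = { projectStorage: {}, wasmRenderer: {} };
}

/** Models main.tsx: starts loading without blocking, and keeps the promise. */
function bootstrap(win: TestWindow, e2eEnabled: boolean): void {
  if (!e2eEnabled) return; // nothing is registered in a normal build
  win.__quartoTestReady = registerTestHooks(win);
}

/** Models a test helper: waits for readiness before reading the hooks. */
async function getHooks(win: TestWindow): Promise<QuartoTestHooks> {
  await win.__quartoTestReady;
  if (!win.__quartoTest) {
    throw new Error("test hooks not registered (bundle built without VITE_E2E=1?)");
  }
  return win.__quartoTest;
}
```

Right after `bootstrap` returns, the promise exists but the hooks object does not yet, which is why the tests are described as awaiting `window.__quartoTestReady` first.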
Side effects worth knowing
- Running `npx playwright test` directly (bypassing `npm run test:e2e`)
serves whatever was last built into `dist/` — possibly stale. The
npm-script path always rebuilds.
- New devDeps `compression` + `@types/compression` (83 transitive
packages, dev-only, no shipped code). No production runtime change.
- `vite dev`, `vite build` defaults, and the production user bundle are
untouched: the compression plugin only fires under `vite preview`,
and test-hooks is dead-code-eliminated without `VITE_E2E=1`.
685a9cb to dc9fcd0
Summary
The `Hub-Client E2E Tests` workflow had never actually executed on a runner since it was added: every push to `main` queued for the 24-hour Actions maximum and was cancelled because `runs-on: ubuntu-latest-8x` isn't a runner label this repo can request. This PR gets the workflow running and green on a stock `ubuntu-latest` runner.
Three commits, deliberately split by concern:
- `ci: enable Hub-Client E2E Tests workflow`. Every fix needed in `.github/workflows/hub-client-e2e.yml`: runner label, build ordering, WASM tooling (wasm-bindgen-cli pinned to Cargo.lock, rust-src component, nightly pin to dodge a rustc SIGSEGV on tokio for wasm32), dtolnay action ref form, TS build scoping, hub binary pre-build, and a fix for the broken baseline auto-commit step (glob + permissions).
- `test(e2e): make hub-client e2e suite pass on CI`. Every fix needed in the test code that the workflow surfaced once it could actually run: a TS2345 in the vitest mock helper, an `indexedDB` shim in the Playwright Node-side helper, a Vite proxy env to point `/auth/*` at the e2e hub's port, a `testIgnore` for visual specs so they only run via the dedicated config, cutting CI workers from 4 → 2 to match the 2-core runner, and a `SKIP_WASM_UNSUPPORTED` entry for the one fixture that exercises a real WASM gap.
- `Add missing Playwright visual regression baselines`. 6 auto-generated chromium-linux baselines committed back by the workflow itself on its first successful run; kept as a separate commit by `github-actions[bot]` to preserve provenance.
The remaining WASM gap is bd-izfv (Phase 9 follow-up: thread `user_grammars` through `RenderToHtmlRenderer`). A complete TDD plan for that fix is already on branch `beads/bd-izfv-thread-user-grammars`. When bd-izfv lands, the `SKIP_WASM_UNSUPPORTED` entry added in this PR can be removed and the user-grammar fixture will run; Phase 5 of that plan explicitly calls out the restore step.
Filed during this work:
- `q2-demos/*` vite build fails resolving `/src/wasm-js-bridge/cache.js`; the workflow scopes the TS build to skip them.
- `[TAG_RESOLVE_FAILED] YAMLWarning` from `!str` tags in playwright stdout; cosmetic, not a failure cause.
Test plan
- success, 76 tests passed, 6 baselines auto-committed)
- `npx playwright test --grep "01-builtin-python"` in the worktree (5.7s)
- `git diff origin/main HEAD` reviewed and the 3-commit history is lossless against the pre-squash branch