Release: version packages by github-actions[bot] · Pull Request #121 · tangle-network/browser-agent-driver

github-actions · 2026-06-23T09:34:09Z

This PR was opened by the Changesets release GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine; whenever you add more changesets to main, this PR will be updated.

Releases

@tangle-network/browser-agent-driver@0.35.0

Minor Changes

#122 b0f74a4 Thanks @drewstone! - The default provider is now credential-aware instead of a hard-coded openai. A bare run (no --provider/--model, no config-file provider) uses OpenAI when OPENAI_API_KEY is set (unchanged for existing users and CI) and otherwise falls back to an available provider (claude-code, which needs no key) rather than failing on a missing OpenAI key. An explicit provider in CLI flags or a config file is always honored, and the default model maps per-provider as before (e.g. gpt-5.4 → sonnet for claude-code). This removes the last place the no-flag path assumed OpenAI; the engine already supported openai/anthropic/google/claude-code/zai for both text and vision.
#124 a2055b2 Thanks @drewstone! - design-audit (reference-grounded): make the redesign engine job-first instead of aesthetic-first. The old engine grounded every page in a world-class exemplar's visual DNA and judged on visual craft, so it regressed functional pages into generic brochures — a docs page lost its table-of-contents and dense reference content for two marketing cards and a hero; an aggregator dropped from 30 items to 9; a status dashboard shed services into spacious cards. The fix:
- Generator (reference/generate/prompt.ts): persona reframed from art director to product designer. New hard rules in priority order — task-first (design for the page's users and the job in its intent) → preserve functional affordances (never delete navigation/ToC/search to look cleaner) → preserve density where it is the value (docs/dashboards/feeds keep their item count) → right-size the intervention (never turn one kind of page into another) → the exemplar is a source of visual craft only, never a structural template.
- Functional contract: a per-page preservation block derived from the page's own measured DNA (navigation-affordance count, layout density, archetype) so "keep what works" is concrete and data-driven, not exhortation — and density is required only when the page is actually measured dense, so a genuinely sparse page is never forced to stay dense.
- Ranker/judge (reference/judge/prompt.ts): scores task fitness and functional preservation BEFORE visual craft; a polished direction that removes navigation or reduces density loses. "Fit to the reference" counts only as visual craft.
Validated by re-running the regressed pages: docs now keeps its ToC + prev/next nav + dense code examples; HN keeps all 30 stories + nav; the status dashboard stays a dense service grid with real values. No provider coupling; flag-gated reference engine only.
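
The "functional contract" idea in #124 (emit a preservation rule only for what the page was actually measured to have, and require density only on pages measured dense) can be illustrated with a minimal sketch. This is not the engine's code: `MeasuredPage`, `sketchFunctionalContract`, and the rule wording are hypothetical, and the real DNA fields may be shaped differently.

```typescript
// Hypothetical shapes for illustration; the engine's real DNA types may differ.
type Density = "sparse" | "balanced" | "dense";

interface MeasuredPage {
  navCount?: number;   // distinct navigation affordances measured on the page
  density?: Density;   // measured layout density
  archetype?: string;  // measured page archetype, e.g. "docs"
}

// Emit a preservation rule only for what was actually measured.
function sketchFunctionalContract(page: MeasuredPage): string {
  const rules: string[] = [];
  if (page.navCount !== undefined && page.navCount > 0) {
    rules.push(`- Keep all ${page.navCount} navigation affordances.`);
  }
  // Density is required only when the page is measured dense,
  // so a genuinely sparse page is never forced to stay dense.
  if (page.density === "dense") {
    rules.push("- Preserve information density; keep the item count.");
  }
  if (page.archetype) {
    rules.push(`- Keep this a ${page.archetype} page; do not turn it into another kind.`);
  }
  // No measurements means no block at all.
  return rules.length > 0 ? ["FUNCTIONAL CONTRACT", ...rules].join("\n") : "";
}
```

Under these assumptions, a dense docs page gets all three rules, a sparse page with one nav element gets only the navigation rule, and a page with no measurements produces no contract block.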

Patch Changes

#123 20942c2 Thanks @drewstone! - design-audit (reference-grounded): enforce content fidelity so a redesign never fabricates content the page lacks. On a content-sparse page grounded against a dense exemplar, the generator would invent factual content to fill the layout (e.g. a placeholder page gaining a fake "Recent Activity" feed with timestamps, invented status/RFC/registry data), and the pairwise direction-ranker rewarded that invented density as "richer" — so applied to a real app the audit could inject fabricated data into the UI. Now the generator may restyle/regroup/re-rank only the page's real content (the exemplar governs how it looks, never what content it has; a sparse page stays proportionally restrained), the ranker penalises invented content as unfaithful instead of rewarding it, and the apply prompt carries a defense-in-depth "do not invent content" guardrail. No provider coupling.
#120 f11b899 Thanks @drewstone! - design-audit (reference-grounded): make redesign generation work with reasoning models. The generator capped output at 2200 tokens, which a reasoning model (e.g. GLM-5.2, o-series) spends on its thinking before the answer — so the JSON direction came back empty or truncated and the audit fell back with a misleading "no JSON object found". Raise the per-direction budget to 8000 (non-reasoning models stop at the closing brace and never use the extra, so it's free for them), and report empty vs truncated vs non-JSON output distinctly so a budget/limit issue is diagnosable. No coupling to any one provider — the engine already runs on openai/anthropic/google/claude-code/zai.
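
To make the "empty vs truncated vs non-JSON" distinction from #120 concrete, here is a minimal sketch of how a model response could be classified. It is an illustration under stated assumptions, not the engine's parser: `diagnoseModelOutput` and `OutputDiagnosis` are hypothetical names, and the real implementation may use different rules.

```typescript
// Hypothetical diagnosis type; names are illustrative only.
type OutputDiagnosis =
  | { kind: "ok"; value: unknown }
  | { kind: "empty" }
  | { kind: "truncated" }
  | { kind: "non-json" };

function diagnoseModelOutput(raw: string): OutputDiagnosis {
  const text = raw.trim();
  // Nothing came back at all (e.g. the budget was spent on reasoning).
  if (text.length === 0) return { kind: "empty" };

  const start = text.indexOf("{");
  // There is text, but no JSON object ever starts.
  if (start === -1) return { kind: "non-json" };

  // Walk to the matching closing brace, ignoring braces inside strings.
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return { kind: "ok", value: JSON.parse(text.slice(start, i + 1)) };
        } catch {
          return { kind: "non-json" };
        }
      }
    }
  }
  // An object opened but never closed: the output was cut off.
  return { kind: "truncated" };
}
```

With a classification like this, a caller can report "truncated" as a likely token-budget problem instead of a generic "no JSON object found".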

tangletools

✅ Auto-approved PR — `70dd270f`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-23T09:34:16Z}

tangletools

✅ Auto-approved PR — `1f81234d`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-23T10:44:42Z}

tangletools

🟢 Value Audit — sound


Verdict	sound
Concerns	0 (none)
Heuristic	0.0s
Duplication	0.0s
Interrogation	241.9s (2 bridge agents)
Total	241.9s

💰 Value — sound

Standard Changesets release PR: bumps 0.34.0 → 0.35.0 with a correct minor (credential-aware provider, #122) + patch (reasoning-token headroom, #120); merges to trigger OIDC npm publish.

What it does: Generated by changesets/action on main. Bumps package.json version 0.34.0 → 0.35.0, appends a '## 0.35.0' section to CHANGELOG.md with one Minor entry (PR #122, credential-aware default provider) and one Patch entry (PR #120, redesign-generation token headroom for reasoning models + distinct empty/truncated/non-JSON diagnostics), and deletes the two consumed .changeset files (credential-aware-defa
Goals it achieves: Cut a release containing the two merged changes since 0.34.0. The minor bump reflects the credential-aware default-provider behavior change; the patch reflects the design-audit reasoning-model fix. Goal is to ship both to npm consumers.
Assessment: Correctly formed Changesets release PR. Semver is right: the provider-default change is additive/backward-compatible (OpenAI unchanged when OPENAI_API_KEY is set; only the bare no-key run stops hard-failing), so minor — not major — is appropriate. Patch entry is correctly classified. CHANGELOG reproduces changeset summaries verbatim with PR/commit attribution, consistent with the repo's documented
Better / existing approach: none — this is the right approach. The repo explicitly standardized on Changesets + OIDC (CLAUDE.md 'Releases'); this is the version-bump step of that exact flow. No existing in-repo alternative to reuse — version bumps and CHANGELOG generation are the tool's job.
Model: opencode/zai-coding-plan/glm-5.2
Bridge attempts: 2
Bridge warning: opencode/kimi-for-coding/k2p7: opencode: opencode error

🎯 Usefulness — sound

Release bundles two in-grain, well-wired fixes: a credential-aware default provider that removes the last hardcoded openai on the no-flag path, and a design-audit token-budget/headroom fix that unblocks reasoning models — both reachable and correctly integrated.

Integration: Both changes land on live, central paths. resolveDefaultProvider() (provider-defaults.ts:111) is consumed by loadConfig (config.ts:249,256 — the single config entry), plus the run/test-runner/design-audit CLI commands (run.ts:192,332,340; test-runner.ts:1150; cli-design-audit.ts:291). The env-load ordering the design depends on is real: cli.ts:25 calls loadLocalEnvFiles at the top of `main()
Fit with existing patterns: Fits the codebase's grain precisely. The engine is already multi-provider (8-entry SupportedProvider union in provider-defaults.ts:1); #122 removes the LAST openai assumption on the bare-run path, exactly as framed. The design-audit fix follows the existing BrainGeneratorOptions.maxOutputTokens override seam and the fail-closed parser discipline — coerceNumber/coerceNumberArray preserve th
Real-world viability: Holds up past the happy path. resolveDefaultProvider is an idempotent env read with no race (env is stable post-startup); the no-key fall-through to claude-code is the designed keyless path, and resolveProviderApiKey still pulls ANTHROPIC_API_KEY for it when present. The generator uses Promise.allSettled so a single truncated/errored slot is dropped, never fatal, and the new `extractJson
Model: opencode/zai-coding-plan/glm-5.2
Bridge attempts: 1

No concerns — sound change, no better or existing approach found. ✅
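
As an illustration of the credential-aware default reviewed above, the resolution order can be sketched as follows. This is a simplified sketch, not the repository's `resolveDefaultProvider`: `pickProvider`, `ProviderInputs`, and `DEFAULT_MODEL` are hypothetical names, only two of the supported providers are modeled, and the CLI-over-config precedence is an assumption (the changelog only says an explicit provider in either place is honored).

```typescript
// Hypothetical names; only two of the supported providers are modeled here.
type Provider = "openai" | "claude-code";

interface ProviderInputs {
  cliProvider?: Provider;     // from --provider, if given
  configProvider?: Provider;  // from a config file, if given
  env: Record<string, string | undefined>;
}

function pickProvider(inputs: ProviderInputs): Provider {
  // An explicit provider is always honored.
  // Assumption: the CLI flag takes precedence over the config file.
  if (inputs.cliProvider) return inputs.cliProvider;
  if (inputs.configProvider) return inputs.configProvider;
  // Bare run: OpenAI when a key is present, otherwise the keyless provider.
  return inputs.env["OPENAI_API_KEY"] ? "openai" : "claude-code";
}

// Default model per provider, following the changelog's gpt-5.4 → sonnet example.
const DEFAULT_MODEL: Record<Provider, string> = {
  openai: "gpt-5.4",
  "claude-code": "sonnet",
};
```

Passing the environment in as a parameter, rather than reading `process.env` inside the function, keeps the sketch easy to test; the real code may read the environment directly.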

What this audit checks

It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.

Pass	What it asks
Heuristic	Vague title? Whitespace-only or cruft-bearing diff? (content signals only)
Duplication	Do added function/class names already exist elsewhere in the repo?
Value Audit	What does it do? What goal does it achieve? Is it good? Better architecture or already-exists?
Usefulness Audit	Does it integrate and fit? Will it hold up in real use and actually get used?

Findings are concerns, not blocks — the human reviewer decides what to do with them.

_{value-audit · 20260623T114548Z}

tangletools · 2026-06-23T11:48:47Z

✅ No Blockers — `1f81234d`

Readiness 95/100 · Confidence 70/100 · 0 findings (none)

	glm	deepseek	aggregate
Readiness	95	95	95
Confidence	70	70	70
Correctness	95	95	95
Security	95	95	95
Testing	95	95	95
Architecture	95	95	95

Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision.

No findings.

_{tangletools · 2026-06-23T11:48:44Z · trace}

tangletools

✅ Auto-approved PR — `6918e89a`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-23T17:47:43Z}

tangletools · 2026-06-23T18:29:47Z

✅ No Blockers — `6918e89a`

Readiness 95/100 · Confidence 70/100 · 0 findings (none)

	glm	deepseek	aggregate
Readiness	95	95	95
Confidence	70	70	70
Correctness	95	95	95
Security	95	95	95
Testing	95	95	95
Architecture	95	95	95

Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision.

No findings.

_{tangletools · 2026-06-23T18:29:44Z · trace}

tangletools

🟡 Value Audit — sound-with-nits


Verdict	sound-with-nits
Concerns	1 (1 weak-concern)
Heuristic	0.0s
Duplication	0.0s
Interrogation	104.7s (2 bridge agents)
Total	104.7s

💰 Value — sound-with-nits

Release PR bundles a content-fidelity fix that stops the reference-grounded redesign engine from inventing page content the app doesn't have — coherent, defense-in-depth, and squarely in this codebase's anti-fabrication grain.

What it does: Adds 'do not fabricate content' rules to three prompt surfaces in the design-audit redesign pipeline: (1) the generator system prompt (generate/prompt.ts:54-65) forbids inventing metrics/feeds/dates and tells it to keep sparse pages restrained; (2) the pairwise judge system prompt (judge/prompt.ts:46-50,86) penalizes invented content as unfaithful instead of rewarding it as 'richer'; (3) the codin
Goals it achieves: Stop a real observed failure: a content-sparse page grounded against a dense exemplar caused the generator to fabricate factual content (e.g. a placeholder page gaining a fake 'Recent Activity' feed with timestamps), AND the pairwise ranker rewarded that invented density as 'richer' — so applying the audit to a real app could inject fabricated data into the UI. The fix makes the redesign restyle/r
Assessment: Good change. Three things make it sound: (a) It's defense-in-depth — fixing only the generator would leave the judge still rewarding fabrication, and fixing only those two would leave the coding-agent apply step free to re-invent content at implementation time; covering generate→judge→apply closes the loop. (b) It's in the grain: 'never fabricate' is a first-class invariant of this engine (~40 occ
Better / existing approach: No materially better approach for the immediate goal. I searched for an existing content-inventory/real-content mechanism (contentSnapshot|pageContent|realContent|contentInventory|textContent under src/design) — none exists; the only textContent uses are in tokens/extract.ts and measure/contrast.ts for fingerprinting/contrast, not content fidelity. So nothing to reuse or extend. A stronger long-te
Model: opencode/zai-coding-plan/glm-5.2
Bridge attempts: 2
Bridge warning: opencode/kimi-for-coding/k2p7: opencode: opencode error

🎯 Usefulness — sound

Content-fidelity guardrails added at all three prompt layers (generator, judge, apply) of the reference-grounded redesign pipeline, fully reachable through production call paths and matching the codebase's existing prompt-constraint pattern.

Integration: All three modified builders are hot on production paths. buildApplyPrompt is written to .apply-prompt.md for every audited page at src/cli-design-audit.ts:455 and feeds runAgentEvolveLoop via evolve/index.ts:11. buildDirectionPrompt runs once per direction in src/design/audit/reference/generate/generator.ts:78. buildPairwisePrompt runs in both judges at src/design/audit/reference/judge/t
Fit with existing patterns: Perfectly in-grain. Every other guardrail in this pipeline (ANTI_POSITION_BIAS at judge/prompt.ts:38, RESPONSE_CONTRACT, the existing 'NEVER invent an exemplar id' rule at generate/prompt.ts:51) is also a prompt-string constraint, not a code-enforced invariant. The new CONTENT_FIDELITY constant and the added generator/apply bullets follow the identical pattern — no competing mechanism, no duplicat
Real-world viability: Low risk. The change is additive text inside prompt strings plus regression tests that assert the substrings survive (design-audit-evolve-agent.test.ts:42-47, design-audit-reference-generate.test.ts:193-205, design-audit-reference-judge.test.ts:121-128). There are no new code paths, no concurrency surface, no error-handling changes — the only 'input' is the prompt template itself, which is static.
Model: opencode/zai-coding-plan/glm-5.2
Bridge attempts: 1

💰 Value Audit

🟡 Forbidden-content example lists are restated in parallel across 3 prompts and can drift [maintenance]

The enumerated 'no fabricated metrics, counts, dates, statuses, activity feeds' list appears independently in generate/prompt.ts:55-57, judge/prompt.ts:48-49, and evolve/agent.ts:105 with slightly different wording and item sets. If the canonical set of fabricated-content shapes grows (e.g. 'fake testimonials', 'invented nav items'), three sites must be hand-updated in lockstep or the judge/generator/apply guardrails silently diverge. Role-tailored phrasing justifies not collapsing to one shared

_{value-audit · 20260623T183020Z}

tangletools

✅ Auto-approved PR — `b79ee1c9`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-23T23:25:56Z}

tangletools · 2026-06-24T01:46:28Z

✅ No Blockers — `b79ee1c9`

Readiness 95/100 · Confidence 70/100 · 0 findings (none)

	opencode-kimi	glm	deepseek	aggregate
Readiness	95	95	95	95
Confidence	70	70	70	70
Correctness	95	95	95	95
Security	95	95	95	95
Testing	95	95	95	95
Architecture	95	95	95	95

Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision.

No findings.

_{tangletools · 2026-06-24T01:46:25Z · trace}

tangletools

🟢 Value Audit — sound


Verdict	sound
Concerns	1 (1 low)
Heuristic	0.0s
Duplication	0.0s
Interrogation	193.0s (2 bridge agents)
Total	193.0s

💰 Value — sound

Reframes reference-grounded design audit from aesthetic copycat to task-first product design, with content-fidelity guardrails and data-driven functional preservation — a coherent, worthwhile change.

What it does: Changes the reference-grounded redesign engine's prompts from an 'art director' persona to a 'senior product designer' persona. The generator now has hard, priority-ordered rules: task fitness first, preserve navigation/wayfinding, preserve information density on dense pages, never turn one page type into another, use only the page's real content, and never fabricate metrics/feeds/sections. A per-
Goals it achieves: Fix the observed failure modes where reference-grounded redesigns turned functional pages (docs, dashboards, aggregators) into sparse marketing brochures by copying the reference's structure, and where sparse pages grounded against dense exemplars were padded with fabricated data/metrics/activity feeds. It makes the redesign serve the page's actual users and job rather than visual mimicry.
Assessment: Good change, built in the grain of the codebase. It aligns the reference-grounded engine with the task-first, product-designer framing already used by the v1 classifier (src/design/audit/classify.ts:42) and evaluator (src/design/audit/evaluate.ts:415). It reuses existing DesignDNA fields (src/design/audit/reference/contracts.ts:263-289) for the functional contract instead of inventing new measurem
Better / existing approach: none — this is the right approach. I searched the reference engine (src/design/audit/reference/generate, src/design/audit/reference/judge, src/design/audit/reference/artifact, src/design/audit/reference/dna, src/design/audit/reference/generate/parse.ts), the v1 audit path (src/design/audit/evaluate.ts, src/design/audit/classify.ts), and the rubric fragments (src/design/audit/rubric/fragments). No
Model: opencode/kimi-for-coding/k2p7
Bridge attempts: 1

🎯 Usefulness — sound

A coherent job-first reframe of the design-audit prompt layer — reachable from all production callers, data-driven off measured DNA, and robust to sparse/edge pages; no dead surface and no competing pattern.

Integration: All three changed prompt builders have live production callers in this PR's codebase: buildDirectionPrompt is called by generate/generator.ts:78 (the per-exemplar fan-out); buildPairwisePrompt/buildQualityPrompt by judge/text-judge.ts:46-47 and judge/vision-judge.ts:162-163; buildApplyPrompt is re-exported from evolve/index.ts:11 and called by cli-design-audit.ts:455. Nothing is orphaned.
Fit with existing patterns: Fits the established grain. The new renderFunctionalContract (generate/prompt.ts:126) is a structural twin of the pre-existing renderConstraints (generate/prompt.ts:149) — both read ctx fields, gate emission on presence, and emit a labeled block. The judge changes keep the same anti-position-bias/RESPONSE_CONTRACT skeleton and only swap the persona/priority ordering. No competing or duplicated cap
Real-world viability: Holds up off the happy path. The contract's three inputs are genuinely measured, not hardcoded: components.nav = distinctNavCount (dna/derive.ts:304), layout.density = deriveDensity (derive.ts:308,344), layout.archetype = deriveArchetype (derive.ts:345), and Density is the lowercase 'sparse'|'balanced'|'dense' union (contracts.ts:85) so the `=== 'dense'` gate matches real output. Gating is defensive
Model: opencode/zai-coding-plan/glm-5.2
Bridge attempts: 1

🔎 Heuristic Signals

🟡 Cruft: todo added src/design/audit/reference/generate/prompt.ts

' facts or use placeholders like "TODO" or "lorem ipsum".',

_{value-audit · 20260624T014759Z}

tangletools previously approved these changes Jun 23, 2026

github-actions Bot dismissed tangletools’s stale review via 1f81234 June 23, 2026 10:44

github-actions Bot force-pushed the changeset-release/main branch from 70dd270 to 1f81234 Compare June 23, 2026 10:44

tangletools previously approved these changes Jun 23, 2026

tangletools reviewed Jun 23, 2026

github-actions Bot dismissed tangletools’s stale review via 6918e89 June 23, 2026 17:47

github-actions Bot force-pushed the changeset-release/main branch from 1f81234 to 6918e89 Compare June 23, 2026 17:47

tangletools previously approved these changes Jun 23, 2026

tangletools reviewed Jun 23, 2026

chore: version packages

b79ee1c

github-actions Bot dismissed tangletools’s stale review via b79ee1c June 23, 2026 23:25

github-actions Bot force-pushed the changeset-release/main branch from 6918e89 to b79ee1c Compare June 23, 2026 23:25

tangletools approved these changes Jun 23, 2026

tangletools reviewed Jun 24, 2026

github-actions[bot] wants to merge 1 commit into main from changeset-release/main