Release: version packages #121
tangletools
left a comment
✅ Auto-approved PR — 70dd270f
Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-23T09:34:16Z
70dd270 to 1f81234
tangletools
left a comment
✅ Auto-approved PR — 1f81234d
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-23T10:44:42Z
tangletools
left a comment
🟢 Value Audit — sound
| Verdict | sound |
|---|---|
| Concerns | 0 (none) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 241.9s (2 bridge agents) |
| Total | 241.9s |
💰 Value — sound
Standard Changesets release PR: bumps 0.34.0 → 0.35.0 with a correct minor (credential-aware provider, #122) plus a patch (reasoning-token headroom, #120); merging it triggers the OIDC npm publish.
- What it does: Generated by changesets/action on main. Bumps package.json version 0.34.0 → 0.35.0, appends a '## 0.35.0' section to CHANGELOG.md with one Minor entry (PR #122, credential-aware default provider) and one Patch entry (PR #120, redesign-generation token headroom for reasoning models + distinct empty/truncated/non-JSON diagnostics), and deletes the two consumed .changeset files (credential-aware-defa
- Goals it achieves: Cut a release containing the two merged changes since 0.34.0. The minor bump reflects the credential-aware default-provider behavior change; the patch reflects the design-audit reasoning-model fix. Goal is to ship both to npm consumers.
- Assessment: Correctly formed Changesets release PR. Semver is right: the provider-default change is additive/backward-compatible (OpenAI unchanged when OPENAI_API_KEY is set; only the bare no-key run stops hard-failing), so minor — not major — is appropriate. Patch entry is correctly classified. CHANGELOG reproduces changeset summaries verbatim with PR/commit attribution, consistent with the repo's documented
- Better / existing approach: none — this is the right approach. The repo explicitly standardized on Changesets + OIDC (CLAUDE.md 'Releases'); this is the version-bump step of that exact flow. No existing in-repo alternative to reuse — version bumps and CHANGELOG generation are the tool's job.
- Model: opencode/zai-coding-plan/glm-5.2
- Bridge attempts: 2
- Bridge warning: opencode/kimi-for-coding/k2p7: opencode: opencode error
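How one minor and one patch changeset collapse into a single 0.34.0 → 0.35.0 bump can be sketched as below. This is a minimal illustration assuming the Changesets rule that the highest pending bump type wins; `highestBump` and `nextVersion` are hypothetical helpers, not the Changesets API.

```typescript
// Hypothetical helpers illustrating "highest bump type wins" for one package.
type BumpType = "major" | "minor" | "patch";

const RANK: Record<BumpType, number> = { patch: 0, minor: 1, major: 2 };

function highestBump(bumps: BumpType[]): BumpType {
  // Keep whichever bump has the greater rank; callers pass at least one bump.
  return bumps.reduce((a, b) => (RANK[b] > RANK[a] ? b : a));
}

function nextVersion(current: string, bumps: BumpType[]): string {
  const [major, minor, patch] = current.split(".").map(Number);
  const bump = highestBump(bumps);
  if (bump === "major") return `${major + 1}.0.0`;
  if (bump === "minor") return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

console.log(nextVersion("0.34.0", ["minor", "patch"])); // 0.35.0
```

The patch from #120 is absorbed by the minor from #122, which is why the release carries one version number for both entries.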
🎯 Usefulness — sound
Release bundles two in-grain, well-wired fixes: a credential-aware default provider that removes the last hardcoded openai on the no-flag path, and a design-audit token-budget/headroom fix that unblocks reasoning models — both reachable and correctly integrated.
- Integration: Both changes land on live, central paths. `resolveDefaultProvider()` (provider-defaults.ts:111) is consumed by `loadConfig` (config.ts:249,256; the single config entry), plus the run/test-runner/design-audit CLI commands (run.ts:192,332,340; test-runner.ts:1150; cli-design-audit.ts:291). The env-load ordering the design depends on is real: cli.ts:25 calls `loadLocalEnvFiles` at the top of `main()
- Fit with existing patterns: Fits the codebase's grain precisely. The engine is already multi-provider (8-entry `SupportedProvider` union in provider-defaults.ts:1); #122 removes the LAST `openai` assumption on the bare-run path, exactly as framed. The design-audit fix follows the existing `BrainGeneratorOptions.maxOutputTokens` override seam and the fail-closed parser discipline: `coerceNumber`/`coerceNumberArray` preserve th
- Real-world viability: Holds up past the happy path. `resolveDefaultProvider` is an idempotent env read with no race (env is stable post-startup); the no-key fall-through to `claude-code` is the designed keyless path, and `resolveProviderApiKey` still pulls `ANTHROPIC_API_KEY` for it when present. The generator uses `Promise.allSettled`, so a single truncated or errored slot is dropped rather than being fatal, and the new `extractJson
- Model: opencode/zai-coding-plan/glm-5.2
- Bridge attempts: 1
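The `Promise.allSettled` behavior noted under Real-world viability (a failed slot is dropped, not fatal) can be sketched as follows. This is a hypothetical illustration: `generateDirections`, `Direction`, and the `generateOne` callback are assumed names, not the repo's generator code.

```typescript
// Hypothetical fan-out: every direction is generated independently, and a
// rejected slot is recorded and skipped instead of failing the whole batch.
interface Direction {
  exemplarId: string;
  json: string;
}

async function generateDirections(
  exemplarIds: string[],
  generateOne: (id: string) => Promise<Direction>,
): Promise<{ directions: Direction[]; failures: string[] }> {
  const results = await Promise.allSettled(exemplarIds.map((id) => generateOne(id)));
  const directions: Direction[] = [];
  const failures: string[] = [];
  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      directions.push(result.value);
    } else {
      // Keep the reason so the failure stays diagnosable, then move on.
      failures.push(`${exemplarIds[i]}: ${String(result.reason)}`);
    }
  });
  return { directions, failures };
}
```

Unlike `Promise.all`, which rejects as soon as any slot rejects, `allSettled` waits for every slot, so the surviving directions can still be ranked.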
No concerns — sound change, no better or existing approach found. ✅
What this audit checks
It judges the change on its merits — not whether it was tasked out in an issue. Unticketed, fast-moving work is fine; the question is whether the change is good and whether a better or existing approach should be used instead.
| Pass | What it asks |
|---|---|
| Heuristic | Vague title? Whitespace-only or cruft-bearing diff? (content signals only) |
| Duplication | Do added function/class names already exist elsewhere in the repo? |
| Value Audit | What does it do? What goal does it achieve? Is it good? Better architecture or already-exists? |
| Usefulness Audit | Does it integrate and fit? Will it hold up in real use and actually get used? |
Findings are concerns, not blocks — the human reviewer decides what to do with them.
✅ No Blockers

| | glm | deepseek | aggregate |
|---|---|---|---|
| Readiness | 95 | 95 | 95 |
| Confidence | 70 | 70 | 70 |
| Correctness | 95 | 95 | 95 |
| Security | 95 | 95 | 95 |
| Testing | 95 | 95 | 95 |
| Architecture | 95 | 95 | 95 |
Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision.
No findings.
tangletools · 2026-06-23T11:48:44Z · trace
1f81234 to 6918e89
tangletools
left a comment
✅ Auto-approved PR — 6918e89a
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-23T17:47:43Z
✅ No Blockers

| | glm | deepseek | aggregate |
|---|---|---|---|
| Readiness | 95 | 95 | 95 |
| Confidence | 70 | 70 | 70 |
| Correctness | 95 | 95 | 95 |
| Security | 95 | 95 | 95 |
| Testing | 95 | 95 | 95 |
| Architecture | 95 | 95 | 95 |
Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision.
No findings.
tangletools · 2026-06-23T18:29:44Z · trace
tangletools
left a comment
🟡 Value Audit — sound-with-nits
| Verdict | sound-with-nits |
|---|---|
| Concerns | 1 (1 weak-concern) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 104.7s (2 bridge agents) |
| Total | 104.7s |
💰 Value — sound-with-nits
Release PR bundles a content-fidelity fix that stops the reference-grounded redesign engine from inventing page content the app doesn't have — coherent, defense-in-depth, and squarely in this codebase's anti-fabrication grain.
- What it does: Adds 'do not fabricate content' rules to three prompt surfaces in the design-audit redesign pipeline: (1) the generator system prompt (generate/prompt.ts:54-65) forbids inventing metrics/feeds/dates and tells it to keep sparse pages restrained; (2) the pairwise judge system prompt (judge/prompt.ts:46-50,86) penalizes invented content as unfaithful instead of rewarding it as 'richer'; (3) the codin
- Goals it achieves: Stop a real observed failure: a content-sparse page grounded against a dense exemplar caused the generator to fabricate factual content (e.g. a placeholder page gaining a fake 'Recent Activity' feed with timestamps), AND the pairwise ranker rewarded that invented density as 'richer' — so applying the audit to a real app could inject fabricated data into the UI. The fix makes the redesign restyle/r
- Assessment: Good change. Three things make it sound: (a) It's defense-in-depth — fixing only the generator would leave the judge still rewarding fabrication, and fixing only those two would leave the coding-agent apply step free to re-invent content at implementation time; covering generate→judge→apply closes the loop. (b) It's in the grain: 'never fabricate' is a first-class invariant of this engine (~40 occ
- Better / existing approach: No materially better approach for the immediate goal. I searched for an existing content-inventory/real-content mechanism (contentSnapshot|pageContent|realContent|contentInventory|textContent under src/design) — none exists; the only textContent uses are in tokens/extract.ts and measure/contrast.ts for fingerprinting/contrast, not content fidelity. So nothing to reuse or extend. A stronger long-te
- Model: opencode/zai-coding-plan/glm-5.2
- Bridge attempts: 2
- Bridge warning: opencode/kimi-for-coding/k2p7: opencode: opencode error
🎯 Usefulness — sound
Content-fidelity guardrails added at all three prompt layers (generator, judge, apply) of the reference-grounded redesign pipeline, fully reachable through production call paths and matching the codebase's existing prompt-constraint pattern.
- Integration: All three modified builders are hot on production paths. `buildApplyPrompt` is written to `.apply-prompt.md` for every audited page at src/cli-design-audit.ts:455 and feeds `runAgentEvolveLoop` via evolve/index.ts:11. `buildDirectionPrompt` runs once per direction in src/design/audit/reference/generate/generator.ts:78. `buildPairwisePrompt` runs in both judges at src/design/audit/reference/judge/t
- Fit with existing patterns: Perfectly in-grain. Every other guardrail in this pipeline (`ANTI_POSITION_BIAS` at judge/prompt.ts:38, `RESPONSE_CONTRACT`, the existing 'NEVER invent an exemplar id' rule at generate/prompt.ts:51) is also a prompt-string constraint, not a code-enforced invariant. The new `CONTENT_FIDELITY` constant and the added generator/apply bullets follow the identical pattern: no competing mechanism, no duplicat
- Real-world viability: Low risk. The change is additive text inside prompt strings plus regression tests that assert the substrings survive (design-audit-evolve-agent.test.ts:42-47, design-audit-reference-generate.test.ts:193-205, design-audit-reference-judge.test.ts:121-128). There are no new code paths, no concurrency surface, no error-handling changes — the only 'input' is the prompt template itself, which is static.
- Model: opencode/zai-coding-plan/glm-5.2
- Bridge attempts: 1
💰 Value Audit
🟡 Forbidden-content example lists are restated in parallel across 3 prompts and can drift [maintenance]
The enumerated 'no fabricated metrics, counts, dates, statuses, activity feeds' list appears independently in generate/prompt.ts:55-57, judge/prompt.ts:48-49, and evolve/agent.ts:105 with slightly different wording and item sets. If the canonical set of fabricated-content shapes grows (e.g. 'fake testimonials', 'invented nav items'), three sites must be hand-updated in lockstep or the judge/generator/apply guardrails silently diverge. Role-tailored phrasing justifies not collapsing to one shared
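One way to address this drift concern while keeping role-tailored phrasing is a single canonical list that each prompt interpolates. A hypothetical sketch: `FABRICATED_CONTENT_SHAPES`, `shapeList`, and the three rule strings are illustrative, not the repo's actual constants.

```typescript
// Hypothetical single source of truth for fabricated-content shapes.
const FABRICATED_CONTENT_SHAPES = [
  "metrics",
  "counts",
  "dates",
  "statuses",
  "activity feeds",
] as const;

const shapeList = FABRICATED_CONTENT_SHAPES.join(", ");

// Each role keeps its own framing but shares the same enumerated list.
const generatorRule = `Use only the page's real content. Do not invent ${shapeList}.`;
const judgeRule = `Treat invented ${shapeList} as unfaithful, never as "richer".`;
const applyRule = `Do not add ${shapeList} that the page does not already have.`;
```

Adding a shape such as "fake testimonials" to the array updates all three rules at once, while each surrounding sentence stays specific to the generator, judge, or apply step.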
6918e89 to b79ee1c
tangletools
left a comment
✅ Auto-approved PR — b79ee1c9
tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-23T23:25:56Z
✅ No Blockers

| | opencode-kimi | glm | deepseek | aggregate |
|---|---|---|---|---|
| Readiness | 95 | 95 | 95 | 95 |
| Confidence | 70 | 70 | 70 | 70 |
| Correctness | 95 | 95 | 95 | 95 |
| Security | 95 | 95 | 95 | 95 |
| Testing | 95 | 95 | 95 | 95 |
| Architecture | 95 | 95 | 95 | 95 |
Full multi-shot audit completed 2/2 planned shots over 2 changed files. Global verifier still owns final merge decision.
No findings.
tangletools · 2026-06-24T01:46:25Z · trace
tangletools
left a comment
🟢 Value Audit — sound
| Verdict | sound |
|---|---|
| Concerns | 1 (1 low) |
| Heuristic | 0.0s |
| Duplication | 0.0s |
| Interrogation | 193.0s (2 bridge agents) |
| Total | 193.0s |
💰 Value — sound
Reframes reference-grounded design audit from aesthetic copycat to task-first product design, with content-fidelity guardrails and data-driven functional preservation — a coherent, worthwhile change.
- What it does: Changes the reference-grounded redesign engine's prompts from an 'art director' persona to a 'senior product designer' persona. The generator now has hard, priority-ordered rules: task fitness first, preserve navigation/wayfinding, preserve information density on dense pages, never turn one page type into another, use only the page's real content, and never fabricate metrics/feeds/sections. A per-
- Goals it achieves: Fix the observed failure modes where reference-grounded redesigns turned functional pages (docs, dashboards, aggregators) into sparse marketing brochures by copying the reference's structure, and where sparse pages grounded against dense exemplars were padded with fabricated data/metrics/activity feeds. It makes the redesign serve the page's actual users and job rather than visual mimicry.
- Assessment: Good change, built in the grain of the codebase. It aligns the reference-grounded engine with the task-first, product-designer framing already used by the v1 classifier (src/design/audit/classify.ts:42) and evaluator (src/design/audit/evaluate.ts:415). It reuses existing DesignDNA fields (src/design/audit/reference/contracts.ts:263-289) for the functional contract instead of inventing new measurem
- Better / existing approach: none — this is the right approach. I searched the reference engine (src/design/audit/reference/generate, src/design/audit/reference/judge, src/design/audit/reference/artifact, src/design/audit/reference/dna, src/design/audit/reference/generate/parse.ts), the v1 audit path (src/design/audit/evaluate.ts, src/design/audit/classify.ts), and the rubric fragments (src/design/audit/rubric/fragments). No
- Model: opencode/kimi-for-coding/k2p7
- Bridge attempts: 1
🎯 Usefulness — sound
A coherent job-first reframe of the design-audit prompt layer — reachable from all production callers, data-driven off measured DNA, and robust to sparse/edge pages; no dead surface and no competing pattern.
- Integration: All three changed prompt builders have live production callers in this PR's codebase: buildDirectionPrompt is called by generate/generator.ts:78 (the per-exemplar fan-out); buildPairwisePrompt/buildQualityPrompt by judge/text-judge.ts:46-47 and judge/vision-judge.ts:162-163; buildApplyPrompt is re-exported from evolve/index.ts:11 and called by cli-design-audit.ts:455. Nothing is orphaned.
- Fit with existing patterns: Fits the established grain. The new renderFunctionalContract (generate/prompt.ts:126) is a structural twin of the pre-existing renderConstraints (generate/prompt.ts:149) — both read ctx fields, gate emission on presence, and emit a labeled block. The judge changes keep the same anti-position-bias/RESPONSE_CONTRACT skeleton and only swap the persona/priority ordering. No competing or duplicated cap
- Real-world viability: Holds up off the happy path. The contract's three inputs are genuinely measured, not hardcoded: `components.nav` = `distinctNavCount` (dna/derive.ts:304), `layout.density` = `deriveDensity` (derive.ts:308,344), `layout.archetype` = `deriveArchetype` (derive.ts:345), and `Density` is the lowercase 'sparse'|'balanced'|'dense' union (contracts.ts:85), so the `=== 'dense'` gate matches real output. Gating is defensive
- Model: opencode/zai-coding-plan/glm-5.2
- Bridge attempts: 1
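The presence-gated functional contract described above can be sketched like this. The `PageDna` shape is an assumption modeled on the fields the audit cites (`components.nav`, `layout.density`, `layout.archetype`); `renderContractBlock` and its wording are hypothetical, not the repo's `renderFunctionalContract` output.

```typescript
// Assumed DNA shape; only the three cited fields are modeled.
type Density = "sparse" | "balanced" | "dense";

interface PageDna {
  components?: { nav?: number };
  layout?: { density?: Density; archetype?: string };
}

function renderContractBlock(dna: PageDna): string {
  const lines: string[] = [];
  const nav = dna.components?.nav;
  if (nav !== undefined && nav > 0) {
    lines.push(`- Keep all ${nav} navigation elements; do not remove wayfinding.`);
  }
  // Exact match against the lowercase union member, as the audit notes.
  if (dna.layout?.density === "dense") {
    lines.push("- This page is dense; preserve its item count and information density.");
  }
  if (dna.layout?.archetype) {
    lines.push(`- Keep it a ${dna.layout.archetype} page; do not turn it into another page type.`);
  }
  // Emit nothing when no measured field applies, so the prompt gains no empty block.
  return lines.length === 0 ? "" : ["FUNCTIONAL CONTRACT:", ...lines].join("\n");
}
```

Each rule is emitted only when its input was measured, mirroring the "read ctx fields, gate on presence, emit a labeled block" pattern the audit attributes to `renderConstraints`.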
🔎 Heuristic Signals
🟡 Cruft: todo added src/design/audit/reference/generate/prompt.ts
- ' facts or use placeholders like "TODO" or "lorem ipsum".',
This PR was opened by the Changesets release GitHub action. When you're ready to do a release, you can merge this and the packages will be published to npm automatically. If you're not ready to do a release yet, that's fine: whenever you add more changesets to main, this PR will be updated.
Releases
@tangle-network/browser-agent-driver@0.35.0
Minor Changes
#122
b0f74a4 Thanks @drewstone! - The default provider is now credential-aware instead of a hard `openai`. A bare run (no `--provider`/`--model`, no config-file provider) uses OpenAI when `OPENAI_API_KEY` is set (unchanged for existing users and CI) and otherwise falls back to an available provider (`claude-code`, which needs no key) rather than failing on a missing OpenAI key. An explicit provider in CLI flags or a config file is always honored, and the default model maps per-provider as before (e.g. gpt-5.4 → sonnet for claude-code). This removes the last place the no-flag path assumed OpenAI; the engine already supported openai/anthropic/google/claude-code/zai for both text and vision.
#124
a2055b2 Thanks @drewstone! - design-audit (reference-grounded): make the redesign engine job-first instead of aesthetic-first. The old engine grounded every page in a world-class exemplar's visual DNA and judged on visual craft, so it regressed functional pages into generic brochures: a docs page traded its table of contents and dense reference content for two marketing cards and a hero; an aggregator dropped from 30 items to 9; a status dashboard shed services into spacious cards. The fix:
- `reference/generate/prompt.ts`: persona reframed from art director to product designer. New hard rules in priority order: task-first (design for the page's users and the job in its intent) → preserve functional affordances (never delete navigation/ToC/search to look cleaner) → preserve density where it is the value (docs/dashboards/feeds keep their item count) → right-size the intervention (never turn one kind of page into another) → the exemplar is a source of visual craft only, never a structural template.
- `reference/judge/prompt.ts`: scores task fitness and functional preservation BEFORE visual craft; a polished direction that removes navigation or reduces density loses. "Fit to the reference" counts only as visual craft.

Validated by re-running the regressed pages: docs now keeps its ToC + prev/next nav + dense code examples; HN keeps all 30 stories + nav; the status dashboard stays a dense service grid with real values. No provider coupling; flag-gated reference engine only.
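The #122 resolution order (explicit provider, then OpenAI when `OPENAI_API_KEY` is set, then the keyless `claude-code` fallback) can be sketched in TypeScript. This is a minimal sketch under stated assumptions: `resolveProvider`, `mapDefaultModel`, and `ProviderChoice` are hypothetical names, CLI-before-config precedence is assumed, and only the gpt-5.4 → sonnet mapping for claude-code comes from the changelog entry.

```typescript
// Hypothetical sketch of credential-aware default provider resolution.
// Provider list: the five named in the changelog (the real union is larger).
type Provider = "openai" | "anthropic" | "google" | "claude-code" | "zai";

interface ProviderChoice {
  provider: Provider;
  model: string;
}

const DEFAULT_MODEL = "gpt-5.4";

function mapDefaultModel(provider: Provider, model: string): string {
  // Only the claude-code mapping is documented; other providers pass through here.
  return provider === "claude-code" && model === DEFAULT_MODEL ? "sonnet" : model;
}

function resolveProvider(
  env: Record<string, string | undefined>,
  cliProvider?: Provider,
  configProvider?: Provider,
): ProviderChoice {
  // 1. An explicit provider is always honored (CLI before config is an assumed order).
  const explicit = cliProvider ?? configProvider;
  // 2. With no explicit choice, prefer OpenAI when its key is set (unchanged behavior).
  // 3. Otherwise fall back to claude-code, which needs no API key.
  const provider: Provider = explicit ?? (env.OPENAI_API_KEY ? "openai" : "claude-code");
  return { provider, model: mapDefaultModel(provider, DEFAULT_MODEL) };
}

console.log(resolveProvider({}).provider); // claude-code
```

Passing the environment in as a plain record keeps the resolver a pure function, which matches the audit's description of it as an idempotent env read.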
Patch Changes
#123
20942c2 Thanks @drewstone! - design-audit (reference-grounded): enforce content fidelity so a redesign never fabricates content the page lacks. On a content-sparse page grounded against a dense exemplar, the generator would invent factual content to fill the layout (e.g. a placeholder page gaining a fake "Recent Activity" feed with timestamps, invented status/RFC/registry data), and the pairwise direction-ranker rewarded that invented density as "richer"; applied to a real app, the audit could therefore inject fabricated data into the UI. Now the generator may restyle, regroup, or re-rank only the page's real content (the exemplar governs how it looks, never what content it has; a sparse page stays proportionally restrained), the ranker penalises invented content as unfaithful instead of rewarding it, and the apply prompt carries a defense-in-depth "do not invent content" guardrail. No provider coupling.
#120
f11b899 Thanks @drewstone! - design-audit (reference-grounded): make redesign generation work with reasoning models. The generator capped output at 2200 tokens, a budget that a reasoning model (e.g. GLM-5.2, o-series) can exhaust on its thinking before it produces the answer, so the JSON direction came back empty or truncated and the audit fell back with a misleading "no JSON object found". The fix raises the per-direction budget to 8000 tokens (non-reasoning models stop at the closing brace and never use the extra, so it's free for them) and reports empty vs. truncated vs. non-JSON output distinctly, so a budget or limit issue is diagnosable. No coupling to any one provider: the engine already runs on openai/anthropic/google/claude-code/zai.
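Reporting empty, truncated, and non-JSON output distinctly might look roughly like the following. It is a hypothetical sketch: the `finishReason` values, the `OutputVerdict` type, `classifyOutput`, and the brace-span extraction are assumptions, not the repo's `extractJson` implementation.

```typescript
// Hypothetical classifier separating the three failure modes from success.
type OutputVerdict =
  | { kind: "ok"; value: unknown }
  | { kind: "empty" }
  | { kind: "truncated" }
  | { kind: "non-json"; preview: string };

function classifyOutput(text: string, finishReason: "stop" | "length"): OutputVerdict {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    // A reasoning model that spent the whole budget thinking returns nothing.
    return { kind: "empty" };
  }
  const start = trimmed.indexOf("{");
  const end = trimmed.lastIndexOf("}");
  if (start !== -1 && end > start) {
    try {
      return { kind: "ok", value: JSON.parse(trimmed.slice(start, end + 1)) };
    } catch {
      // Braces were present but the span did not parse; classify below.
    }
  }
  if (finishReason === "length") {
    // The token limit cut the answer off; a larger budget is the likely fix.
    return { kind: "truncated" };
  }
  return { kind: "non-json", preview: trimmed.slice(0, 80) };
}
```

Checking the finish reason only after a parse attempt means a complete object that happens to end at the limit is still accepted, while a cut-off object is labeled as truncated rather than as generic non-JSON.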