fix(autodata): fail-loud on empty solver content + tier knobs + the honest live result by drewstone · Pull Request #42 · tangle-network/agent-knowledge

drewstone · 2026-06-25T22:29:52Z

Autopsy of the Autodata live null result: the strong reasoning-model solver returned EMPTY content at maxTokens=1024 (hidden reasoning consumed the budget), and each empty answer was scored 0. That produced a FALSE negative strong/weak gap in every run. Fix: raise maxTokens to 8000, fail loud on empty content (no silent zeros), expose the solver tier as env knobs, and add a price table for the wide tier. With the fix the gap is ~0, not negative: on extractive doc-grounded QA an 8B model (llama-3.1-8b) scores as well as a frontier model (gemini-2.5-pro), so 0 examples discriminate between them. The real lever is CHALLENGER difficulty (non-extractive questions), not model tier. Full autopsy and numbers in docs/results/autodata-live.md.
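A minimal TypeScript sketch of the two ideas above, written under stated assumptions: the names `SolverResult`, `EmptySolverContentError`, `requireContent`, and `strongWeakGap` are hypothetical and not taken from the repo. An empty visible answer throws instead of being scored 0, and the gap is the mean per-example strong-minus-weak score, with an example counted as discriminating when the strong solver beats the weak one.

```typescript
// Hypothetical shape of one solver call result; field names are assumptions.
interface SolverResult {
  model: string;
  content: string; // visible answer text only (hidden reasoning is not included)
}

// Thrown instead of silently scoring an empty answer as 0.
class EmptySolverContentError extends Error {
  constructor(model: string, maxTokens: number) {
    super(
      `solver ${model} returned empty content at maxTokens=${maxTokens}; ` +
        `the budget may have been spent on hidden reasoning`,
    );
    this.name = "EmptySolverContentError";
  }
}

// Fail-loud guard: an empty answer is an infrastructure failure, not a wrong answer.
function requireContent(result: SolverResult, maxTokens: number): string {
  const text = result.content.trim();
  if (text.length === 0) {
    throw new EmptySolverContentError(result.model, maxTokens);
  }
  return text;
}

// Per-example scores in [0, 1] for the strong and weak solvers, in the same example order.
function strongWeakGap(
  strong: number[],
  weak: number[],
): { meanGap: number; discriminating: number } {
  if (strong.length !== weak.length || strong.length === 0) {
    throw new Error("score arrays must be non-empty and the same length");
  }
  let sum = 0;
  let discriminating = 0;
  for (let i = 0; i < strong.length; i++) {
    const d = strong[i] - weak[i];
    sum += d;
    if (d > 0) discriminating++; // strong solver beat the weak solver on this example
  }
  return { meanGap: sum / strong.length, discriminating };
}
```

With toy scores (not the live data): if silent zeros turn a strong solver that really matched the weak one on `[1, 1, 0]` into `[0, 0, 0]`, `strongWeakGap([0, 0, 0], [1, 1, 0])` reports a mean gap of -2/3. With real answers, `strongWeakGap([1, 1, 0], [1, 1, 0])` reports a gap of 0 and no discriminating examples, which is the shape of the corrected result.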

…Tokens + tier env knobs The strong reasoning-model solver returned empty visible content at maxTokens=1024 (budget spent on hidden reasoning) and was silently scored 0, manufacturing a false negative strong/weak gap across every live run. Fix: maxTokens=8000 + throw on empty content (no silent zeros). Make the solver tier an env knob (AUTODATA_WEAK_MODEL/STRONG_MODEL/CHALLENGER_MODEL/JUDGE_MODEL) + price the wide tier. With the fix the gap is ~0 (not negative): on extractive doc-grounded QA an 8B scores as well as a frontier model, so the lever is challenger difficulty, not model tier. See docs/results/autodata-live.md.
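A sketch of resolving the solver tier from environment variables, in TypeScript. Assumptions: only `AUTODATA_WEAK_MODEL` is spelled out in full above, so the `AUTODATA_` prefix on the other three names is a guess; `Role`, `ENV_KEYS`, `resolveTier`, and the defaults argument are hypothetical; unset or blank variables fall back to the caller's defaults.

```typescript
// The four solver roles named in the commit message.
type Role = "weak" | "strong" | "challenger" | "judge";

// ASSUMPTION: every knob uses the AUTODATA_ prefix; only AUTODATA_WEAK_MODEL is confirmed.
const ENV_KEYS: Record<Role, string> = {
  weak: "AUTODATA_WEAK_MODEL",
  strong: "AUTODATA_STRONG_MODEL",
  challenger: "AUTODATA_CHALLENGER_MODEL",
  judge: "AUTODATA_JUDGE_MODEL",
};

// Returns a model id per role: the env value if set and non-blank, else the default.
function resolveTier(
  env: Record<string, string | undefined>,
  defaults: Record<Role, string>,
): Record<Role, string> {
  const tier: Record<Role, string> = { ...defaults };
  for (const role of Object.keys(ENV_KEYS) as Role[]) {
    const value = env[ENV_KEYS[role]]?.trim();
    if (value) tier[role] = value; // blank strings are treated as unset
  }
  return tier;
}
```

In Node this could be called as `resolveTier(process.env, defaults)`. Taking the env object as a parameter keeps the resolution testable without touching the real environment.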

tangletools

✅ Auto-approved PR — `2953db28`

Blanket team auto-approval is enabled for this reviewer service.
The full PR reviewer audit still runs separately and will publish findings if it detects issues.

_{tangletools · auto-approval · reason: blanket_auto_approve · 2026-06-25T22:29:59Z}

tangletools approved these changes Jun 25, 2026


drewstone merged commit 2db1feb into main Jun 25, 2026
1 check passed

drewstone merged 1 commit into main from autodata/wide-tier