tangle-network · drewstone · Jun 25, 2026
diff --git a/docs/eval/investment-material-facts.md b/docs/eval/investment-material-facts.md
diff --git a/docs/results/investment-thesis.md b/docs/results/investment-thesis.md
diff --git a/docs/two-agent-research-ab.md b/docs/two-agent-research-ab.md
@@ -449,6 +449,32 @@ both earn a narrow, cost-stratified one — the verifier on misattribution and t
 off-scope tail (§5), the driver only where a richer worker makes "go corroborate this"
 reach a page collection can't.
 
+### 9.1 The domain was too easy — re-running the A/B where one search is not enough
+
+The §9 null has a structural cause, not a measurement one: **on an ML topic a single
+good search already collects the answer.** Every arm finished in one effective round
+because the first fetch met the readiness gate, so the research-driving driver (whose
+mechanism is steering a *second* round) never acted. When one search suffices, there is no
+investigation for a smarter coordinator to do, and the metric can only reward
+collection. To ask whether topology *can ever* beat blind collection, you have to move
+to a domain where the answer is buried and a single fetch provably cannot surface it.
+
+So we did. We ported the whole apparatus (firewalled checklist, `$0` model-free
+grader, matched-compute 3-arm A/B) onto **investment research**: give a loop a company,
+its ticker, and an as-of cutoff, then grade the thesis on the buried, material 10-K-footnote
+facts a ticker search misses (an HTM mark the size of a bank's equity, a 97%-uninsured
+deposit base, a negative per-unit margin). First we *calibrated* the new metric and
+showed that it discriminates depth on this harder domain: a shallow ticker summary scores
+**1/27 (4%)**, a deep filings-grounded thesis **27/27 (100%)**, a **+96-point** gap.
+Then the live 3-arm A/B over 5 held-out companies: **driving surfaced the most buried
+facts (16/27, 59%) vs blind collection (11/27, 41%)** for ~1.9× the cost. The lift points
+the right way, but at n=5 the paired-bootstrap CI still crosses zero (P(Δ≤0)=0.08), and
+verify did not beat collection (10/27, 37%). So the verdict survives the domain change:
+**no topology *significantly* beats collection. But on a domain where the answer has to
+be investigated, driving is the only arm that even leans positive, and only
+suggestively.** Full reframe, calibration, and
+per-company A/B: [`docs/results/investment-thesis.md`](results/investment-thesis.md).
+
 ## 10. Reproduce
 
 The loop, the worker, the verifier, the claim-grounding mode, the adaptive driver, the
@@ -509,6 +535,7 @@ the A/B harnesses — [`tests/loops/`](../tests/loops/).
 Per-result detail: [`docs/results/cost-quality.md`](results/cost-quality.md),
 [`docs/results/claim-grounding.md`](results/claim-grounding.md),
 [`docs/results/adaptive.md`](results/adaptive.md),
-[`docs/results/research-driving.md`](results/research-driving.md).
+[`docs/results/research-driving.md`](results/research-driving.md),
+[`docs/results/investment-thesis.md`](results/investment-thesis.md) (§9.1 — the domain reframe + calibration + 3-arm A/B).
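The §9.1 verdict rests on a paired bootstrap over the 5 held-out companies. The exact statistic behind the reported CI and P(Δ≤0) is not part of this diff, so what follows is one plausible reading, not the harness's code. It assumes companies are resampled with replacement (each company's two arms stay paired) and that Δ is the pooled buried-fact hit-rate difference between a challenger arm and the collection baseline. `PairedCompany`, `pairedBootstrap`, and the seeded `mulberry32` PRNG are illustrative names, not repo APIs.

```typescript
// Hypothetical per-company row: buried facts on the checklist and how many
// each arm surfaced (A = baseline such as collection, B = challenger).
interface PairedCompany {
  facts: number
  hitsA: number
  hitsB: number
}

// Small deterministic PRNG (mulberry32) so a bootstrap run is reproducible.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0
  return () => {
    state = (state + 0x6d2b79f5) >>> 0
    let t = state
    t = Math.imul(t ^ (t >>> 15), t | 1)
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61)
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296
  }
}

// Pooled hit-rate difference (B - A) over a set of companies.
function pooledDelta(rows: readonly PairedCompany[]): number {
  let facts = 0
  let a = 0
  let b = 0
  for (const r of rows) {
    facts += r.facts
    a += r.hitsA
    b += r.hitsB
  }
  return facts === 0 ? 0 : (b - a) / facts
}

interface BootstrapResult {
  delta: number // observed pooled difference
  ci95: [number, number] // percentile interval of the resampled differences
  pLeqZero: number // share of resamples whose difference is <= 0
}

function pairedBootstrap(
  rows: readonly PairedCompany[],
  iterations = 10000,
  seed = 1,
): BootstrapResult {
  const rand = mulberry32(seed)
  const deltas: number[] = []
  for (let i = 0; i < iterations; i++) {
    // Resample whole companies with replacement; A and B stay paired per row.
    const sample = Array.from(
      { length: rows.length },
      () => rows[Math.floor(rand() * rows.length)],
    )
    deltas.push(pooledDelta(sample))
  }
  deltas.sort((x, y) => x - y)
  const quantile = (q: number) =>
    deltas[Math.min(deltas.length - 1, Math.floor(q * deltas.length))]
  return {
    delta: pooledDelta(rows),
    ci95: [quantile(0.025), quantile(0.975)],
    pLeqZero: deltas.filter((d) => d <= 0).length / deltas.length,
  }
}
```

Resampling companies rather than individual facts is what keeps the test paired: a company that is hard for every arm lowers both arms in the same resample, so it does not masquerade as a difference between them.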
diff --git a/src/collection-research-driver.ts b/src/collection-research-driver.ts
@@ -0,0 +1,46 @@
+/**
+ * The SINGLE-AGENT COLLECTION driver — the blind-collection baseline (Arm A).
+ *
+ * This is the honest null the depth A/B is measured against. The other drivers
+ * spend extra inference to do something differentiated:
+ *   - `createVerifyingResearchDriver` runs an LLM gate per source (Arm B),
+ *   - `createResearchDrivingDriver` extracts claims, tracks corroboration, and
+ *     synthesizes deep follow-up questions to drive depth (Arm C).
+ *
+ * This driver does NONE of that. It is a pass-through: it accepts every source
+ * the worker proposes and contributes no research, no gating, and no steering of
+ * its own. The loop still dedups exact-uri duplicates before calling
+ * `verifySource` (that is the loop's job, not the driver's), and the default
+ * `foldGaps` (a plain bulleted list of the still-open readiness gaps) still folds
+ * the gaps into the worker's next prompt — so the worker keeps researching, but
+ * NOTHING intelligent sits between the worker and the knowledge base.
+ *
+ * In other words: ONE agent (the worker) collects sources round after round, and
+ * the "driver" is an inert rubber stamp. That is exactly what "single-agent
+ * collection" means — the topology with zero coordinator intelligence — so its
+ * material-facts score is the floor every other arm must beat to justify its
+ * extra inference cost.
+ *
+ * It adds NO router calls of its own: `verifySource` is a synchronous accept and
+ * `foldGaps` is omitted so the loop uses its built-in gap list. So Arm A's cost
+ * is the worker's cost alone — the cleanest possible blind-collection baseline.
+ */
+
+import type {
+  ResearchDriver,
+  ResearchSourceProposal,
+  SourceVerdict,
+} from './two-agent-research-loop'
+
+/**
+ * Build the single-agent collection driver. Accepts every source; never gates,
+ * never researches, never steers beyond the loop's default open-gap list. The
+ * worker is the only agent that thinks.
+ */
+export function createCollectionResearchDriver(): ResearchDriver {
+  return {
+    verifySource(_source: ResearchSourceProposal): SourceVerdict {
+      return { accept: true }
+    },
+  }
+}
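The pass-through contract is easiest to see next to the loop it plugs into, and the same loop shows why §9.1 saw no driving effect on the ML topic: a driver can only steer through the gaps folded into a *next* round. Below is a minimal, synchronous sketch under stated assumptions. `SourceProposal`, `DriverLike`, `ResearchWorker`, `runLoop`, and the `uri`-keyed knowledge base are hypothetical stand-ins, not the real types from `two-agent-research-loop.ts`, whose calls may be async.

```typescript
// Hypothetical, simplified stand-ins for the loop's types; the real ones in
// two-agent-research-loop.ts carry more fields and their calls may be async.
interface SourceProposal {
  uri: string
}
interface Verdict {
  accept: boolean
}
interface DriverLike {
  verifySource(source: SourceProposal): Verdict
  foldGaps?(openGaps: string[]): string
}
type ResearchWorker = (prompt: string) => SourceProposal[]
// Readiness gate: returns the still-open gaps; an empty list means "ready".
type ReadinessGate = (kb: ReadonlySet<string>) => string[]

// Mirrors createCollectionResearchDriver: an unconditional, synchronous accept
// and no foldGaps, so the loop falls back to its default gap list.
const collectionDriver: DriverLike = { verifySource: () => ({ accept: true }) }

// Default folding: a plain bulleted list of the open gaps.
const defaultFoldGaps = (gaps: string[]): string =>
  gaps.map((g) => `- ${g}`).join('\n')

function runLoop(
  question: string,
  worker: ResearchWorker,
  driver: DriverLike,
  openGaps: ReadinessGate,
  maxRounds = 3,
): { rounds: number; kb: Set<string> } {
  const kb = new Set<string>()
  let prompt = question
  let rounds = 0
  for (let round = 1; round <= maxRounds; round++) {
    rounds = round
    for (const source of worker(prompt)) {
      // Exact-uri dedup is the loop's job and happens before the driver sees it.
      if (kb.has(source.uri)) continue
      if (driver.verifySource(source).accept) kb.add(source.uri)
    }
    const gaps = openGaps(kb)
    // Gate met (or budget spent): stop before any steering for a next round.
    if (gaps.length === 0 || round === maxRounds) break
    const folded = driver.foldGaps ? driver.foldGaps(gaps) : defaultFoldGaps(gaps)
    prompt = `${question}\n${folded}`
  }
  return { rounds, kb }
}
```

If the readiness gate is met after round 1, `runLoop` exits before `foldGaps` is ever consulted, so a driver whose value lies in steering contributes nothing beyond its round-1 source verdicts.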
diff --git a/src/index.ts b/src/index.ts
@@ -3,6 +3,7 @@ export * from './adaptive-driver'
 export * from './changes'
 export * from './chunking'
 export * from './claim-grounding'
+export * from './collection-research-driver'
 export * from './discovery'
 export * from './eval-readiness'
 export * from './events'
@@ -12,8 +13,11 @@ export * from './graph'
 export * from './ids'
 export * from './indexer'
 export * from './inspect'
+export * from './investment-thesis-set'
+export * from './investment-thesis-task'
 export * from './kb-store'
 export * from './lint'
+export * from './material-facts-metric'
 export * from './memory/index'
 export * from './proposals'
 export * from './propose-from-finding'
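`material-facts-metric` is exported here, but its implementation is not part of this excerpt. As a rough illustration of what a `$0` model-free grader can look like, here is a sketch that scores a thesis against a checklist of buried facts using plain regular-expression matching. `MaterialFact`, `scoreMaterialFacts`, the AND-of-OR pattern layout, and the example patterns are assumptions for illustration; the real metric may match facts differently.

```typescript
// Hypothetical checklist entry for one buried, material fact. Every group in
// `allOf` must match (AND); any pattern inside a group may match (OR).
// Patterns must not use the `g` flag, because RegExp.test is then stateful.
interface MaterialFact {
  id: string
  allOf: RegExp[][]
}

interface FactScore {
  hit: number
  total: number
  pct: number // rounded percentage, e.g. 16/27 -> 59
  missed: string[] // ids of facts the thesis did not surface
}

// Model-free grading: plain pattern matching, no LLM calls.
function scoreMaterialFacts(thesis: string, facts: readonly MaterialFact[]): FactScore {
  let hit = 0
  const missed: string[] = []
  for (const fact of facts) {
    const surfaced = fact.allOf.every((group) => group.some((re) => re.test(thesis)))
    if (surfaced) hit++
    else missed.push(fact.id)
  }
  const total = facts.length
  return { hit, total, pct: total === 0 ? 0 : Math.round((100 * hit) / total), missed }
}

// Illustrative checklist only; these patterns are not the repo's checklist.
const exampleFacts: MaterialFact[] = [
  { id: 'uninsured-deposits', allOf: [[/uninsured/i], [/\bdeposits?\b/i]] },
  { id: 'htm-mark', allOf: [[/held[- ]to[- ]maturity/i, /\bHTM\b/], [/unrealized loss/i]] },
]

// This thesis surfaces the uninsured-deposit fact but misses the HTM mark.
const example = scoreMaterialFacts('Roughly 97% of deposits are uninsured.', exampleFacts)
```

Keeping the patterns free of the `g` flag matters: `RegExp.prototype.test` on a global regex advances `lastIndex`, which would make repeated grading of the same checklist order-dependent.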