ebarti · ebarti · Jun 23, 2026 · Jun 22, 2026 · Jun 22, 2026 · Jun 22, 2026
diff --git a/docs/sdk-evolution-agent-design.md b/docs/sdk-evolution-agent-design.md
diff --git a/docs/sdk-evolution-agent.md b/docs/sdk-evolution-agent.md
@@ -4,6 +4,10 @@ The SDK evolution agent is a local dogfood workflow for keeping
 agent-runtime-kit aligned with Claude Agent SDK, OpenAI Codex SDK, and Google
 Antigravity SDK as those upstream packages evolve.
 
+For the intended architecture, evidence contract, behavior probe strategy,
+changelog strategy, caveats, and alternatives, see
+[`docs/sdk-evolution-agent-design.md`](sdk-evolution-agent-design.md).
+
 Run it from the repository:
 
 ```bash
@@ -32,6 +36,13 @@ directory is created with private permissions before the Codex runtime starts;
 authenticate that Codex home through supported Codex login/API-key/access-token
 flows before using it for real Codex-backed runs.
 
+Codex-backed SDK evolution runs explicitly choose `gpt-5.5` with
+`reasoning_effort=xhigh` for the AI stages that analyze direction, decide the
+update plan, implement allowed changes, and review the result. This model policy
+is applied only to `codex-agent-sdk`; Claude and Antigravity runs keep their
+provider-native model selection because `gpt-5.5` is not a valid model override
+for those adapters.
+
 For Antigravity, local auth can use `GEMINI_API_KEY` / `GOOGLE_API_KEY` or
 Google Application Default Credentials. ADC runs use Vertex AI config; provide a
 project through ADC, `GOOGLE_CLOUD_PROJECT`, or `GCLOUD_PROJECT`, and optionally
@@ -46,8 +57,12 @@ Each run writes a timestamped directory under `reports/sdk-evolution/` with:
 
 - `config.json`
 - `evidence.json`
+- `release_notes.json`
 - `api_snapshots/`
 - `api_diffs.json`
+- `behavior_probes.json`
+- `behavior_diffs.json`
+- `current_state.json`
 - `direction_analysis.json`
 - `architecture_decision.json`
 - `implementation_summary.json`
@@ -56,7 +71,8 @@ Each run writes a timestamped directory under `reports/sdk-evolution/` with:
 - `report.md`
 
 The report separates deterministic facts from runtime-generated analysis and
-calls out uncertainty, recursive self-adaptation impact, implementation status,
+calls out uncertainty, release-note coverage, API diffs, behavior diffs,
+baseline promotion, recursive self-adaptation impact, implementation status,
 test results, reviewer output, and manual review items.
 
 ## Upstream Freshness
@@ -76,10 +92,28 @@ cutoff variables must not hide candidate releases.
 
 ## Candidate API Inspection
 
-By default, the command snapshots SDK APIs importable in the current
-environment. Use `--inspect-candidates` to install latest candidate SDK versions
-in temporary isolated virtualenvs for API snapshots and diffs. This avoids
-mutating the project lockfile or working environment.
+The command treats `uv.lock` as the current baseline. If the active `.venv`
+contains a different installed version, the agent inspects the locked baseline
+in a temporary isolated virtualenv instead of trusting the drifted environment.
+When a refresh preview is available, package update candidates come from the
+resolver's `uv lock --dry-run -P ...` output, not only from PyPI's `latest`
+metadata. For each resolver update candidate, the agent installs the target
+version in a temporary isolated virtualenv and writes an API snapshot plus
+`api_diffs.json` entry. This avoids false downgrade diffs for packages whose
+locked prerelease is newer than PyPI's stable latest field. Candidate inspection
+is always enabled for update candidates; `--inspect-candidates` remains accepted
+only for CLI compatibility.
+
+If `uv lock --dry-run -P ...` reports an SDK update but the run cannot produce a
+candidate-version API diff for that package, implementation is blocked and the
+architecture decision is marked `manual_design_required`. An empty added /
+removed / changed diff is valid; a missing diff object is not.
+
+Behavior probes intentionally separate observed SDK surface churn from adapter
+contract breakage. `behavior_probes.json` records fields and parameters seen in
+current and candidate packages, while `behavior_diffs.json` compares the
+required adapter contract. Optional field changes remain visible in the report
+and API diffs, but only breaking adapter-contract diffs block implementation.
 
 ## Implementation Gates
 
@@ -93,6 +127,9 @@ Implementation is still blocked when:
 
 - the architecture decision sets `manual_design_required`,
 - the reviewer rejects the evidence or design,
+- a resolver-selected update lacks a candidate API diff,
+- required release-note evidence could not be collected,
+- candidate behavior probes show a breaking adapter-contract difference,
 - required structured output or permission behavior is unsupported by the
   selected runtime,
 - recursive self-adaptation is required but no safe migration plan exists.
@@ -114,8 +151,13 @@ python -m examples.sdk_evolution_agent \
   --implementation-enabled \
   --create-branch \
   --branch-name sdk-evolution-update \
+  --pr-base main \
   --draft-pr
 ```
 
+When `--draft-pr` is set, the agent stages `uv.lock` and the run report
+directory, commits them with `--commit-message`, pushes the branch, and opens a
+draft PR with `gh`. It never auto-merges.
+
 The command uses local Git and `gh` authentication. It never auto-merges,
 auto-publishes, or scrapes unsupported credentials.