Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
622 changes: 622 additions & 0 deletions docs/sdk-evolution-agent-design.md

Large diffs are not rendered by default.

52 changes: 47 additions & 5 deletions docs/sdk-evolution-agent.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,10 @@ The SDK evolution agent is a local dogfood workflow for keeping
agent-runtime-kit aligned with Claude Agent SDK, OpenAI Codex SDK, and Google
Antigravity SDK as those upstream packages evolve.

For the intended architecture, evidence contract, behavior probe strategy,
changelog strategy, caveats, and alternatives, see
[`docs/sdk-evolution-agent-design.md`](sdk-evolution-agent-design.md).

Run it from the repository:

```bash
Expand Down Expand Up @@ -32,6 +36,13 @@ directory is created with private permissions before the Codex runtime starts;
authenticate that Codex home through supported Codex login/API-key/access-token
flows before using it for real Codex-backed runs.

Codex-backed SDK evolution runs explicitly choose `gpt-5.5` with
`reasoning_effort=xhigh` for the AI stages that analyze direction, decide the
update plan, implement allowed changes, and review the result. This model policy
is applied only to `codex-agent-sdk`; Claude and Antigravity runs keep their
provider-native model selection because `gpt-5.5` is not a valid model override
for those adapters.

For Antigravity, local auth can use `GEMINI_API_KEY` / `GOOGLE_API_KEY` or
Google Application Default Credentials. ADC runs use Vertex AI config; provide a
project through ADC, `GOOGLE_CLOUD_PROJECT`, or `GCLOUD_PROJECT`, and optionally
Expand All @@ -46,8 +57,12 @@ Each run writes a timestamped directory under `reports/sdk-evolution/` with:

- `config.json`
- `evidence.json`
- `release_notes.json`
- `api_snapshots/`
- `api_diffs.json`
- `behavior_probes.json`
- `behavior_diffs.json`
- `current_state.json`
- `direction_analysis.json`
- `architecture_decision.json`
- `implementation_summary.json`
Expand All @@ -56,7 +71,8 @@ Each run writes a timestamped directory under `reports/sdk-evolution/` with:
- `report.md`

The report separates deterministic facts from runtime-generated analysis and
calls out uncertainty, recursive self-adaptation impact, implementation status,
calls out uncertainty, release-note coverage, API diffs, behavior diffs,
baseline promotion, recursive self-adaptation impact, implementation status,
test results, reviewer output, and manual review items.

## Upstream Freshness
Expand All @@ -76,10 +92,28 @@ cutoff variables must not hide candidate releases.

## Candidate API Inspection

By default, the command snapshots SDK APIs importable in the current
environment. Use `--inspect-candidates` to install latest candidate SDK versions
in temporary isolated virtualenvs for API snapshots and diffs. This avoids
mutating the project lockfile or working environment.
The command treats `uv.lock` as the current baseline. If the active `.venv`
contains a different installed version, the agent inspects the locked baseline
in a temporary isolated virtualenv instead of trusting the drifted environment.
When a refresh preview is available, package update candidates come from the
resolver's `uv lock --dry-run -P ...` output, not only from PyPI's `latest`
metadata. For each resolver update candidate, the agent installs the target
version in a temporary isolated virtualenv and writes an API snapshot plus
`api_diffs.json` entry. This avoids false downgrade diffs for packages whose
locked prerelease is newer than PyPI's stable latest field. Candidate inspection
is always enabled for update candidates; `--inspect-candidates` remains accepted
only for CLI compatibility.

If `uv lock --dry-run -P ...` reports an SDK update but the run cannot produce a
candidate-version API diff for that package, implementation is blocked and the
architecture decision is marked `manual_design_required`. An empty added /
removed / changed diff is valid; a missing diff object is not.

Behavior probes intentionally separate observed SDK surface churn from adapter
contract breakage. `behavior_probes.json` records fields and parameters seen in
current and candidate packages, while `behavior_diffs.json` compares the
required adapter contract. Optional field changes remain visible in the report
and API diffs, but only breaking adapter-contract diffs block implementation.

## Implementation Gates

Expand All @@ -93,6 +127,9 @@ Implementation is still blocked when:

- the architecture decision sets `manual_design_required`,
- the reviewer rejects the evidence or design,
- a resolver-selected update lacks a candidate API diff,
- required release-note evidence could not be collected,
- candidate behavior probes show a breaking adapter-contract difference,
- required structured output or permission behavior is unsupported by the
selected runtime,
- recursive self-adaptation is required but no safe migration plan exists.
Expand All @@ -114,8 +151,13 @@ python -m examples.sdk_evolution_agent \
--implementation-enabled \
--create-branch \
--branch-name sdk-evolution-update \
--pr-base main \
--draft-pr
```

When `--draft-pr` is set, the agent stages `uv.lock` and the run report
directory, commits them with `--commit-message`, pushes the branch, and opens a
draft PR with `gh`. It never auto-merges.

The command uses local Git and `gh` authentication. It never auto-merges,
auto-publishes, or scrapes unsupported credentials.
Loading
Loading