feat(agentic-ci): decision-ready triage and daily PR fixes #600
andreatgretel wants to merge 12 commits into main
PR #600 Review —
Greptile Summary

This PR reorganizes the weekly issue-triage report around action buckets and promotes four daily audit suites from read-only to autonomous PR-opening by introducing a shared

| Filename | Overview |
|---|---|
| .agents/recipes/_fix-policy.md | New policy document defining the localized-fix bar, path/command allowlists, finding-hash spec, ranking criteria, standard fix procedure, PR conventions, and the two-strike / attempted_fixes schema — including the previously-flagged eviction fix (two-strike exempt from FIFO cap). |
| .github/workflows/agentic-ci-daily.yml | Splits the single Claude invocation into an audit + optional fix phase; adds snapshot/backlog/scope-gate/lockfile-gate steps with sound transitive guards. One logic issue: lockfile_gate's git checkout can carry a prior iteration's tree into subsequent entries on fetch failure. |
| .agents/recipes/code-quality/recipe.md | Adds a fix phase for bare-except narrowing only; draft mode enforced; TODO-line deletion explicitly forbidden; eligible grep pattern expanded to catch except BaseException. |
| .agents/recipes/dependencies/recipe.md | Adds a fix phase for transitive-gap and unused categories with make install-dev pre-test; specifier-copy-from-sibling guard prevents hallucinated version ranges. |
| .agents/recipes/issue-triage/recipe.md | Full rewrite to action-bucket format; adds multi-part split with numbered files and reconciliation logic; surfaces two-strike findings from daily suites; previous multi-part fallback issues addressed. |
| .github/workflows/agentic-ci-issue-triage.yml | Fallback step rewritten to use numbered part files, identity-based seen-parts detection, per-index test()-guard before capture(), and post-only-missing logic — addressing all previously flagged comment-deduplication bugs. |
Sequence Diagram

sequenceDiagram
    participant W as Workflow
    participant A as Claude (audit)
    participant F as Claude (fix)
    participant SG as scope_gate
    participant LG as lockfile_gate
    participant GH as GitHub API
    W->>A: audit phase (--max-turns 50)
    A-->>W: runner-state.json (fix_backlog updated)
    W->>W: check fix_backlog size
    alt backlog empty or test-health
        W-->>W: skip fix (report-only)
    else backlog non-empty
        W->>W: snapshot attempted_fixes
        W->>F: fix phase (--max-turns 50)
        F->>F: reconcile orphaned PRs
        F->>F: rank + re-verify top candidate
        F->>GH: git push branch + gh pr create
        F-->>W: runner-state.json attempted_fixes open
        W->>SG: validate allowlist + LOC cap + AST
        alt violation
            SG->>GH: gh pr close --delete-branch
            SG->>W: flip attempted_fixes to abandoned
        end
        alt "suite == dependencies"
            W->>LG: make install-dev against pushed branch
            alt install-dev fails
                LG->>GH: gh pr close
                LG->>W: flip attempted_fixes to abandoned
            end
        end
    end
    W->>W: Update runner memory + upload artifacts

Comments Outside Diff (1)
- `.github/workflows/agentic-ci-daily.yml`, line 1378-1385: **`git checkout` contaminates subsequent loop iterations on fetch failure**

  When crash-recovery reconciliation back-fills multiple `attempted_fixes` entries with `outcome: "open"`, the loop may process two or more entries. After a successful checkout for entry N, if entry N+1's `git fetch` silently fails (`|| true`) and its `git checkout` also fails, `make install-dev` runs against entry N's already-checked-out tree instead of entry N+1's. The lockfile check effectively validates entry N's changes again while producing a passing result attributed to entry N+1, leaving entry N+1's `pyproject.toml` changes unverified.

  Restoring to the original detached `HEAD` at the top of each loop iteration (before the per-entry fetch/checkout) would prevent the carry-over.
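The carry-over described in this comment, and the suggested reset at the top of each iteration, can be sketched as follows. This is a hedged illustration in Python rather than the workflow's actual shell: `verify_branches`, the `Runner` type, and the `"unverified"` label are hypothetical names, and the command runner is injectable so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, Dict, List, Sequence

# A runner takes an argv list and returns the exit code (hypothetical type).
Runner = Callable[[Sequence[str]], int]


def subprocess_runner(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def verify_branches(
    branches: List[str], base_ref: str, run: Runner = subprocess_runner
) -> Dict[str, str]:
    """Run `make install-dev` once per branch, never against a stale tree."""
    results: Dict[str, str] = {}
    for branch in branches:
        # Reset to the original detached HEAD first, so a failed checkout
        # below cannot leave the previous entry's tree in place.
        if run(["git", "checkout", "--detach", base_ref]) != 0:
            results[branch] = "unverified"
            continue
        fetched = run(["git", "fetch", "origin", branch]) == 0
        checked_out = (
            fetched
            and run(["git", "checkout", "--detach", f"origin/{branch}"]) == 0
        )
        if not checked_out:
            # Do not fall through to install-dev: record the gap instead of
            # attributing another entry's passing result to this one.
            results[branch] = "unverified"
            continue
        ok = run(["make", "install-dev"]) == 0
        results[branch] = "pass" if ok else "fail"
    return results
```

Treating a failed fetch or checkout as "unverified" rather than silently continuing is one possible policy; the workflow could equally close the PR in that case.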
Reviews (11): Last reviewed commit: "fix(agentic-ci): validate all grown fix ..."
Thanks for putting this together, @andreatgretel; the iteration is visible, and the factoring of

Summary

This reorganizes the weekly triage report around action buckets and flips four daily suites from report-only to opening one PR per run, with a shared policy doc covering selection, ranking, allowlists, two-strike escalation, and atomicity. The implementation matches the stated intent and reflects multiple rounds of review feedback (the partial-post detection and

I read this through the lens you asked for (humans-out-of-the-loop autonomous PR generation), and the bulk of my comments are about the trust boundary between policy and enforcement. Most of the safety bar lives in

Findings

Warnings: Worth addressing

Suggestions: Take it or leave it

Minor: JSON formatting consistency

What Looks Good

Verdict

Needs changes: five Warnings worth addressing before this runs autonomously:

The Suggestions are honestly nits; none of them block.

This review was generated by an AI assistant.
Reorganize the weekly issue-triage report around recommended actions (close as resolved, close as duplicate, needs maintainer decision, ready for assignment, stuck PR, duplicate PRs, stale) so each flagged item carries action + evidence + rationale and can be resolved without opening it. Multi-comment split with i/N markers and orphan reconciliation when the report grows or shrinks.

Flip the four daily audit suites with mechanical fix categories from read-only reports to opening one PR per run:
- docs-and-references: broken-link, docstring-drift, arch-ref-rename
- structure: missing-future, lazy-import
- dependencies: transitive-gap, unused
- code-quality: bare-except (draft until landing rate proven)

test-health stays report-only (all candidates require inferring intent).

The shared procedure (fix_backlog selection, finding-hash spec for stable cross-run identification, attempted_fixes lifecycle with two-strike escalation, allowlists, ranking, branch/PR conventions) lives in .agents/recipes/_fix-policy.md. Each suite recipe declares only its eligible categories, branch types, and test requirements.

Workflow runs claude twice per suite (audit, then conditionally fix), each capped at the existing --max-turns 50. Fix call is gated on non-empty fix_backlog and skipped entirely for test-health.
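As a rough illustration of the fix-phase gate this commit describes (audit succeeded, suite is not test-health, backlog non-empty), the condition can be written as a small predicate. The function names are hypothetical, and the empty-string default mirrors the `|| '0'` fallback used in the workflow expression quoted in the PR description.

```python
def backlog_size(raw: str) -> int:
    """Parse the backlog-size step output, treating an empty value as 0."""
    return int(raw or "0")


def should_run_fix(suite: str, audit_outcome: str, raw_backlog_size: str) -> bool:
    """Return True only when all three gating conditions hold."""
    return (
        audit_outcome == "success"
        and suite != "test-health"
        and backlog_size(raw_backlog_size) > 0
    )
```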
- Map per-package test targets explicitly in _fix-policy.md (Makefile exposes test-config/test-engine/test-interface, not test-<package>).
- Use github-actions[bot] noreply identity for commits the recipes produce.
- Refresh fix_backlog.data when an id already exists so the fix phase cannot drive a PR from stale data after the underlying file changed.
- Stop time-pruning closed/abandoned attempted_fixes entries: pruning before the two-strike threshold erases the history needed to escalate. Single-strike entries now age out only via the 200-entry cap.
- Disambiguate bare-except findings within the same function by including a try-body hash in the finding id.
- Audit grep for code-quality now matches both `except:` and `except BaseException:`, in parity with the fix eligibility.
- Restrict transitive-gap fix eligibility to cases where a sibling package already declares the dep (avoids inventing version specifiers from scratch).
- Issue-triage workflow handles multi-part reports in both the fallback post step and the job summary; recipe always writes numbered parts.
- Replace remaining `make test-<package>` references with pointers to the mapping table; only the table itself uses that placeholder now.
- Fix `gh api --paginate | jq | length` returning per-page counts: slurp with `jq -s 'add // 0'` to get a single total.
- Compare posted-comment count to expected part count so a partial post (agent posted part 1 but not 2/3) triggers the fallback instead of being silently treated as success.
- Add `shell: bash` to triage steps using `shopt`/`mapfile` so they're not at the mercy of the runner's default shell.
- Disambiguate bare-except findings whose try-body hashes collide by adding a per-function ordinal to the canonical_key.
- Tie the 200-entry attempted_fixes cap eviction to `attempts[0].at` (the schema has no `first_seen` field).
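The pagination bug in the second bullet is easy to reproduce in miniature: counting per page yields one number per page, while the intended value is their sum (with 0 for no pages, which is what the `// 0` fallback covers in jq). A minimal Python sketch, with hypothetical function names:

```python
from typing import List, Sequence


def per_page_counts(pages: Sequence[Sequence[object]]) -> List[int]:
    """What a per-page `length` produces: one count for every page."""
    return [len(page) for page in pages]


def total_count(pages: Sequence[Sequence[object]]) -> int:
    """What the slurp-and-add fix produces: a single total, 0 when empty."""
    return sum(len(page) for page in pages)
```

A downstream integer comparison against `per_page_counts` output would see several values instead of one, which is the failure the commit describes.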
…back

Replace the count-only POSTED_COUNT >= EXPECTED_PARTS check with an identity-based check that extracts every i/N marker seen in today-dated bot comments and verifies each expected i is present. A duplicate post of one part can no longer mask a missing other.
- Exempt two-strike attempted_fixes entries from the 200-entry cap eviction. Cap now evicts non-two-strike oldest-first by attempts[0].at; two-strike entries are silently forgotten only in the pathological all-200-are-two-strike case (itself a signal).
- Specify the attempted_fixes PR-marker reconciliation algorithm: scan open PR bodies for the `<!-- agentic-ci finding=<id> -->` marker and back-fill missing entries.
- Tighten the daily workflow conditionals to gate on explicit step outcomes (steps.audit.outcome == 'success' rather than success()) so a future pre-audit gate cannot accidentally trip the fix step.
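The eviction rule in the first bullet can be sketched as below. This is an illustration under stated assumptions, not the policy's actual implementation: it assumes each entry has a unique `id` and a non-empty `attempts` list whose items carry an ISO-8601 `at` timestamp in a uniform format (so string order matches time order), and it treats "two-strike" as "at least two recorded attempts", which is a simplification. `evict_to_cap` and `is_two_strike` are hypothetical names.

```python
from typing import Any, Dict, List

Entry = Dict[str, Any]


def is_two_strike(entry: Entry) -> bool:
    """Assumed definition: two or more recorded attempts."""
    return len(entry["attempts"]) >= 2


def first_attempt_at(entry: Entry) -> str:
    """Sort key: timestamp of the first attempt (attempts[0].at)."""
    return entry["attempts"][0]["at"]


def evict_to_cap(entries: List[Entry], cap: int = 200) -> List[Entry]:
    """Drop oldest non-two-strike entries until the list fits the cap."""
    overflow = len(entries) - cap
    if overflow <= 0:
        return list(entries)
    evictable = sorted(
        (e for e in entries if not is_two_strike(e)), key=first_attempt_at
    )
    drop_ids = {e["id"] for e in evictable[:overflow]}
    still_over = overflow - len(drop_ids)
    if still_over > 0:
        # Pathological case: not enough single-strike entries to evict,
        # so the oldest two-strike entries have to go as well.
        protected = sorted(
            (e for e in entries if is_two_strike(e)), key=first_attempt_at
        )
        drop_ids |= {e["id"] for e in protected[:still_over]}
    return [e for e in entries if e["id"] not in drop_ids]
```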
…ording)

- Bump daily-suite job timeout from 20 to 40 minutes. The split into two sequential `claude --max-turns 50` invocations can saturate a 20-minute budget; a mid-fix SIGTERM would leave an orphaned branch and inconsistent runner-state.
- Disambiguate the `_phase-fix.md` "do NOT re-scan" rule. It forbids rebuilding fix_backlog from scratch but does NOT override the per-candidate re-verification step required by _fix-policy.md step 4.1 (re-grep / re-read the specific file the candidate points at). Single-candidate re-verification is required; whole-codebase re-scanning is forbidden.
- Guard `jq capture()` with a `test()` select. `capture()` errors on non-match instead of returning empty, which would truncate SEEN_PARTS if any unrelated today-dated bot comment lacks the triage marker (e.g. from a sibling workflow). Adding the test() guard ensures capture() only runs on bodies that already match.
- Iterate the MISSING[] array when posting fallback parts, not the full PARTS[] array. Posting all parts when only some were missing was creating duplicate comments for the parts the agent already successfully posted.
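Taken together with the identity-based check from the earlier commit, the fallback logic amounts to: collect the part indices actually seen, skip bodies without a marker, and post only the missing indices. A minimal Python sketch follows; the marker text `agentic-ci-triage:i/N` is an assumption made for illustration (the real marker format is not shown here), and `missing_parts` is a hypothetical name.

```python
import re
from typing import Iterable, List

# Hypothetical marker format; the real triage marker may differ.
MARKER = re.compile(r"agentic-ci-triage:(\d+)/(\d+)")


def missing_parts(comment_bodies: Iterable[str], expected_parts: int) -> List[int]:
    """Return the 1-based part indices that no comment carries a marker for."""
    seen = set()
    for body in comment_bodies:
        match = MARKER.search(body)
        if match is None:
            # Equivalent of the test() guard: unrelated bot comments are
            # skipped instead of aborting the whole scan.
            continue
        index, total = int(match.group(1)), int(match.group(2))
        if total == expected_parts and 1 <= index <= expected_parts:
            seen.add(index)
    return [i for i in range(1, expected_parts + 1) if i not in seen]
```

Because the result is a list of indices rather than a count, a duplicate post of part 1 cannot hide a missing part 2, and the caller iterates only over what is returned.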
Address the five Warnings from the 2026-05-07 review focused on the
trust boundary for autonomous PR generation. Five workflow/policy
adjustments shrink the surface where agent compliance is load-bearing:
- Workflow-level scope gate. After the fix step, re-derive the diff
against `origin/main` and validate against the per-suite path
allowlist (regex mirrored from `_fix-policy.md`), the 50-LOC cap, and
the 3-file cap. On violation, close the PR with `--delete-branch`
and flip the `attempted_fixes` entry from `open` to `abandoned` so
two-strike logic still sees the failure. The recipe alone could not
bind the agent's path choices; the workflow now does.
- Dependencies install-dev verification. For the dependencies suite
only, re-run `make install-dev` after the scope gate so the agent's
pyproject edit is exercised against the lockfile resolver. Closes
the PR if `install-dev` fails — catches the failure mode where the
per-package test target passed against the old cached lockfile.
- Flip matrix-job `cancel-in-progress` from true to false. A
cancellation between the agent's git push and `gh pr create` would
leave an orphaned branch with no `attempted_fixes` record;
reconciliation only covers PRs that were opened. Queueing a
duplicate run is the lesser evil. `_fix-policy.md` Atomicity
section now documents the trade-off.
- Allow `/tmp/audit-{{suite}}.md` in `_phase-audit.md`'s "do not
modify outside `{{memory_path}}/`" directive. A literal-minded
agent could refuse to write the report file, which would break the
job summary, artifact upload, and the fix phase's audit context.
- Always upload the agent log artifact (was `if: failure()` only) and
include `runner-state.json`. For autonomous mode, the most
interesting failure is "the workflow succeeded but the PR was
wrong"; the stream-json log is the only way to look back days
later.
Also takes johnnygreco's Suggestion 2: spell out in the policy doc
that the `draft_until_proven` flip is the sole human-gated
promotion step in the fix policy and must not be automated.
Greptile and the github-actions auto-reviewer's findings were
already closed in the prior pass-2/pass-3 commits; no action needed
on those.
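The workflow-level scope gate described in the first bullet above can be sketched as a pure function over the diff summary. This is an illustration, not the workflow's code: the allowlist pattern in the usage note is made up, `scope_violations` is a hypothetical name, and counting LOC as added plus deleted lines is an assumption about how the 50-LOC cap is measured.

```python
import re
from typing import Dict, List, Tuple


def scope_violations(
    changed: Dict[str, Tuple[int, int]],
    allow_pattern: str,
    max_files: int = 3,
    max_loc: int = 50,
) -> List[str]:
    """Check a diff summary {path: (added, deleted)} against the fix bar."""
    allowed = re.compile(allow_pattern)
    violations: List[str] = []
    for path in sorted(changed):
        # fullmatch so a pattern cannot accept a path by matching a substring.
        if not allowed.fullmatch(path):
            violations.append(f"path not allowlisted: {path}")
    if len(changed) > max_files:
        violations.append(f"too many files: {len(changed)} > {max_files}")
    loc = sum(added + deleted for added, deleted in changed.values())
    if loc > max_loc:
        violations.append(f"too many changed lines: {loc} > {max_loc}")
    return violations
```

For example, `scope_violations({"docs/a.md": (10, 2)}, r"docs/.*\.md")` returns an empty list, while a 60-line change to `src/x.py` under the same pattern reports both a path and a LOC violation. Any non-empty result would correspond to closing the PR and flipping the entry to `abandoned`.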
91e8749 to 23829fb
Codex flagged five issues in the prior commit's scope/lockfile gates.
This commit closes all five:
- HIGH: Wrong-PR targeting. Both gates selected the last globally-open
attempted_fixes entry, which could match a stale orphan from a
prior crashed run rather than the PR opened by *this* run. Adds a
pre-fix snapshot step that captures `(id, attempts-length)` pairs
before the fix runs, and changes the post-fix selectors to require
that the entry's attempts count grew during this run.
- HIGH: Docstring-only enforcement gap on the docs-and-references
suite. The .py path allowlist was at workflow level but the
docstring-only caveat was still policy-only. Adds an AST-based
check: for each .py file changed, parse the post-change tree,
collect docstring line ranges (module/class/function), then verify
every added line in the diff is either inside a docstring, a
comment, or whitespace. Verified locally with both pass and fail
fixtures.
- MEDIUM: Diff-ref mismatch. Gates diffed `origin/main...HEAD` rather
than `origin/main...origin/$BRANCH`, so a misbehaving agent that
left HEAD pointing elsewhere would have validated the wrong tree.
Now fetches `origin/$BRANCH` first and prefers that ref. Falls
back to HEAD only if fetch fails (with a warning).
- MEDIUM: FILE_COUNT bug. `grep -c '.' || echo 0` produced "0\n0" on
empty diff, breaking the downstream integer comparison. Replaces
with `mapfile -t FILE_ARR` + `${#FILE_ARR[@]}`, which is correct
for any input including empty.
- LOW: Non-atomic JSON writes. The runner-state mutations could leave
the file half-written if the workflow was cancelled mid-write.
Switches both gates to the temp-file + os.replace pattern.
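The AST-based docstring-only check from the second bullet might look roughly like the sketch below. It is not the code from the workflow: the function names are hypothetical, the caller is assumed to supply the 1-based line numbers of added lines in the post-change file, and `end_lineno` requires Python 3.8 or later.

```python
import ast
from typing import List, Tuple


def docstring_line_ranges(source: str) -> List[Tuple[int, int]]:
    """Collect (first, last) line ranges of module/class/function docstrings."""
    ranges: List[Tuple[int, int]] = []
    scopes = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, scopes) or not node.body:
            continue
        first = node.body[0]
        # A docstring is a bare string-constant expression opening the body.
        if (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        ):
            ranges.append((first.lineno, first.end_lineno))
    return ranges


def added_lines_are_docs_only(source: str, added_lines: List[int]) -> bool:
    """True if every added line is in a docstring, a comment, or blank."""
    lines = source.splitlines()
    ranges = docstring_line_ranges(source)
    for number in added_lines:
        text = lines[number - 1].strip()
        if not text or text.startswith("#"):
            continue
        if any(low <= number <= high for low, high in ranges):
            continue
        return False
    return True
```

This sketch does not handle code that shares a line with the closing quotes of a docstring; a stricter gate would need to tokenize those lines.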
Also: dependencies-lockfile gate now does an explicit
`git checkout --detach origin/$BRANCH` before re-running install-dev,
so verification runs against what was actually pushed rather than
relying on local working-tree state.
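The temp-file + `os.replace` pattern mentioned for the runner-state writes can be illustrated as follows. `write_json_atomic` is a hypothetical helper name; the temp file is created in the destination directory so that the final rename stays on one filesystem.

```python
import json
import os
import tempfile
from typing import Any


def write_json_atomic(path: str, data: Any) -> None:
    """Write JSON so readers see either the old file or the new one."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # Atomic swap: the destination is never observed half-written.
        os.replace(tmp_path, path)
    finally:
        # After a successful replace the temp path no longer exists;
        # on failure this removes the leftover temp file.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
```

If the job is cancelled mid-write, the worst case under this pattern is a stray `.tmp` file next to an intact previous `runner-state.json`.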
Greptile review on 872d561 flagged that the fix step's custom `if:` expression bypasses GitHub Actions' implicit success() check. Without explicitly referencing steps.snapshot.outcome, a snapshot failure (corrupt runner-state, disk error) would let the fix step run anyway. The scope gate's `jq --slurpfile prior /tmp/prior-attempted-fixes.json` would then exit non-zero on the missing file, leave OPEN empty, and hit the "nothing to validate" early-exit — silently approving whatever the agent pushed. Adds steps.snapshot.outcome == 'success' to both the fix step's condition (the actual fix) and the scope_gate step's condition (belt-and-suspenders against future refactors).
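The snapshot-and-compare selection, together with the fail-closed behavior this commit is after, can be sketched in a few lines. The field names `id`, `outcome`, and `attempts` follow the schema as it is referred to in this PR, but the function names and the choice to raise on a missing snapshot are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

Entry = Dict[str, Any]


def snapshot_attempts(attempted_fixes: List[Entry]) -> Dict[str, int]:
    """Pre-fix snapshot: map each finding id to its attempts count."""
    return {e["id"]: len(e.get("attempts", [])) for e in attempted_fixes}


def entries_opened_this_run(
    prior: Optional[Dict[str, int]], current: List[Entry]
) -> List[Entry]:
    """Select open entries whose attempts count grew since the snapshot."""
    if prior is None:
        # Fail closed: without a snapshot there is no way to tell this
        # run's PR from a stale orphan, so nothing may be approved.
        raise RuntimeError("pre-fix snapshot missing; refusing to validate")
    return [
        e
        for e in current
        if e.get("outcome") == "open"
        and len(e.get("attempts", [])) > prior.get(e["id"], 0)
    ]
```

Raising here plays the role of the explicit `steps.snapshot.outcome == 'success'` condition: a missing snapshot stops the gate instead of reaching a "nothing to validate" early exit.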
Signed-off-by: Andre Manoel <amanoel@nvidia.com>
📋 Summary
Reorganize the weekly issue-triage report around recommended actions so each flagged item is decision-ready, and flip four of the five daily audit suites from read-only reports to opening one PR per run for the most important localized fix. The shared procedure (selection, ranking, allowlists, attempted-fixes lifecycle, two-strike escalation, branch/PR conventions) lives in a single `_fix-policy.md`; suite recipes declare only their eligible categories.

🔗 Related Issue
N/A — extends the agentic-CI work tracked in plan 472. We can link a follow-up tracking issue once one is opened.
🔄 Changes
- `.agents/recipes/_fix-policy.md`: universal localized-fix bar (≤3 files, ≤50 LOC, reversible, self-evident, test-safe, single-concern), per-suite path/command allowlists, finding-hash spec for stable cross-run identification, `fix_backlog` / `attempted_fixes` schema, ranking criteria (confidence > severity > impact > recency), draft-PR mode, two-strike escalation, standard fix procedure, direct `gh pr create --body-file` PR-creation pattern.
- `.agents/recipes/_phase-audit.md` and `.agents/recipes/_phase-fix.md`: phase directives prepended to each `claude` invocation so each call knows which phase it executes.
- `.agents/recipes/issue-triage/recipe.md`: action-organized buckets (`Close as resolved`, `Close as duplicate`, `Needs maintainer decision`, `Ready for assignment`, `Stuck PR`, `Duplicate PRs`, `Stale, consider closing`), per-row Action / Evidence / Rationale columns, healthy items collapsed to count + `<details>`, multi-comment split with `:i/N` markers and orphan reconciliation, plus a `Repeatedly-failed fix attempts` section that surfaces two-strike findings from the daily suites.
- `_fix-policy.md`):
  - `docs-and-references`: broken-link, docstring-drift (signature-driven), arch-ref-rename. Non-draft.
  - `structure`: missing-future, lazy-import. Non-draft. Dead exports stay report-only.
  - `dependencies`: transitive-gap, unused. Non-draft.
  - `code-quality`: bare-except narrowing only. Draft PRs until `draft_until_proven` is flipped after two non-draft PRs land clean. TODO-line deletion explicitly forbidden.
  - `test-health`: explicit "no fix phase, all categories report-only" with a future-candidate note for test-isolation violations.
- `.github/workflows/agentic-ci-daily.yml`: split the recipe step into two `claude` invocations (audit, then conditionally fix), each with the existing `--max-turns 50`. Fix step gated on audit success AND `matrix.suite != 'test-health'` AND non-empty `fix_backlog`. Adds a git-identity step, expands artifact upload to include the fix log + PR body, and reports both phase outcomes in the job summary. No branch auto-deletion.
- `_runner.md` updates: generalized branch prefix beyond `chore`, documented why CI uses `gh pr create --body-file` instead of `/create-pr`.

🔍 Attention Areas
- `.agents/recipes/_fix-policy.md`: load-bearing contract. Allowlists, ranking, two-strike escalation, and the standard fix procedure all live here. Worth reviewing in full.
- `.github/workflows/agentic-ci-daily.yml`: the audit/fix split, the backlog gate (`fromJSON(steps.backlog.outputs.size || '0') > 0`), and the `matrix.suite != 'test-health'` guard.
- `agentic-ci`, `agentic-ci/docs-and-references`, `agentic-ci/structure`, `agentic-ci/dependencies`, `agentic-ci/code-quality`. The recipes apply these via `gh pr edit --add-label`; missing labels won't block the PR opening but will surface a `gh` warning.

🧪 Testing
- `make test`: N/A. No Python or other source code changed; the diff is recipes (markdown), workflow YAML, and one new policy doc.
- `python3 -c "import yaml; yaml.safe_load(...)"`)
- `workflow_dispatch` one at a time and read the first 1–2 actual runs before promoting to the next, per the validation section of the plan. Two-strike escalation, allowlist enforcement, and re-attempt blocking are all observable from the runner state and PR list.

✅ Checklist
- `_fix-policy.md` itself is the architecture doc for the fix phase.