
Commit d0fdc3a

quality-playbook v1.1.0: regression test generation (#1190)
* quality-playbook v1.1.0: add regression test generation and startup banner
* Regenerate docs/README.skills.md for quality-playbook v1.1.0
1 parent 6cef300 commit d0fdc3a

3 files changed: 68 additions & 3 deletions


docs/README.skills.md

Lines changed: 1 addition & 1 deletion
@@ -221,7 +221,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to
 | [publish-to-pages](../skills/publish-to-pages/SKILL.md) | Publish presentations and web content to GitHub Pages. Converts PPTX, PDF, HTML, or Google Slides to a live GitHub Pages URL. Handles repo creation, file conversion, Pages enablement, and returns the live URL. Use when the user wants to publish, deploy, or share a presentation or HTML file via GitHub Pages. | `scripts/convert-pdf.py`<br />`scripts/convert-pptx.py`<br />`scripts/publish.sh` |
 | [pytest-coverage](../skills/pytest-coverage/SKILL.md) | Run pytest tests with coverage, discover lines missing coverage, and increase coverage to 100%. | None |
 | [python-mcp-server-generator](../skills/python-mcp-server-generator/SKILL.md) | Generate a complete MCP server project in Python with tools, resources, and proper configuration | None |
-| [quality-playbook](../skills/quality-playbook/SKILL.md) | Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase. | `LICENSE.txt`<br />`references/constitution.md`<br />`references/defensive_patterns.md`<br />`references/functional_tests.md`<br />`references/review_protocols.md`<br />`references/schema_mapping.md`<br />`references/spec_audit.md`<br />`references/verification.md` |
+| [quality-playbook](../skills/quality-playbook/SKILL.md) | Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase. | `LICENSE.txt`<br />`references/constitution.md`<br />`references/defensive_patterns.md`<br />`references/functional_tests.md`<br />`references/review_protocols.md`<br />`references/schema_mapping.md`<br />`references/spec_audit.md`<br />`references/verification.md` |
 | [quasi-coder](../skills/quasi-coder/SKILL.md) | Expert 10x engineer skill for interpreting and implementing code from shorthand, quasi-code, and natural language descriptions. Use when collaborators provide incomplete code snippets, pseudo-code, or descriptions with potential typos or incorrect terminology. Excels at translating non-technical or semi-technical descriptions into production-quality code. | None |
 | [readme-blueprint-generator](../skills/readme-blueprint-generator/SKILL.md) | Intelligent README.md generation prompt that analyzes project documentation structure and creates comprehensive repository documentation. Scans .github/copilot directory files and copilot-instructions.md to extract project information, technology stack, architecture, development workflow, coding standards, and testing approaches while generating well-structured markdown documentation with proper formatting, cross-references, and developer-focused content. | None |
 | [refactor](../skills/refactor/SKILL.md) | Surgical code refactoring to improve maintainability without changing behavior. Covers extracting functions, renaming variables, breaking down god functions, improving type safety, eliminating code smells, and applying design patterns. Less drastic than repo-rebuilder; use for gradual improvements. | None |

skills/quality-playbook/SKILL.md

Lines changed: 11 additions & 2 deletions
@@ -1,15 +1,22 @@
 ---
 name: quality-playbook
-description: 'Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions ''quality playbook'', ''spec audit'', ''Council of Three'', ''fitness-to-purpose'', ''coverage theater'', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase.'
+description: "Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase."
 license: Complete terms in LICENSE.txt
 metadata:
-  version: 1.0.0
+  version: 1.1.0
   author: Andrew Stellman
   github: https://github.com/andrewstellman/
 ---
 
 # Quality Playbook Generator
 
+**When this skill starts, display this banner before doing anything else:**
+
+```
+Quality Playbook v1.1.0 — by Andrew Stellman
+https://github.com/andrewstellman/
+```
+
 Generate a complete quality system tailored to a specific codebase. Unlike test stub generators that work mechanically from source code, this skill explores the project first — understanding its domain, architecture, specifications, and failure history — then produces a quality playbook grounded in what it finds.
 
 ## Why This Exists
@@ -231,6 +238,8 @@ Key sections: bootstrap files, focus areas mapped to architecture, and these man
 - Grep before claiming missing
 - Do NOT suggest style changes — only flag things that are incorrect
 
+**Phase 2: Regression tests.** After the review produces BUG findings, write regression tests in `quality/test_regression.*` that reproduce each bug. Each test should fail on the current implementation, confirming the bug is real. Report results as a confirmation table (BUG CONFIRMED / FALSE POSITIVE / NEEDS INVESTIGATION). See `references/review_protocols.md` for the full regression test protocol.
+
 ### File 4: `quality/RUN_INTEGRATION_TESTS.md`
 
 **Read `references/review_protocols.md`** for the template.

skills/quality-playbook/references/review_protocols.md

Lines changed: 56 additions & 0 deletions
@@ -50,6 +50,62 @@ For each file reviewed:
 - Overall assessment: SHIP IT / FIX FIRST / NEEDS DISCUSSION
 ```
 
+### Phase 2: Regression Tests for Confirmed Bugs
+
+After the code review produces findings, write regression tests that reproduce each BUG finding. This transforms the review from "here are potential bugs" into "here are proven bugs with failing tests."
+
+**Why this matters:** A code review finding without a reproducer is an opinion. A finding with a failing test is a fact. Across multiple codebases (Go, Rust, Python), regression tests written from code review findings have confirmed bugs at a high rate — including data races, cross-tenant data leaks, state machine violations, and silent context loss. The regression tests also serve as the acceptance criteria for fixing the bugs: when the test passes, the bug is fixed.
+
+**How to generate regression tests:**
+
+1. **For each BUG finding**, write a test that:
+   - Targets the exact code path and line numbers from the finding
+   - Fails on the current implementation, confirming the bug exists
+   - Uses mocking/monkeypatching to isolate from external services
+   - Includes the finding description in the test docstring for traceability
+
+2. **Name the test file** `quality/test_regression.*` using the project's language:
+   - Python: `quality/test_regression.py`
+   - Go: `quality/regression_test.go` (or in the relevant package's test directory)
+   - Rust: `quality/regression_tests.rs` or a `tests/regression_*.rs` file in the relevant crate
+   - Java: `quality/RegressionTest.java`
+   - TypeScript: `quality/regression.test.ts`
+
+3. **Each test should document its origin:**
+   ```
+   # Python example
+   def test_webhook_signature_raises_on_malformed_input():
+       """[BUG from 2026-03-26-reviewer.md, line 47]
+       Webhook signature verification raises instead of returning False
+       on malformed signatures, risking 500 instead of clean 401."""
+
+   // Go example
+   func TestRestart_DataRace_DirectFieldAccess(t *testing.T) {
+       // BUG from 2026-03-26-claude.md, line 3707
+       // Restart() writes mutex-protected fields without acquiring the lock
+   }
+   ```
+
+4. **Run the tests and report results** as a confirmation table:
+   ```
+   | Finding | Test | Result | Confirmed? |
+   |---------|------|--------|------------|
+   | Webhook signature raises on malformed input | test_webhook_signature_... | FAILED (expected) | YES — bug confirmed |
+   | Queued messages deleted before processing | test_message_queue_... | FAILED (expected) | YES — bug confirmed |
+   | Thread active check fails open | test_is_thread_active_... | PASSED (unexpected) | NO — needs investigation |
+   ```
+
+5. **If a test passes unexpectedly**, investigate — either the finding was a false positive, or the test doesn't exercise the right code path. Report as NEEDS INVESTIGATION, not as a confirmed bug.
+
+**Language-specific tips:**
+
+- **Go:** Use `go test -race` to confirm data race findings. The race detector is definitive — if it fires, the race is real.
+- **Rust:** Use `#[should_panic]` or assert on specific error conditions. For atomicity bugs, assert on cleanup state after injected failures.
+- **Python:** Use `monkeypatch` or `unittest.mock.patch` to isolate external dependencies. Use `pytest.raises` for exception-path bugs.
+- **Java:** Use Mockito or similar to isolate dependencies. Use `assertThrows` for exception-path bugs.
+
+**Save the regression test output** alongside the code review: if the review is at `quality/code_reviews/2026-03-26-reviewer.md`, the regression tests go in `quality/test_regression.*` and the confirmation results go in the review file as an addendum or in `quality/results/`.
+
 ### Why These Guardrails Matter
 
 These four guardrails often improve AI code review quality by reducing vague and hallucinated findings:
