
Commit d0fdc3a

quality-playbook v1.1.0: regression test generation (#1190)
* quality-playbook v1.1.0: add regression test generation and startup banner
* Regenerate docs/README.skills.md for quality-playbook v1.1.0
1 parent 6cef300 commit d0fdc3a

3 files changed: 68 additions & 3 deletions


docs/README.skills.md

Lines changed: 1 addition & 1 deletion
@@ -221,7 +221,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to
 | [publish-to-pages](../skills/publish-to-pages/SKILL.md) | Publish presentations and web content to GitHub Pages. Converts PPTX, PDF, HTML, or Google Slides to a live GitHub Pages URL. Handles repo creation, file conversion, Pages enablement, and returns the live URL. Use when the user wants to publish, deploy, or share a presentation or HTML file via GitHub Pages. | `scripts/convert-pdf.py`<br />`scripts/convert-pptx.py`<br />`scripts/publish.sh` |
 | [pytest-coverage](../skills/pytest-coverage/SKILL.md) | Run pytest tests with coverage, discover lines missing coverage, and increase coverage to 100%. | None |
 | [python-mcp-server-generator](../skills/python-mcp-server-generator/SKILL.md) | Generate a complete MCP server project in Python with tools, resources, and proper configuration | None |
-| [quality-playbook](../skills/quality-playbook/SKILL.md) | Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase. | `LICENSE.txt`<br />`references/constitution.md`<br />`references/defensive_patterns.md`<br />`references/functional_tests.md`<br />`references/review_protocols.md`<br />`references/schema_mapping.md`<br />`references/spec_audit.md`<br />`references/verification.md` |
+| [quality-playbook](../skills/quality-playbook/SKILL.md) | Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase. | `LICENSE.txt`<br />`references/constitution.md`<br />`references/defensive_patterns.md`<br />`references/functional_tests.md`<br />`references/review_protocols.md`<br />`references/schema_mapping.md`<br />`references/spec_audit.md`<br />`references/verification.md` |
 | [quasi-coder](../skills/quasi-coder/SKILL.md) | Expert 10x engineer skill for interpreting and implementing code from shorthand, quasi-code, and natural language descriptions. Use when collaborators provide incomplete code snippets, pseudo-code, or descriptions with potential typos or incorrect terminology. Excels at translating non-technical or semi-technical descriptions into production-quality code. | None |
 | [readme-blueprint-generator](../skills/readme-blueprint-generator/SKILL.md) | Intelligent README.md generation prompt that analyzes project documentation structure and creates comprehensive repository documentation. Scans .github/copilot directory files and copilot-instructions.md to extract project information, technology stack, architecture, development workflow, coding standards, and testing approaches while generating well-structured markdown documentation with proper formatting, cross-references, and developer-focused content. | None |
 | [refactor](../skills/refactor/SKILL.md) | Surgical code refactoring to improve maintainability without changing behavior. Covers extracting functions, renaming variables, breaking down god functions, improving type safety, eliminating code smells, and applying design patterns. Less drastic than repo-rebuilder; use for gradual improvements. | None |

skills/quality-playbook/SKILL.md

Lines changed: 11 additions & 2 deletions
@@ -1,15 +1,22 @@
 ---
 name: quality-playbook
-description: 'Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions ''quality playbook'', ''spec audit'', ''Council of Three'', ''fitness-to-purpose'', ''coverage theater'', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase.'
+description: "Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase."
 license: Complete terms in LICENSE.txt
 metadata:
-  version: 1.0.0
+  version: 1.1.0
   author: Andrew Stellman
   github: https://github.com/andrewstellman/
 ---
 
 # Quality Playbook Generator
 
+**When this skill starts, display this banner before doing anything else:**
+
+```
+Quality Playbook v1.1.0 — by Andrew Stellman
+https://github.com/andrewstellman/
+```
+
 Generate a complete quality system tailored to a specific codebase. Unlike test stub generators that work mechanically from source code, this skill explores the project first — understanding its domain, architecture, specifications, and failure history — then produces a quality playbook grounded in what it finds.
 
 ## Why This Exists
@@ -231,6 +238,8 @@ Key sections: bootstrap files, focus areas mapped to architecture, and these man
 - Grep before claiming missing
 - Do NOT suggest style changes — only flag things that are incorrect
 
+**Phase 2: Regression tests.** After the review produces BUG findings, write regression tests in `quality/test_regression.*` that reproduce each bug. Each test should fail on the current implementation, confirming the bug is real. Report results as a confirmation table (BUG CONFIRMED / FALSE POSITIVE / NEEDS INVESTIGATION). See `references/review_protocols.md` for the full regression test protocol.
+
 ### File 4: `quality/RUN_INTEGRATION_TESTS.md`
 
 **Read `references/review_protocols.md`** for the template.

skills/quality-playbook/references/review_protocols.md

Lines changed: 56 additions & 0 deletions
@@ -50,6 +50,62 @@ For each file reviewed:
 - Overall assessment: SHIP IT / FIX FIRST / NEEDS DISCUSSION
 ```
 
+### Phase 2: Regression Tests for Confirmed Bugs
+
+After the code review produces findings, write regression tests that reproduce each BUG finding. This transforms the review from "here are potential bugs" into "here are proven bugs with failing tests."
+
+**Why this matters:** A code review finding without a reproducer is an opinion. A finding with a failing test is a fact. Across multiple codebases (Go, Rust, Python), regression tests written from code review findings have confirmed bugs at a high rate — including data races, cross-tenant data leaks, state machine violations, and silent context loss. The regression tests also serve as the acceptance criteria for fixing the bugs: when the test passes, the bug is fixed.
+
+**How to generate regression tests:**
+
+1. **For each BUG finding**, write a test that:
+   - Targets the exact code path and line numbers from the finding
+   - Fails on the current implementation, confirming the bug exists
+   - Uses mocking/monkeypatching to isolate from external services
+   - Includes the finding description in the test docstring for traceability
+
+2. **Name the test file** `quality/test_regression.*` using the project's language:
+   - Python: `quality/test_regression.py`
+   - Go: `quality/regression_test.go` (or in the relevant package's test directory)
+   - Rust: `quality/regression_tests.rs` or a `tests/regression_*.rs` file in the relevant crate
+   - Java: `quality/RegressionTest.java`
+   - TypeScript: `quality/regression.test.ts`
+
+3. **Each test should document its origin:**
+   ```
+   # Python example
+   def test_webhook_signature_raises_on_malformed_input():
+       """[BUG from 2026-03-26-reviewer.md, line 47]
+       Webhook signature verification raises instead of returning False
+       on malformed signatures, risking 500 instead of clean 401."""
+
+   // Go example
+   func TestRestart_DataRace_DirectFieldAccess(t *testing.T) {
+       // BUG from 2026-03-26-claude.md, line 3707
+       // Restart() writes mutex-protected fields without acquiring the lock
+   }
+   ```
+
+4. **Run the tests and report results** as a confirmation table:
+   ```
+   | Finding | Test | Result | Confirmed? |
+   |---------|------|--------|------------|
+   | Webhook signature raises on malformed input | test_webhook_signature_... | FAILED (expected) | YES — bug confirmed |
+   | Queued messages deleted before processing | test_message_queue_... | FAILED (expected) | YES — bug confirmed |
+   | Thread active check fails open | test_is_thread_active_... | PASSED (unexpected) | NO — needs investigation |
+   ```
+
+5. **If a test passes unexpectedly**, investigate — either the finding was a false positive, or the test doesn't exercise the right code path. Report as NEEDS INVESTIGATION, not as a confirmed bug.
+
+**Language-specific tips:**
+
+- **Go:** Use `go test -race` to confirm data race findings. The race detector is definitive — if it fires, the race is real.
+- **Rust:** Use `#[should_panic]` or assert on specific error conditions. For atomicity bugs, assert on cleanup state after injected failures.
+- **Python:** Use `monkeypatch` or `unittest.mock.patch` to isolate external dependencies. Use `pytest.raises` for exception-path bugs.
+- **Java:** Use Mockito or similar to isolate dependencies. Use `assertThrows` for exception-path bugs.
+
+**Save the regression test output** alongside the code review: if the review is at `quality/code_reviews/2026-03-26-reviewer.md`, the regression tests go in `quality/test_regression.*` and the confirmation results go in the review file as an addendum or in `quality/results/`.
+
 ### Why These Guardrails Matter
 
 These four guardrails often improve AI code review quality by reducing vague and hallucinated findings:
