docs/README.skills.md — 1 addition & 1 deletion
@@ -221,7 +221,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to
|[publish-to-pages](../skills/publish-to-pages/SKILL.md)| Publish presentations and web content to GitHub Pages. Converts PPTX, PDF, HTML, or Google Slides to a live GitHub Pages URL. Handles repo creation, file conversion, Pages enablement, and returns the live URL. Use when the user wants to publish, deploy, or share a presentation or HTML file via GitHub Pages. |`scripts/convert-pdf.py`<br />`scripts/convert-pptx.py`<br />`scripts/publish.sh`|
|[pytest-coverage](../skills/pytest-coverage/SKILL.md)| Run pytest tests with coverage, discover lines missing coverage, and increase coverage to 100%. | None |
|[python-mcp-server-generator](../skills/python-mcp-server-generator/SKILL.md)| Generate a complete MCP server project in Python with tools, resources, and proper configuration | None |
-| [quality-playbook](../skills/quality-playbook/SKILL.md) | Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase. | `LICENSE.txt`<br />`references/constitution.md`<br />`references/defensive_patterns.md`<br />`references/functional_tests.md`<br />`references/review_protocols.md`<br />`references/schema_mapping.md`<br />`references/spec_audit.md`<br />`references/verification.md` |
+| [quality-playbook](../skills/quality-playbook/SKILL.md) | Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase. | `LICENSE.txt`<br />`references/constitution.md`<br />`references/defensive_patterns.md`<br />`references/functional_tests.md`<br />`references/review_protocols.md`<br />`references/schema_mapping.md`<br />`references/spec_audit.md`<br />`references/verification.md` |
|[quasi-coder](../skills/quasi-coder/SKILL.md)| Expert 10x engineer skill for interpreting and implementing code from shorthand, quasi-code, and natural language descriptions. Use when collaborators provide incomplete code snippets, pseudo-code, or descriptions with potential typos or incorrect terminology. Excels at translating non-technical or semi-technical descriptions into production-quality code. | None |
|[readme-blueprint-generator](../skills/readme-blueprint-generator/SKILL.md)| Intelligent README.md generation prompt that analyzes project documentation structure and creates comprehensive repository documentation. Scans .github/copilot directory files and copilot-instructions.md to extract project information, technology stack, architecture, development workflow, coding standards, and testing approaches while generating well-structured markdown documentation with proper formatting, cross-references, and developer-focused content. | None |
|[refactor](../skills/refactor/SKILL.md)| Surgical code refactoring to improve maintainability without changing behavior. Covers extracting functions, renaming variables, breaking down god functions, improving type safety, eliminating code smells, and applying design patterns. Less drastic than repo-rebuilder; use for gradual improvements. | None |
skills/quality-playbook/SKILL.md — 11 additions & 2 deletions
@@ -1,15 +1,22 @@
---
name: quality-playbook
-description: 'Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions ''quality playbook'', ''spec audit'', ''Council of Three'', ''fitness-to-purpose'', ''coverage theater'', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase.'
+description: "Explore any codebase from scratch and generate six quality artifacts: a quality constitution (QUALITY.md), spec-traced functional tests, a code review protocol with regression test generation, an integration testing protocol, a multi-model spec audit (Council of Three), and an AI bootstrap file (AGENTS.md). Works with any language (Python, Java, Scala, TypeScript, Go, Rust, etc.). Use this skill whenever the user asks to set up a quality playbook, generate functional tests from specifications, create a quality constitution, build testing protocols, audit code against specs, or establish a repeatable quality system for a project. Also trigger when the user mentions 'quality playbook', 'spec audit', 'Council of Three', 'fitness-to-purpose', 'coverage theater', or wants to go beyond basic test generation to build a full quality system grounded in their actual codebase."
license: Complete terms in LICENSE.txt
metadata:
-  version: 1.0.0
+  version: 1.1.0
  author: Andrew Stellman
  github: https://github.com/andrewstellman/
---

# Quality Playbook Generator

+**When this skill starts, display this banner before doing anything else:**
+
+```
+Quality Playbook v1.1.0 — by Andrew Stellman
+https://github.com/andrewstellman/
+```
+
Generate a complete quality system tailored to a specific codebase. Unlike test stub generators that work mechanically from source code, this skill explores the project first — understanding its domain, architecture, specifications, and failure history — then produces a quality playbook grounded in what it finds.

## Why This Exists
@@ -231,6 +238,8 @@ Key sections: bootstrap files, focus areas mapped to architecture, and these man
- Grep before claiming missing
- Do NOT suggest style changes — only flag things that are incorrect

+**Phase 2: Regression tests.** After the review produces BUG findings, write regression tests in `quality/test_regression.*` that reproduce each bug. Each test should fail on the current implementation, confirming the bug is real. Report results as a confirmation table (BUG CONFIRMED / FALSE POSITIVE / NEEDS INVESTIGATION). See `references/review_protocols.md` for the full regression test protocol.
+
### File 4: `quality/RUN_INTEGRATION_TESTS.md`

**Read `references/review_protocols.md`** for the template.
skills/quality-playbook/references/review_protocols.md — 56 additions & 0 deletions
@@ -50,6 +50,62 @@ For each file reviewed:
- Overall assessment: SHIP IT / FIX FIRST / NEEDS DISCUSSION
```

+### Phase 2: Regression Tests for Confirmed Bugs
+
+After the code review produces findings, write regression tests that reproduce each BUG finding. This transforms the review from "here are potential bugs" into "here are proven bugs with failing tests."
+
+**Why this matters:** A code review finding without a reproducer is an opinion. A finding with a failing test is a fact. Across multiple codebases (Go, Rust, Python), regression tests written from code review findings have confirmed bugs at a high rate — including data races, cross-tenant data leaks, state machine violations, and silent context loss. The regression tests also serve as the acceptance criteria for fixing the bugs: when the test passes, the bug is fixed.
+
+**How to generate regression tests:**
+
+1. **For each BUG finding**, write a test that:
+   - Targets the exact code path and line numbers from the finding
+   - Fails on the current implementation, confirming the bug exists
+   - Uses mocking/monkeypatching to isolate from external services
+   - Includes the finding description in the test docstring for traceability
+
+2. **Name the test file** `quality/test_regression.*` using the project's language:
+   - Python: `quality/test_regression.py`
+   - Go: `quality/regression_test.go` (or in the relevant package's test directory)
+   - Rust: `quality/regression_tests.rs` or a `tests/regression_*.rs` file in the relevant crate
+| Thread active check fails open | test_is_thread_active_... | PASSED (unexpected) | NO — needs investigation |
+```
+
+5. **If a test passes unexpectedly**, investigate — either the finding was a false positive, or the test doesn't exercise the right code path. Report as NEEDS INVESTIGATION, not as a confirmed bug.
+
+**Language-specific tips:**
+
+- **Go:** Use `go test -race` to confirm data race findings. The race detector is definitive — if it fires, the race is real.
+- **Rust:** Use `#[should_panic]` or assert on specific error conditions. For atomicity bugs, assert on cleanup state after injected failures.
+- **Python:** Use `monkeypatch` or `unittest.mock.patch` to isolate external dependencies. Use `pytest.raises` for exception-path bugs.
+- **Java:** Use Mockito or similar to isolate dependencies. Use `assertThrows` for exception-path bugs.
+
+**Save the regression test output** alongside the code review: if the review is at `quality/code_reviews/2026-03-26-reviewer.md`, the regression tests go in `quality/test_regression.*` and the confirmation results go in the review file as an addendum or in `quality/results/`.
+
### Why These Guardrails Matter

These four guardrails often improve AI code review quality by reducing vague and hallucinated findings: