
Commit 1b5f891

docs(spec): add v2.5 for rate-limit protections
1 parent 64c865c commit 1b5f891

2 files changed

Lines changed: 867 additions & 0 deletions


docs/spec/SPEC-v2.5.md

Lines changed: 340 additions & 0 deletions
# github-api-usage-monitor v2 - Specification

> **This document is derived from `spec/spec.v2.json`.** Do not edit directly; regenerate from the authoritative spec.

**Status:** implemented

**Canonical Date:** 2026-02-02

**Spec Version:** 2.0

---

## Summary

A GitHub Action that monitors API rate-limit usage during a job using pre/post hooks. A detached poller performs adaptive `/rate_limit` sampling; the post hook then renders a summary and, when enabled, uploads diagnostics artifacts (`state.json` + `poll-log.json`).

---

## Scope Boundary

### In Scope

- Bucket-level accounting of primary rate-limit usage during a job (hour- and minute-buckets)
- Adaptive polling that targets reset boundaries with a debounce floor
- Single-step user experience via pre/post hooks (no explicit start/stop steps)
- Clear, actionable output: per-bucket totals, warnings, remaining quota, next reset time
- Optional diagnostics artifacts (`state.json` and `poll-log.json`) uploaded in the post hook when enabled
- Linux/macOS GitHub-hosted runners

### Out of Scope

- Per-request endpoint tracing (URL/method)
- Per-step attribution
- Advanced secondary rate-limit diagnostics (burstiness, concurrency, abuse heuristics)
- Strong guarantees on job cancellation/runner crash (partial results acceptable)
- Windows support
- Job-level container support
- Self-hosted runner and GHES support
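
The runner constraints in this scope boundary can be checked up front in the pre hook (F1 requires platform validation). A minimal sketch, assuming the check receives `process.platform` and the environment as plain inputs. `checkPlatform` is a hypothetical helper, not the `platform.ts` API; reading `RUNNER_ENVIRONMENT` and `GITHUB_SERVER_URL` to detect self-hosted runners and GHES is an assumption about how those cases could be detected, and job-level containers are not detected here.

```typescript
// Hypothetical pre-hook guard mirroring the Scope Boundary; not the real platform.ts API.
type PlatformCheck =
  | { supported: true; os: "linux" | "darwin" }
  | { supported: false; reason: string };

function checkPlatform(
  platform: string,
  env: Record<string, string | undefined>,
): PlatformCheck {
  // Only Linux and macOS are in scope; Windows (win32) fails fast (R4).
  const os = platform === "linux" ? "linux" : platform === "darwin" ? "darwin" : null;
  if (os === null) {
    return { supported: false, reason: `unsupported OS: ${platform}` };
  }
  // Assumption: the runner sets RUNNER_ENVIRONMENT to "github-hosted" or "self-hosted".
  if (env.RUNNER_ENVIRONMENT === "self-hosted") {
    return { supported: false, reason: "self-hosted runners are out of scope" };
  }
  // Assumption: a GITHUB_SERVER_URL other than https://github.com indicates GHES.
  const server = env.GITHUB_SERVER_URL;
  if (server !== undefined && server !== "https://github.com") {
    return { supported: false, reason: `GHES is out of scope (${server})` };
  }
  return { supported: true, os };
}
```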

---

## Steel Thread Acceptance Criteria

1. Pre hook spawns a background poller that persists across workflow steps
2. Poller uses adaptive scheduling (base interval 30s, debounce floor) and updates reducer state
3. Poller respects 403/429 rate-limit responses with backoff and retry limits
4. Post hook terminates poller and produces summary in step summary + console
5. Post hook uploads diagnostics artifacts containing state.json and poll-log.json when diagnostics are enabled
6. Warnings are emitted for poll failures, anomalies, secondary rate limits, and unsupported environments
7. Token is never printed to logs
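
Criterion 2 leaves the scheduling rule abstract. A minimal sketch of one way to combine the 30s base interval with a debounce floor and burst polling near reset boundaries. Only the 30s base comes from the spec; the 5s floor, the 3s pre-reset lead, and the name `nextPollDelayMs` are hypothetical.

```typescript
// Hypothetical scheduler sketch. Only the 30s base interval comes from the spec;
// the floor and pre-reset lead values are illustrative assumptions.
const BASE_INTERVAL_MS = 30_000;
const DEBOUNCE_FLOOR_MS = 5_000; // assumption
const PRE_RESET_LEAD_MS = 3_000; // assumption

/**
 * Delay until the next /rate_limit poll. Normally the base interval, but pulled
 * earlier so a poll lands shortly before the soonest upcoming bucket reset, and
 * never shorter than the debounce floor (which yields burst polling near a reset).
 */
function nextPollDelayMs(nowMs: number, resetsEpochSeconds: number[]): number {
  let delay = BASE_INTERVAL_MS;
  for (const reset of resetsEpochSeconds) {
    const untilResetMs = reset * 1000 - nowMs;
    if (untilResetMs <= 0) continue; // already reset; the next poll sees the new window
    const target = untilResetMs - PRE_RESET_LEAD_MS; // may be <= 0 when very close
    if (target < delay) delay = target;
  }
  return Math.max(delay, DEBOUNCE_FLOOR_MS);
}
```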

---

## Functional Requirements

| ID | Priority | Requirement | Notes |
|----|----------|-------------|-------|
| F1 | must | Pre hook validates platform and token, creates baseline state, and spawns detached poller | Initial `/rate_limit` poll establishes the baseline and fails fast on an invalid token |
| F2 | must | Post hook terminates poller and prints summary even if the poller is not running | Best-effort output; run with `post-if: always()` |
| F3 | must | Accept token input; default to `github.token` / `GITHUB_TOKEN` if present | Never print token; use GitHub masking |
| F4 | must | Poll `/rate_limit` using adaptive scheduling with a 30s base interval and a debounce floor | Targets pre-reset windows; burst polling near reset boundaries |
| F5 | must | Persist state and PID to `$RUNNER_TEMP` for cross-step access | `state.json` in `$RUNNER_TEMP/github-api-usage-monitor`; PID in `poller.pid` |
| F6 | must | Track all rate-limit buckets returned by the `/rate_limit` API | Report all buckets with usage in summary |
| F7 | must | Handle reset boundaries by counting the full `used` value immediately after a reset change | Minimizes undercount; post-reset `used` reflects consumption since the new window began |
| F8 | must | Detect anomalies when `used` decreases without a reset change | Increment anomaly counter; do not subtract from totals; emit warning |
| F9 | should | Periodically write state file for durability during long-running polls | Best-effort data preservation on unexpected termination |
| F10 | should | Output summary to GitHub step summary and console | Table sorted by `total_used` desc; one-line console summary + top buckets |
| F11 | should | When diagnostics are enabled, write a per-poll JSONL diagnostics log to `poll-log.jsonl` | Best-effort; do not disrupt poller on write errors |
| F12 | must | When diagnostics are enabled, upload diagnostics artifacts in post hook | `state.json` and `poll-log.json` (the JSONL log converted to a JSON array) uploaded as a single artifact |
| F13 | should | Allow override of artifact name | Optional input for matrix jobs to avoid collisions |
| F14 | must | Allow diagnostics to be enabled or disabled via input | Default off; when disabled, skip poll log and artifact upload |
| F15 | must | On 403/429 responses, back off per primary/secondary rules and stop after 5 consecutive secondary retries | Primary: `remaining=0` -> wait until `x-ratelimit-reset`. Secondary: message contains "secondary"/"abuse"; honor `retry-after` and reset; default 60s; exponential backoff. |
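
F15 packs several rules into one row. A minimal sketch of how a failed poll could be classified and turned into a wait time. The `FailedPoll` shape, `decideBackoff`, and the way `retry-after`, reset, and the 60s default are combined (first available value, doubled per consecutive retry) are assumptions; the rules themselves (primary when remaining is 0, secondary when the message mentions secondary/abuse, stop after 5 consecutive secondary retries) come from the table.

```typescript
// Hypothetical shape of a failed /rate_limit call; header names are GitHub's.
interface FailedPoll {
  status: number;
  message: string | null;
  remaining: number | null; // x-ratelimit-remaining
  resetEpochSeconds: number | null; // x-ratelimit-reset
  retryAfterSeconds: number | null; // retry-after
}

type BackoffDecision =
  | { action: "wait"; kind: "primary" | "secondary"; waitMs: number }
  | { action: "stop"; kind: "secondary" }
  | { action: "count_failure"; kind: "unknown" };

const MAX_SECONDARY_RETRIES = 5;
const DEFAULT_SECONDARY_WAIT_MS = 60_000;

function decideBackoff(f: FailedPoll, nowMs: number, secondaryRetryCount: number): BackoffDecision {
  // Non-rate-limit errors are counted, not retried (R5).
  if (f.status !== 403 && f.status !== 429) return { action: "count_failure", kind: "unknown" };

  // Primary: quota exhausted, so wait until the window resets.
  if (f.remaining === 0 && f.resetEpochSeconds !== null) {
    return { action: "wait", kind: "primary", waitMs: Math.max(0, f.resetEpochSeconds * 1000 - nowMs) };
  }

  // Secondary: identified by the error message text.
  const msg = (f.message ?? "").toLowerCase();
  if (!msg.includes("secondary") && !msg.includes("abuse")) {
    return { action: "count_failure", kind: "unknown" };
  }
  if (secondaryRetryCount >= MAX_SECONDARY_RETRIES) return { action: "stop", kind: "secondary" };

  // Base wait (assumption): retry-after, else the reset time, else 60s; doubled per consecutive retry.
  let baseMs = DEFAULT_SECONDARY_WAIT_MS;
  if (f.retryAfterSeconds !== null) baseMs = f.retryAfterSeconds * 1000;
  else if (f.resetEpochSeconds !== null) baseMs = Math.max(0, f.resetEpochSeconds * 1000 - nowMs);
  return { action: "wait", kind: "secondary", waitMs: baseMs * 2 ** secondaryRetryCount };
}
```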

---

## Non-Functional Requirements

| ID | Category | Requirement | Measurement |
|----|----------|-------------|-------------|
| NF1 | security | No secrets in logs | Token never appears in stdout/stderr; no `set -x`; use GitHub masking |
| NF2 | reliability | Poller process survives step boundaries on Linux/macOS GitHub-hosted runners | Detached process with `unref()`; PID-based lifecycle management |
| NF3 | performance | Constant-space reducer with O(#buckets) work per poll | State file remains small; minimal log volume |
| NF4 | maintainability | Deterministic reducer behavior with unit-testable pure functions | Table-driven tests for all reducer edge cases |
| NF5 | operational_safety | Poller has a maximum lifetime to prevent runaway processes | Exits after `MAX_LIFETIME_MS`, writing state before exit |
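
NF5 and F9 together suggest a loop shape: poll, persist, sleep, and exit once the lifetime budget is spent, writing state one last time. A minimal sketch with injected dependencies so it can run against a fake clock. `PollerDeps` and `runPoller` are hypothetical names, and `maxLifetimeMs` is a parameter because the spec does not state the value of `MAX_LIFETIME_MS`.

```typescript
// Hypothetical dependency bundle so the loop can be driven by a fake clock in tests.
interface PollerDeps {
  now: () => number;
  sleep: (ms: number) => Promise<void>;
  poll: () => Promise<void>;
  writeState: () => void;
}

async function runPoller(
  deps: PollerDeps,
  maxLifetimeMs: number,
  nextDelayMs: () => number,
): Promise<number> {
  const startedAt = deps.now();
  let polls = 0;
  while (deps.now() - startedAt < maxLifetimeMs) {
    try {
      await deps.poll();
    } catch {
      // In this sketch poll() records its own failures (R5); the loop keeps running.
    }
    polls += 1;
    deps.writeState(); // periodic durability write (F9)
    await deps.sleep(nextDelayMs());
  }
  deps.writeState(); // final write when the lifetime budget is spent (NF5)
  return polls;
}
```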

---

## Architecture

### Layer Diagram

```mermaid
flowchart TB
    subgraph action["Action Entry Layer"]
        pre["pre.ts"]
        main["main.ts"]
        post["post.ts"]
        start["start.ts"]
    end

    subgraph poller_layer["Poller Layer"]
        poller["poller.ts"]
        poller_entry["poller-entry.ts"]
    end

    subgraph core["Core Logic Layer"]
        reducer["reducer.ts"]
        state["state.ts"]
        poll_log["poll-log.ts"]
    end

    subgraph infra["Infrastructure Layer"]
        github["github.ts"]
        output["output.ts"]
        paths["paths.ts"]
        platform["platform.ts"]
    end

    pre --> start
    start --> poller
    post --> poller
    post --> output
    poller_entry --> poller
    poller --> github
    poller --> reducer
    poller --> state
    poller --> poll_log
    state --> paths
    poll_log --> paths
```

### Layers

| Layer | Description |
|-------|-------------|
| **action** | GitHub Action entry points for pre/main/post hooks |
| **poller** | Background process that polls /rate_limit and updates state |
| **core** | Pure business logic for rate-limit reduction and state management |
| **infra** | External integrations and platform-specific code |

### Modules

| Module | Layer | Paths | Provided Ports |
|--------|-------|-------|----------------|
| pre | action | src/pre.ts | |
| main | action | src/main.ts | |
| post | action | src/post.ts | |
| start | action | src/start.ts | start.monitor |
| poller | poller | src/poller.ts | poller.spawn, poller.kill |
| rate-limit-control | poller | src/poller/rate-limit-control.ts | |
| poller-entry | poller | src/poller-entry.ts | |
| reducer | core | src/reducer.ts | reducer.update, reducer.initBucket |
| state | core | src/state.ts | state.read, state.write, state.writePid |
| poll-log | core | src/poll-log.ts | pollLog.append, pollLog.read |
| github | infra | src/github.ts | github.fetchRateLimit |
| output | infra | src/output.ts | output.render |
| paths | infra | src/paths.ts | paths.statePath, paths.pidPath, paths.pollLogPath |
| platform | infra | src/platform.ts | platform.isSupported, platform.detect, platform.assertSupported |
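
The `poller.spawn` and `poller.kill` ports, together with NF2 (detached process, `unref()`, PID-based lifecycle) and R2 (stale PID), can be sketched with Node's `child_process` API. The function names, the PID-file format (a bare decimal PID), and the `KillResult` values are hypothetical; `spawn` with `detached: true`, `ChildProcess.unref()`, and `process.kill` are standard Node.js.

```typescript
import { spawn } from "node:child_process";
import * as fs from "node:fs";

type KillResult = "killed" | "not_running" | "no_pid";

// Hypothetical poller.spawn: start the entry script as a detached Node process
// that outlives the pre hook, and record its PID for the post hook.
function spawnPoller(entryScript: string, pidPath: string, args: string[] = []): number {
  const child = spawn(process.execPath, [entryScript, ...args], {
    detached: true, // on POSIX, the child leads its own process group
    stdio: "ignore", // no pipes that would keep the parent attached
  });
  child.unref(); // let the pre hook's Node process exit without waiting
  if (child.pid === undefined) throw new Error("failed to spawn poller");
  fs.writeFileSync(pidPath, String(child.pid), "utf8");
  return child.pid;
}

// Hypothetical poller.kill: signal the recorded PID, tolerating a missing or stale PID (R2).
function killPoller(pidPath: string): KillResult {
  let raw: string;
  try {
    raw = fs.readFileSync(pidPath, "utf8");
  } catch {
    return "no_pid";
  }
  const pid = Number.parseInt(raw.trim(), 10);
  if (!Number.isInteger(pid) || pid <= 0) return "no_pid";
  try {
    process.kill(pid, "SIGTERM");
    return "killed";
  } catch (err) {
    // ESRCH: no such process, i.e. the poller already exited.
    if ((err as NodeJS.ErrnoException).code === "ESRCH") return "not_running";
    throw err;
  }
}
```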

---

## Boundary Types

### ReducerState

Global reducer state persisted to `state.json`.

```typescript
interface ReducerState {
  buckets: Record<string, BucketState>;
  started_at_ts: string; // ISO timestamp
  stopped_at_ts: string | null; // ISO timestamp
  poller_started_at_ts: string | null; // ISO timestamp
  interval_seconds: number;
  poll_count: number;
  poll_failures: number;
  secondary_rate_limit_hits: number;
  last_error: string | null;
}
```
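
Persisting `ReducerState` across steps (F5, F9) is safest with an atomic write, so the post hook never reads a half-written file. A minimal sketch using write-to-temp-then-rename, assuming `state.json` lives in `$RUNNER_TEMP/github-api-usage-monitor` as F5 states. `stateDir`, `writeStateAtomic`, and `readState` are hypothetical names, not the actual `state.ts`/`paths.ts` exports.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Assumption: mirrors the F5 layout ($RUNNER_TEMP/github-api-usage-monitor).
function stateDir(runnerTemp: string): string {
  return path.join(runnerTemp, "github-api-usage-monitor");
}

function writeStateAtomic<T>(dir: string, state: T): string {
  fs.mkdirSync(dir, { recursive: true });
  const finalPath = path.join(dir, "state.json");
  // Write a sibling temp file first; rename() within one directory
  // replaces the target atomically on POSIX filesystems.
  const tmpPath = `${finalPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(state, null, 2), "utf8");
  fs.renameSync(tmpPath, finalPath);
  return finalPath;
}

function readState<T>(dir: string): T | null {
  try {
    return JSON.parse(fs.readFileSync(path.join(dir, "state.json"), "utf8")) as T;
  } catch {
    return null; // missing or unreadable state: the caller proceeds best-effort
  }
}
```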

### BucketState

Per-bucket reducer state.

```typescript
interface BucketState {
  last_reset: number; // epoch seconds
  last_used: number;
  total_used: number;
  windows_crossed: number;
  anomalies: number;
  last_seen_ts: string; // ISO timestamp
  limit: number;
  remaining: number;
  first_used: number;
  first_remaining: number;
}
```
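
F7 and F8 define how a new sample folds into a `BucketState`. A minimal sketch of that rule as pure functions, in the spirit of NF4. The interfaces are copies of the ones above; `initBucket` mirrors the name of the `reducer.initBucket` port but its signature is an assumption, `updateBucket` is hypothetical, and two choices are assumptions: any change in `reset` counts as a window crossing, and usage before the baseline sample is not counted.

```typescript
// Copies of RateLimitSample and BucketState from this section.
interface RateLimitSample { limit: number; used: number; remaining: number; reset: number }
interface BucketState {
  last_reset: number; last_used: number; total_used: number; windows_crossed: number;
  anomalies: number; last_seen_ts: string; limit: number; remaining: number;
  first_used: number; first_remaining: number;
}

interface UpdateResult { bucket: BucketState; delta: number; windowCrossed: boolean; anomaly: boolean }

// Baseline from the first sample (assumption: usage before the job is not counted).
function initBucket(s: RateLimitSample, ts: string): BucketState {
  return {
    last_reset: s.reset, last_used: s.used, total_used: 0, windows_crossed: 0,
    anomalies: 0, last_seen_ts: ts, limit: s.limit, remaining: s.remaining,
    first_used: s.used, first_remaining: s.remaining,
  };
}

function updateBucket(b: BucketState, s: RateLimitSample, ts: string): UpdateResult {
  const windowCrossed = s.reset !== b.last_reset;
  let delta = 0;
  let anomaly = false;
  if (windowCrossed) {
    // F7: post-reset `used` is consumption since the new window began; count it all.
    delta = s.used;
  } else if (s.used < b.last_used) {
    // F8: `used` fell without a reset change; flag it and never subtract.
    anomaly = true;
  } else {
    delta = s.used - b.last_used;
  }
  return {
    bucket: {
      ...b,
      last_reset: s.reset,
      last_used: s.used,
      total_used: b.total_used + delta,
      windows_crossed: b.windows_crossed + (windowCrossed ? 1 : 0),
      anomalies: b.anomalies + (anomaly ? 1 : 0),
      last_seen_ts: ts,
      limit: s.limit,
      remaining: s.remaining,
    },
    delta,
    windowCrossed,
    anomaly,
  };
}
```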

### RateLimitSample

Single sample from /rate_limit for one bucket.

```typescript
interface RateLimitSample {
  limit: number;
  used: number;
  remaining: number;
  reset: number; // epoch seconds
}
```

### RateLimitResponse

Full response from GET /rate_limit.

```typescript
interface RateLimitResponse {
  resources: Record<string, RateLimitSample>;
  rate: RateLimitSample; // deprecated alias for core
}
```
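
M1 requires the GitHub client to parse this response. A minimal sketch of a defensive parser that keeps only well-formed buckets from `resources` and ignores the deprecated `rate` alias. `parseRateLimitResponse` and `isSample` are hypothetical helpers, and the input is assumed to be the already-decoded JSON body.

```typescript
interface RateLimitSample {
  limit: number;
  used: number;
  remaining: number;
  reset: number; // epoch seconds
}

const SAMPLE_FIELDS = ["limit", "used", "remaining", "reset"] as const;

function isSample(v: unknown): v is RateLimitSample {
  if (typeof v !== "object" || v === null) return false;
  const o = v as Record<string, unknown>;
  return SAMPLE_FIELDS.every((k) => {
    const x = o[k];
    return typeof x === "number" && Number.isFinite(x);
  });
}

// Hypothetical parser for an already-decoded GET /rate_limit body.
// Reads only `resources`; the deprecated `rate` alias duplicates `core`.
function parseRateLimitResponse(body: unknown): Record<string, RateLimitSample> {
  if (typeof body !== "object" || body === null) {
    throw new Error("rate_limit: response body is not an object");
  }
  const resources = (body as { resources?: unknown }).resources;
  if (typeof resources !== "object" || resources === null) {
    throw new Error("rate_limit: missing `resources`");
  }
  const buckets: Record<string, RateLimitSample> = {};
  for (const [name, value] of Object.entries(resources)) {
    // Skip malformed entries instead of failing the whole poll.
    if (isSample(value)) {
      buckets[name] = { limit: value.limit, used: value.used, remaining: value.remaining, reset: value.reset };
    }
  }
  return buckets;
}
```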

### SummaryData

Data passed to output renderer for summary generation.

```typescript
interface SummaryData {
  state: ReducerState;
  duration_seconds: number;
  warnings: string[];
}
```
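
F10 asks for a step-summary table sorted by `total_used` descending plus a one-line console summary, and F6 limits the report to buckets with usage. A minimal sketch of that rendering; `renderSummary`, the trimmed `BucketView`/`SummaryView` types, the column set, and the exact wording are hypothetical, not the real `output.render`.

```typescript
// Trimmed, hypothetical views of BucketState / SummaryData with only the fields used here.
interface BucketView {
  total_used: number;
  remaining: number;
  limit: number;
  last_reset: number; // epoch seconds; the next reset for the current window
}

interface SummaryView {
  state: { buckets: Record<string, BucketView>; poll_count: number };
  duration_seconds: number;
  warnings: string[];
}

function renderSummary(d: SummaryView): { markdown: string; consoleLine: string } {
  // F6/F10: report buckets with usage, sorted by total_used descending.
  const rows = Object.entries(d.state.buckets)
    .filter(([, b]) => b.total_used > 0)
    .sort(([, a], [, b]) => b.total_used - a.total_used);

  const lines = [
    "| Bucket | Used | Remaining | Limit | Next reset (UTC) |",
    "|--------|------|-----------|-------|------------------|",
    ...rows.map(
      ([name, b]) =>
        `| ${name} | ${b.total_used} | ${b.remaining} | ${b.limit} | ${new Date(b.last_reset * 1000).toISOString()} |`,
    ),
  ];
  for (const w of d.warnings) lines.push("", `> **Warning:** ${w}`);

  const total = rows.reduce((sum, [, b]) => sum + b.total_used, 0);
  const top = rows.slice(0, 3).map(([name, b]) => `${name}=${b.total_used}`).join(", ");
  const consoleLine =
    `API usage: ${total} calls in ${d.duration_seconds}s over ${d.state.poll_count} polls` +
    (top ? ` (top: ${top})` : "");
  return { markdown: lines.join("\n"), consoleLine };
}
```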

### PollLogEntry

Diagnostic per-poll snapshot, written to JSONL during the job and uploaded as a JSON array. `poll_number` counts poll attempts (successes and failures).

```typescript
interface PollLogBucketSnapshot {
  used: number;
  remaining: number;
  reset: number;
  limit: number;
  delta: number;
  window_crossed: boolean;
  anomaly: boolean;
}

interface PollLogError {
  kind: "primary" | "secondary" | "unknown";
  status: number;
  message: string | null;
  retry_after_seconds: number | null;
  rate_limit_remaining: number | null;
  rate_limit_reset: number | null;
  next_allowed_at: number | null;
  secondary_retry_count: number;
}

interface PollLogEntry {
  timestamp: string; // ISO
  poll_number: number;
  buckets: Record<string, PollLogBucketSnapshot>;
  error?: PollLogError;
}
```
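
F11 writes `PollLogEntry` records as JSONL during the job, and F12 uploads them as a JSON array. A minimal sketch of both halves, assuming one JSON object per line and skipping lines that fail to parse (for example a truncated final line after an abrupt kill). `appendPollLog` and `readPollLogAsArray` are hypothetical names standing in for the `pollLog.append` / `pollLog.read` ports.

```typescript
import * as fs from "node:fs";

// Best-effort append (F11): a write error is reported, never thrown into the poller.
function appendPollLog(logPath: string, entry: object): boolean {
  try {
    fs.appendFileSync(logPath, JSON.stringify(entry) + "\n", "utf8");
    return true;
  } catch {
    return false;
  }
}

// Convert the JSONL log into an array suitable for poll-log.json (F12).
function readPollLogAsArray(logPath: string): unknown[] {
  let raw: string;
  try {
    raw = fs.readFileSync(logPath, "utf8");
  } catch {
    return []; // no log (diagnostics disabled or nothing written yet)
  }
  const entries: unknown[] = [];
  for (const line of raw.split("\n")) {
    if (line.trim() === "") continue;
    try {
      entries.push(JSON.parse(line));
    } catch {
      // Skip a partial line, e.g. from a process killed mid-write.
    }
  }
  return entries;
}
```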

---

## Milestones

| ID | Name | Description | Exit Criteria |
|----|------|-------------|---------------|
| M1 | Core Logic | Implement reducer, state manager, and GitHub client | Reducer handles deltas, boundaries, anomalies; state atomic write; GitHub client parses response |
| M2 | Action Integration | Implement pre/post hooks and process lifecycle | Pre hook spawns poller; post hook stops poller and renders summary; platform detection; initial poll validates token |
| M3 | Testing | Unit tests and integration tests | Table-driven reducer tests; fixture-based parsing; serialized self-test workflow |
| M4 | Diagnostics | Poll log and artifact support | JSONL poll log when enabled; post hook uploads artifacts when enabled; downstream jobs can download diagnostics |
| M5 | Release | Build, bundle, and release automation | `ncc` bundle; CI/CD pipeline; `action.yml` references `dist/*.js` |

---

## Risks

| ID | Category | Risk | Mitigation |
|----|----------|------|------------|
| R1 | process | Orphan process if post hook does not run on job cancel | Acceptable; runner teardown kills process |
| R2 | process | PID not found or stale at stop time | Handle gracefully; emit warning; proceed with available state |
| R3 | platform | Background process behavior differs across runner types | Scope to GitHub-hosted Linux/macOS |
| R4 | platform | Windows process model differences | Fail fast with clear message |
| R5 | api | /rate_limit transient failures | Count failures; show warning; no retries for non-rate-limit errors |
| R6 | api | Secondary rate limit if polled too aggressively | Adaptive schedule with debounce floor; 403/429 backoff with retry limits |
| R7 | correctness | Reset boundary between polls | Burst polls near reset to reduce undercount |
| R8 | correctness | Token context changes mid-job | Record anomaly; do not subtract; warn |
| R9 | diagnostics | Artifact upload failure hides diagnostics when enabled | Warn but still render summary; continue best-effort |

---

## File Structure

```
.
├── action.yml
├── package.json
├── tsconfig.json
├── src/
│   ├── pre.ts
│   ├── main.ts
│   ├── post.ts
│   ├── start.ts
│   ├── poller.ts
│   ├── poller-entry.ts
│   ├── github.ts
│   ├── reducer.ts
│   ├── state.ts
│   ├── poll-log.ts
│   ├── output.ts
│   ├── paths.ts
│   └── platform.ts
├── scripts/
│   ├── generate-self-test.ts
│   ├── self-test-manifest.json
│   ├── run-scenario.mjs
│   ├── validate-scenario.mjs
│   ├── check-scenario-enabled.mjs
│   └── render-diagnostics.mjs
├── .github/
│   └── workflows/
│       ├── self-test.yml
│       └── realistic-test.yml
├── spec/
│   ├── spec.json
│   └── spec.v2.json
└── docs/
    └── mapping_report.md
```

---

*Generated from spec/spec.v2.json*
