| 1 | +# github-api-usage-monitor v2 - Specification |
| 2 | + |
| 3 | +> **This document is derived from `spec/spec.v2.json`.** Do not edit directly; regenerate from the authoritative spec. |
| 4 | + |
| 5 | +**Status:** implemented |
| 6 | +**Canonical Date:** 2026-02-02 |
| 7 | +**Spec Version:** 2.0 |
| 8 | + |
| 9 | +--- |
| 10 | + |
| 11 | +## Summary |
| 12 | + |
| 13 | +A GitHub Action that monitors API rate-limit usage during a job using pre/post hooks. A detached poller samples /rate_limit on an adaptive schedule; the post hook then renders a summary and, when diagnostics are enabled, uploads diagnostics artifacts (state.json + poll-log.json). |
| 14 | + |
| 15 | +--- |
| 16 | + |
| 17 | +## Scope Boundary |
| 18 | + |
| 19 | +### In Scope |
| 20 | + |
| 21 | +- Bucket-level accounting of primary rate-limit usage during a job (hour- and minute-buckets) |
| 22 | +- Adaptive polling that targets reset boundaries with a debounce floor |
| 23 | +- Single-step user experience via pre/post hooks (no explicit start/stop steps) |
| 24 | +- Clear, actionable output: per-bucket totals, warnings, remaining quota, next reset time |
| 25 | +- Optional diagnostics artifacts uploaded on post (state.json and poll-log.json) when enabled |
| 26 | +- Linux/macOS GitHub-hosted runners |
| 27 | + |
| 28 | +### Out of Scope |
| 29 | + |
| 30 | +- Per-request endpoint tracing (URL/method) |
| 31 | +- Per-step attribution |
| 32 | +- Advanced secondary rate-limit diagnostics (burstiness, concurrency, abuse heuristics) |
| 33 | +- Strong guarantees on job cancellation/runner crash (partial results acceptable) |
| 34 | +- Windows support |
| 35 | +- Job-level container support |
| 36 | +- Self-hosted runner and GHES support |
| 37 | + |
| 38 | +--- |
| 39 | + |
| 40 | +## Steel Thread Acceptance Criteria |
| 41 | + |
| 42 | +1. Pre hook spawns a background poller that persists across workflow steps |
| 43 | +2. Poller uses adaptive scheduling (base interval 30s, debounce floor) and updates reducer state |
| 44 | +3. Poller respects 403/429 rate-limit responses with backoff and retry limits |
| 45 | +4. Post hook terminates poller and produces summary in step summary + console |
| 46 | +5. Post hook uploads diagnostics artifacts containing state.json and poll-log.json when diagnostics are enabled |
| 47 | +6. Warnings are emitted for poll failures, anomalies, secondary rate limits, and unsupported environments |
| 48 | +7. Token is never printed to logs |
| 49 | + |
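Criterion 2 could be sketched roughly as below. Only the 30s base interval comes from this spec; the debounce floor value, the pre-reset lead time, and the function name `nextPollDelayMs` are hypothetical placeholders chosen for illustration, not the action's actual settings.

```typescript
// Base interval from the spec (30s).
const BASE_INTERVAL_MS = 30000;
// ASSUMPTION: the spec names a debounce floor but not its value.
const DEBOUNCE_FLOOR_MS = 5000;
// ASSUMPTION: how long before a reset we would like to take a sample.
const PRE_RESET_LEAD_MS = 3000;

/**
 * Returns the delay before the next poll.
 * Starts from the base interval, pulls the poll earlier when a bucket
 * reset is approaching, and never goes below the debounce floor.
 */
function nextPollDelayMs(nowMs: number, resetsEpochSeconds: number[]): number {
  let delay = BASE_INTERVAL_MS;
  for (const reset of resetsEpochSeconds) {
    const untilResetMs = reset * 1000 - nowMs;
    if (untilResetMs <= 0) continue; // stale reset value, nothing to target
    const candidate = untilResetMs - PRE_RESET_LEAD_MS;
    if (candidate < delay) delay = candidate;
  }
  return Math.max(delay, DEBOUNCE_FLOOR_MS);
}
```

Under these assumed constants, a reset 20 seconds away yields a 17-second delay, while a reset only 2 seconds away is clamped to the 5-second floor rather than triggering an immediate poll.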
| 50 | +--- |
| 51 | + |
| 52 | +## Functional Requirements |
| 53 | + |
| 54 | +| ID | Priority | Requirement | Notes | |
| 55 | +|----|----------|-------------|-------| |
| 56 | +| F1 | must | Pre hook validates platform and token, creates baseline state, and spawns detached poller | Initial /rate_limit poll establishes baseline and fails fast on invalid token | |
| 57 | +| F2 | must | Post hook terminates the poller and prints a summary even if the poller is not running | Best-effort output; run with post-if: always() |
| 58 | +| F3 | must | Accept token input; default to github.token / GITHUB_TOKEN if present | Never print token; use GitHub masking | |
| 59 | +| F4 | must | Poll /rate_limit using adaptive scheduling with a 30s base interval and a debounce floor | Targets pre-reset windows; burst polling near reset boundaries | |
| 60 | +| F5 | must | Persist state and PID to $RUNNER_TEMP for cross-step access | state.json in $RUNNER_TEMP/github-api-usage-monitor; PID at poller.pid | |
| 61 | +| F6 | must | Track all rate-limit buckets returned by /rate_limit API | Report all buckets with usage in summary | |
| 62 | +| F7 | must | Handle reset boundaries by adding the post-reset used count to the total as soon as the reset timestamp changes | Minimizes undercount; post-reset used reflects consumption since the new window began |
| 63 | +| F8 | must | Detect anomalies when used decreases without reset change | Increment anomaly counter; do not subtract from totals; emit warning | |
| 64 | +| F9 | should | Periodically write state file for durability during long-running polls | Best-effort data preservation on unexpected termination | |
| 65 | +| F10 | should | Output summary to GitHub step summary and console | Table sorted by total_used desc; one-line console summary + top buckets | |
| 66 | +| F11 | should | When diagnostics are enabled, write per-poll JSONL diagnostics log to poll-log.jsonl | Best-effort; do not disrupt poller on write errors | |
| 67 | +| F12 | must | When diagnostics are enabled, upload diagnostics artifacts in post hook | state.json and poll-log.json uploaded as a single artifact | |
| 68 | +| F13 | should | Allow override of artifact name | Optional input for matrix jobs to avoid collisions | |
| 69 | +| F14 | must | Allow diagnostics to be enabled or disabled via input | Default off; when disabled, skip poll log and artifact upload | |
| 70 | +| F15 | must | On 403/429 responses, back off per primary/secondary rules and stop after 5 consecutive secondary retries | Primary: remaining=0 -> wait until x-ratelimit-reset. Secondary: message contains secondary/abuse; honor retry-after and reset; default 60s; exponential backoff. | |
| 71 | + |
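Requirements F7 and F8 describe a small pure function, which the following sketch tries to capture. The type and function names (`Sample`, `Bucket`, `updateBucket`) are illustrative and trimmed to the fields involved; treating any change in `reset` as a window crossing is an assumption, and the real reducer may differ.

```typescript
interface Sample {
  used: number;
  reset: number; // epoch seconds
}

interface Bucket {
  last_reset: number;
  last_used: number;
  total_used: number;
  windows_crossed: number;
  anomalies: number;
}

/** Pure update: returns a new bucket state, never mutates the input. */
function updateBucket(prev: Bucket, sample: Sample): Bucket {
  if (sample.reset !== prev.last_reset) {
    // F7: a new window started; count everything used since the reset.
    return {
      ...prev,
      last_reset: sample.reset,
      last_used: sample.used,
      total_used: prev.total_used + sample.used,
      windows_crossed: prev.windows_crossed + 1,
    };
  }
  const delta = sample.used - prev.last_used;
  if (delta < 0) {
    // F8: used went down without a reset change; record it, do not subtract.
    return { ...prev, last_used: sample.used, anomalies: prev.anomalies + 1 };
  }
  return { ...prev, last_used: sample.used, total_used: prev.total_used + delta };
}
```

Because the function has no side effects, it lends itself to the table-driven tests called for in NF4.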
| 72 | +--- |
| 73 | + |
| 74 | +## Non-Functional Requirements |
| 75 | + |
| 76 | +| ID | Category | Requirement | Measurement | |
| 77 | +|----|----------|-------------|-------------| |
| 78 | +| NF1 | security | No secrets in logs | Token never appears in stdout/stderr; no set -x; use GitHub masking | |
| 79 | +| NF2 | reliability | Poller process survives step boundaries on Linux/macOS GitHub-hosted runners | Detached process with unref(); PID-based lifecycle management | |
| 80 | +| NF3 | performance | Constant-space reducer with O(#buckets) per poll | State file remains small; minimal log volume | |
| 81 | +| NF4 | maintainability | Deterministic reducer behavior with unit-testable pure functions | Table-driven tests for all reducer edge cases | |
| 82 | +| NF5 | operational_safety | Poller has a maximum lifetime to prevent runaway processes | Exits after MAX_LIFETIME_MS with state write | |
| 83 | + |
| 84 | +--- |
| 85 | + |
| 86 | +## Architecture |
| 87 | + |
| 88 | +### Layer Diagram |
| 89 | + |
| 90 | +```mermaid |
| 91 | +flowchart TB |
| 92 | + subgraph action["Action Entry Layer"] |
| 93 | + pre["pre.ts"] |
| 94 | + main["main.ts"] |
| 95 | + post["post.ts"] |
| 96 | + start["start.ts"] |
| 97 | + end |
| 98 | + |
| 99 | + subgraph poller_layer["Poller Layer"] |
| 100 | + poller["poller.ts"] |
| 101 | + poller_entry["poller-entry.ts"] |
| 102 | + end |
| 103 | + |
| 104 | + subgraph core["Core Logic Layer"] |
| 105 | + reducer["reducer.ts"] |
| 106 | + state["state.ts"] |
| 107 | + poll_log["poll-log.ts"] |
| 108 | + end |
| 109 | + |
| 110 | + subgraph infra["Infrastructure Layer"] |
| 111 | + github["github.ts"] |
| 112 | + output["output.ts"] |
| 113 | + paths["paths.ts"] |
| 114 | + platform["platform.ts"] |
| 115 | + end |
| 116 | + |
| 117 | + pre --> start |
| 118 | + start --> poller |
| 119 | + post --> poller |
| 120 | + post --> output |
| 121 | + poller_entry --> poller |
| 122 | + poller --> github |
| 123 | + poller --> reducer |
| 124 | + poller --> state |
| 125 | + poller --> poll_log |
| 126 | + state --> paths |
| 127 | + poll_log --> paths |
| 128 | +``` |
| 129 | + |
| 130 | +### Layers |
| 131 | + |
| 132 | +| Layer | Description | |
| 133 | +|-------|-------------| |
| 134 | +| **action** | GitHub Action entry points for pre/main/post hooks | |
| 135 | +| **poller** | Background process that polls /rate_limit and updates state | |
| 136 | +| **core** | Pure business logic for rate-limit reduction and state management | |
| 137 | +| **infra** | External integrations and platform-specific code | |
| 138 | + |
| 139 | +### Modules |
| 140 | + |
| 141 | +| Module | Layer | Paths | Provided Ports | |
| 142 | +|--------|-------|-------|----------------| |
| 143 | +| pre | action | src/pre.ts | — | |
| 144 | +| main | action | src/main.ts | — | |
| 145 | +| post | action | src/post.ts | — | |
| 146 | +| start | action | src/start.ts | start.monitor | |
| 147 | +| poller | poller | src/poller.ts | poller.spawn, poller.kill | |
| 148 | +| rate-limit-control | poller | src/poller/rate-limit-control.ts | — | |
| 149 | +| poller-entry | poller | src/poller-entry.ts | — | |
| 150 | +| reducer | core | src/reducer.ts | reducer.update, reducer.initBucket | |
| 151 | +| state | core | src/state.ts | state.read, state.write, state.writePid | |
| 152 | +| poll-log | core | src/poll-log.ts | pollLog.append, pollLog.read | |
| 153 | +| github | infra | src/github.ts | github.fetchRateLimit | |
| 154 | +| output | infra | src/output.ts | output.render | |
| 155 | +| paths | infra | src/paths.ts | paths.statePath, paths.pidPath, paths.pollLogPath | |
| 156 | +| platform | infra | src/platform.ts | platform.isSupported, platform.detect, platform.assertSupported | |
| 157 | + |
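As a rough illustration of the `paths` ports listed above, the file locations from F5 and F11 could be derived from the runner temp directory as follows. The helper name `monitorPaths` and its return shape are hypothetical, and placing `poller.pid` and `poll-log.jsonl` in the same directory as `state.json` is an assumption.

```typescript
// Directory name taken from F5.
const MONITOR_DIR = "github-api-usage-monitor";

interface MonitorPaths {
  dir: string;
  statePath: string;
  pidPath: string;
  pollLogPath: string;
}

/**
 * Builds all monitor file paths under the given RUNNER_TEMP value.
 * Uses forward slashes only, since Windows is out of scope.
 */
function monitorPaths(runnerTemp: string): MonitorPaths {
  const base = runnerTemp.replace(/\/+$/, ""); // drop trailing slashes
  const dir = `${base}/${MONITOR_DIR}`;
  return {
    dir,
    statePath: `${dir}/state.json`,
    pidPath: `${dir}/poller.pid`, // ASSUMPTION: same directory as state.json
    pollLogPath: `${dir}/poll-log.jsonl`, // ASSUMPTION: same directory as state.json
  };
}
```

Passing the temp directory in as an argument, rather than reading the environment inside the helper, keeps the function pure and easy to test.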
| 158 | +--- |
| 159 | + |
| 160 | +## Boundary Types |
| 161 | + |
| 162 | +### ReducerState |
| 163 | + |
| 164 | +Global reducer state persisted to state.json. |
| 165 | + |
| 166 | +```typescript |
| 167 | +interface ReducerState { |
| 168 | + buckets: Record<string, BucketState>; |
| 169 | + started_at_ts: string; // ISO timestamp |
| 170 | + stopped_at_ts: string | null; // ISO timestamp |
| 171 | + poller_started_at_ts: string | null; // ISO timestamp |
| 172 | + interval_seconds: number; |
| 173 | + poll_count: number; |
| 174 | + poll_failures: number; |
| 175 | + secondary_rate_limit_hits: number; |
| 176 | + last_error: string | null; |
| 177 | +} |
| 178 | +``` |
| 179 | + |
| 180 | +### BucketState |
| 181 | + |
| 182 | +Per-bucket reducer state. |
| 183 | + |
| 184 | +```typescript |
| 185 | +interface BucketState { |
| 186 | + last_reset: number; // epoch seconds |
| 187 | + last_used: number; |
| 188 | + total_used: number; |
| 189 | + windows_crossed: number; |
| 190 | + anomalies: number; |
| 191 | + last_seen_ts: string; // ISO timestamp |
| 192 | + limit: number; |
| 193 | + remaining: number; |
| 194 | + first_used: number; |
| 195 | + first_remaining: number; |
| 196 | +} |
| 197 | +``` |
| 198 | + |
| 199 | +### RateLimitSample |
| 200 | + |
| 201 | +Single sample from /rate_limit for one bucket. |
| 202 | + |
| 203 | +```typescript |
| 204 | +interface RateLimitSample { |
| 205 | + limit: number; |
| 206 | + used: number; |
| 207 | + remaining: number; |
| 208 | + reset: number; // epoch seconds |
| 209 | +} |
| 210 | +``` |
| 211 | + |
| 212 | +### RateLimitResponse |
| 213 | + |
| 214 | +Full response from GET /rate_limit. |
| 215 | + |
| 216 | +```typescript |
| 217 | +interface RateLimitResponse { |
| 218 | + resources: Record<string, RateLimitSample>; |
| 219 | + rate: RateLimitSample; // deprecated alias for core |
| 220 | +} |
| 221 | +``` |
| 222 | + |
| 223 | +### SummaryData |
| 224 | + |
| 225 | +Data passed to output renderer for summary generation. |
| 226 | + |
| 227 | +```typescript |
| 228 | +interface SummaryData { |
| 229 | + state: ReducerState; |
| 230 | + duration_seconds: number; |
| 231 | + warnings: string[]; |
| 232 | +} |
| 233 | +``` |
| 234 | + |
| 235 | +### PollLogEntry |
| 236 | + |
| 237 | +Diagnostic per-poll snapshot written to JSONL and uploaded as a JSON array. `poll_number` counts attempts (success + failures). |
| 238 | + |
| 239 | +```typescript |
| 240 | +interface PollLogBucketSnapshot { |
| 241 | + used: number; |
| 242 | + remaining: number; |
| 243 | + reset: number; |
| 244 | + limit: number; |
| 245 | + delta: number; |
| 246 | + window_crossed: boolean; |
| 247 | + anomaly: boolean; |
| 248 | +} |
| 249 | + |
| 250 | +interface PollLogError { |
| 251 | + kind: "primary" | "secondary" | "unknown"; |
| 252 | + status: number; |
| 253 | + message: string | null; |
| 254 | + retry_after_seconds: number | null; |
| 255 | + rate_limit_remaining: number | null; |
| 256 | + rate_limit_reset: number | null; |
| 257 | + next_allowed_at: number | null; |
| 258 | + secondary_retry_count: number; |
| 259 | +} |
| 260 | + |
| 261 | +interface PollLogEntry { |
| 262 | + timestamp: string; // ISO |
| 263 | + poll_number: number; |
| 264 | + buckets: Record<string, PollLogBucketSnapshot>; |
| 265 | + error?: PollLogError; |
| 266 | +} |
| 267 | +``` |
| 268 | + |
| 269 | +--- |
| 270 | + |
| 271 | +## Milestones |
| 272 | + |
| 273 | +| ID | Name | Description | Exit Criteria | |
| 274 | +|----|------|-------------|---------------| |
| 275 | +| M1 | Core Logic | Implement reducer, state manager, and GitHub client | Reducer handles deltas, boundaries, anomalies; state atomic write; GitHub client parses response | |
| 276 | +| M2 | Action Integration | Implement pre/post hooks and process lifecycle | Pre hook spawns poller; post hook stops poller and renders summary; platform detection; initial poll validates token | |
| 277 | +| M3 | Testing | Unit tests and integration tests | Table-driven reducer tests; fixture-based parsing; serialized self-test workflow | |
| 278 | +| M4 | Diagnostics | Poll log and artifact support | JSONL poll log when enabled; post hook uploads artifacts when enabled; downstream jobs can download diagnostics | |
| 279 | +| M5 | Release | Build, bundle, and release automation | ncc bundle; CI/CD pipeline; action.yml references dist/*.js | |
| 280 | + |
| 281 | +--- |
| 282 | + |
| 283 | +## Risks |
| 284 | + |
| 285 | +| ID | Category | Risk | Mitigation | |
| 286 | +|----|----------|------|------------| |
| 287 | +| R1 | process | Orphan process if post hook does not run on job cancel | Acceptable; runner teardown kills process | |
| 288 | +| R2 | process | PID not found or stale at stop time | Handle gracefully; emit warning; proceed with available state | |
| 289 | +| R3 | platform | Background process behavior differs across runner types | Scope to GitHub-hosted Linux/macOS | |
| 290 | +| R4 | platform | Windows process model differences | Fail-fast with clear message | |
| 291 | +| R5 | api | /rate_limit transient failures | Count failures; show warning; no retries for non-rate-limit errors | |
| 292 | +| R6 | api | Secondary rate limit if polled too aggressively | Adaptive schedule with debounce floor; 403/429 backoff with retry limits | |
| 293 | +| R7 | correctness | Reset boundary between polls | Burst polls near reset to reduce undercount | |
| 294 | +| R8 | correctness | Token context changes mid-job | Record anomaly; do not subtract; warn | |
| 295 | +| R9 | diagnostics | Artifact upload failure hides diagnostics when enabled | Warn but still render summary; continue best-effort | |
| 296 | + |
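The R6 mitigation relies on the backoff rules in F15. One possible reading of those rules is sketched below; the names (`decideBackoff`, `RateLimitErrorInput`, `BackoffDecision`) are hypothetical, and the precedence used for secondary limits (retry-after first, then reset when remaining is 0, then a 60s default doubled per consecutive retry) and the choice to check the message before `remaining` are assumptions, since the spec does not fix the ordering.

```typescript
type ErrorKind = "primary" | "secondary" | "unknown";

interface RateLimitErrorInput {
  status: number; // 403 or 429
  message: string | null;
  retryAfterSeconds: number | null; // parsed retry-after header
  remaining: number | null; // parsed x-ratelimit-remaining header
  resetEpochSeconds: number | null; // parsed x-ratelimit-reset header
}

interface BackoffDecision {
  kind: ErrorKind;
  waitMs: number;
  giveUp: boolean;
}

const MAX_SECONDARY_RETRIES = 5; // from F15
const DEFAULT_SECONDARY_WAIT_MS = 60000; // from F15 (default 60s)

function decideBackoff(
  err: RateLimitErrorInput,
  nowMs: number,
  secondaryRetryCount: number, // consecutive secondary retries so far
): BackoffDecision {
  const untilResetMs =
    err.resetEpochSeconds === null ? null : Math.max(0, err.resetEpochSeconds * 1000 - nowMs);
  const message = (err.message || "").toLowerCase();
  const isSecondary = message.indexOf("secondary") !== -1 || message.indexOf("abuse") !== -1;

  if (isSecondary) {
    if (secondaryRetryCount >= MAX_SECONDARY_RETRIES) {
      return { kind: "secondary", waitMs: 0, giveUp: true };
    }
    if (err.retryAfterSeconds !== null) {
      return { kind: "secondary", waitMs: err.retryAfterSeconds * 1000, giveUp: false };
    }
    if (err.remaining === 0 && untilResetMs !== null) {
      return { kind: "secondary", waitMs: untilResetMs, giveUp: false };
    }
    // No header guidance: exponential backoff from the 60s default.
    const waitMs = DEFAULT_SECONDARY_WAIT_MS * Math.pow(2, secondaryRetryCount);
    return { kind: "secondary", waitMs, giveUp: false };
  }
  if (err.remaining === 0 && untilResetMs !== null) {
    // Primary limit exhausted: wait until the window resets.
    return { kind: "primary", waitMs: untilResetMs, giveUp: false };
  }
  return { kind: "unknown", waitMs: 0, giveUp: false };
}
```

The returned `kind` and the inputs map naturally onto the `PollLogError` fields, so a decision like this could be logged as-is when diagnostics are enabled.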
| 297 | +--- |
| 298 | + |
| 299 | +## File Structure |
| 300 | + |
| 301 | +``` |
| 302 | +. |
| 303 | +├── action.yml |
| 304 | +├── package.json |
| 305 | +├── tsconfig.json |
| 306 | +├── src/ |
| 307 | +│ ├── pre.ts |
| 308 | +│ ├── main.ts |
| 309 | +│ ├── post.ts |
| 310 | +│ ├── start.ts |
| 311 | +│ ├── poller.ts |
| 312 | +│ ├── poller-entry.ts |
| 313 | +│ ├── github.ts |
| 314 | +│ ├── reducer.ts |
| 315 | +│ ├── state.ts |
| 316 | +│ ├── poll-log.ts |
| 317 | +│ ├── output.ts |
| 318 | +│ ├── paths.ts |
| 319 | +│ └── platform.ts |
| 320 | +├── scripts/ |
| 321 | +│ ├── generate-self-test.ts |
| 322 | +│ ├── self-test-manifest.json |
| 323 | +│ ├── run-scenario.mjs |
| 324 | +│ ├── validate-scenario.mjs |
| 325 | +│ ├── check-scenario-enabled.mjs |
| 326 | +│ └── render-diagnostics.mjs |
| 327 | +├── .github/ |
| 328 | +│ └── workflows/ |
| 329 | +│ ├── self-test.yml |
| 330 | +│ └── realistic-test.yml |
| 331 | +├── spec/ |
| 332 | +│ ├── spec.json |
| 333 | +│ └── spec.v2.json |
| 334 | +└── docs/ |
| 335 | + └── mapping_report.md |
| 336 | +``` |
| 337 | + |
| 338 | +--- |
| 339 | + |
| 340 | +*Generated from spec/spec.v2.json* |