source-archive: skip complete captures + per-run cost report #297

Closed
probably-jaden wants to merge 1 commit into main from feat/skip-complete-and-cost

probably-jaden commented Jun 29, 2026

Two capture-run improvements:

Skip only complete captures. ContentStore.lookup previously returned a cache hit for any prior capture within the TTL, even a partial one (e.g. a failed screenshot encode that left screenshot_key=None). It now treats an incomplete capture as a miss: browser captures need html + markdown + screenshot, and PDFs (which have no screenshot) need markdown. A re-run therefore retries the missing format instead of skipping it forever. Already-complete sites are still skipped.
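
A minimal sketch of the completeness rule described above, not the actual ContentStore code. The `CaptureRecord` type, the `html_key` / `markdown_key` / `is_pdf` fields, and the dict-backed `lookup` are assumptions made for illustration (only `screenshot_key` is named in this PR), and the TTL check is left out.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CaptureRecord:
    """Hypothetical stand-in for a stored capture; field names are assumed."""
    html_key: Optional[str]
    markdown_key: Optional[str]
    screenshot_key: Optional[str]
    is_pdf: bool = False


def is_complete(record: CaptureRecord) -> bool:
    """Return True only if every format required for this capture type exists."""
    if record.is_pdf:
        # PDFs have no screenshot, so only markdown is required here.
        return record.markdown_key is not None
    required = (record.html_key, record.markdown_key, record.screenshot_key)
    return all(key is not None for key in required)


def lookup(cache: Dict[str, CaptureRecord], url: str) -> Optional[CaptureRecord]:
    """Treat a missing or incomplete capture as a cache miss (TTL check omitted)."""
    record = cache.get(url)
    if record is None or not is_complete(record):
        return None
    return record
```

Under these assumptions, a capture whose screenshot encode failed yields `None` from `lookup`, so the next run captures it again rather than skipping it.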

Per-run cost report. New cost.py estimates a run's spend, broken down by backend and by archived site. Self-hosted CloakBrowser/Playwright/PDF captures are free; Hyperbrowser/Firecrawl captures are priced by the configured proxy mode. The capture CLI now prints the breakdown and writes reports/<run_id>_cost.json. (Estimates are based on public pricing; only successful captures are priced.)
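
A rough sketch of how such a breakdown could be computed, not the contents of cost.py. The capture dict shape (`site`, `backend`, `proxy_mode`, `ok`), the proxy mode names, and the function names are assumptions, and the prices are placeholder values chosen for illustration, not real Hyperbrowser or Firecrawl pricing.

```python
import json
from collections import defaultdict
from pathlib import Path

# Self-hosted backends cost nothing per capture.
FREE_BACKENDS = {"cloakbrowser", "playwright", "pdf"}

# PLACEHOLDER per-capture prices in USD, keyed by (backend, proxy_mode).
# These numbers and mode names are made up for the example.
PLACEHOLDER_PRICES = {
    ("hyperbrowser", "none"): 0.002,
    ("hyperbrowser", "residential"): 0.010,
    ("firecrawl", "none"): 0.001,
    ("firecrawl", "residential"): 0.005,
}


def estimate_cost(captures):
    """Sum estimated spend per backend and per site; failed captures are not priced."""
    by_backend = defaultdict(float)
    by_site = defaultdict(float)
    for cap in captures:
        if not cap["ok"]:
            continue
        if cap["backend"] in FREE_BACKENDS:
            price = 0.0
        else:
            price = PLACEHOLDER_PRICES[(cap["backend"], cap["proxy_mode"])]
        by_backend[cap["backend"]] += price
        by_site[cap["site"]] += price
    return {
        "total": sum(by_backend.values()),
        "by_backend": dict(by_backend),
        "by_site": dict(by_site),
    }


def write_report(report, run_id, out_dir="reports"):
    """Write the breakdown to <out_dir>/<run_id>_cost.json and return the path."""
    path = Path(out_dir) / f"{run_id}_cost.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2))
    return path
```

Keying the price table on (backend, proxy mode) keeps the pricing assumptions in one place, so a pricing change only touches the table.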

- ContentStore.lookup now treats an incomplete capture (missing html / markdown /
  screenshot; PDFs exempt from screenshot) as a cache miss, so a re-run retries
  the missing format instead of skipping a partial capture forever.
- New cost.py estimates a run's spend per backend and per archived site
  (self-hosted = free; Hyperbrowser/Firecrawl priced by the configured proxy
  mode). The capture CLI prints the breakdown and writes reports/<run_id>_cost.json.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@probably-jaden

Split into #298 (skip only complete captures) and #299 (per-run cost report) so each feature is reviewed on its own.

probably-jaden deleted the feat/skip-complete-and-cost branch on June 29, 2026 at 20:02