perf(start-client-core): O(1) buffer drain in client frame decoder#2

Closed
anonrig wants to merge 1 commit into main from perf/frame-decoder-index-pointer

anonrig commented Jun 21, 2026

What

Replace the O(n) bufferList.shift() in the client-side frame decoder (packages/start-client-core/src/client-rpc/frame-decoder.ts) with an O(1) head pointer.

extractFlattened() dropped each fully-consumed chunk from the front of bufferList with shift(). When a single large frame (e.g. a big RawStream payload) is assembled from many small network reads, the extract loop calls shift() once per chunk — and each shift() re-indexes the whole array, so reassembly degrades to O(n²).

This PR tracks the first un-consumed chunk with a bufferHead index and advances it in O(1) instead of shifting. readHeader() reads from bufferHead as well. Consumed slots are released for GC, and the array is compacted:

  • fully drained (bufferHead === bufferList.length) → reset in O(1) (the common terminal state), or
  • once the consumed prefix grows past a small threshold → splice() it off (amortized O(1) per consumed chunk).

This mirrors the existing index-pointer approach already used in transformStreamWithRouter for the same reason.
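
The drain strategy described above could be sketched roughly as follows. This is not the code in frame-decoder.ts: the `ChunkQueue` class, its `push()` method, and the `COMPACT_THRESHOLD` value are assumptions made for illustration (the PR only says "a small threshold"). Only the `bufferList`, `bufferHead`, and `extractFlattened` names come from the description.

```typescript
// Illustrative sketch only; hypothetical names, not the actual decoder.
const COMPACT_THRESHOLD = 32 // assumed value; the PR does not state the real one

class ChunkQueue {
  private bufferList: Array<Uint8Array | undefined> = []
  private bufferHead = 0 // index of the first un-consumed chunk
  private totalLength = 0 // buffered bytes not yet consumed

  get length(): number {
    return this.totalLength
  }

  push(chunk: Uint8Array): void {
    if (chunk.length === 0) return // never store empty chunks
    this.bufferList.push(chunk)
    this.totalLength += chunk.length
  }

  // Copy the next `count` bytes into one array and drop them from the queue.
  extractFlattened(count: number): Uint8Array {
    if (count > this.totalLength) {
      throw new RangeError('not enough buffered bytes')
    }
    const out = new Uint8Array(count)
    let offset = 0
    while (offset < count) {
      const chunk = this.bufferList[this.bufferHead]!
      const needed = count - offset
      if (chunk.length <= needed) {
        // Whole chunk consumed: release the slot for GC and advance in O(1).
        out.set(chunk, offset)
        offset += chunk.length
        this.bufferList[this.bufferHead] = undefined
        this.bufferHead++
      } else {
        // Partial chunk: keep the unread tail as a view over the same memory.
        out.set(chunk.subarray(0, needed), offset)
        this.bufferList[this.bufferHead] = chunk.subarray(needed)
        offset += needed
      }
    }
    this.totalLength -= count
    this.compact()
    return out
  }

  private compact(): void {
    if (this.bufferHead === this.bufferList.length) {
      // Fully drained: O(1) reset.
      this.bufferList.length = 0
      this.bufferHead = 0
    } else if (this.bufferHead > COMPACT_THRESHOLD) {
      // Drop the consumed prefix; the cost is spread over the chunks removed.
      this.bufferList.splice(0, this.bufferHead)
      this.bufferHead = 0
    }
  }
}
```

The point of the threshold is that `splice()` is still linear in the array length, so it should run rarely enough that its cost averages out over the chunks it removes.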

Why

Same hot path as the sibling zero-copy PR: decoding streamed server-function responses and RawStream payloads. The O(n²) cost only bites when one frame spans many buffered chunks.

Standalone micro-benchmark (Node, median of 12 runs), draining N buffered chunks:

| Chunks | shift() | head pointer | Speedup |
|--------|---------|--------------|---------|
| 200    | 26.7 ms | 2.5 ms       | 10.6x   |
| 1000   | 173 ms  | 15 ms        | 11.5x   |
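
For anyone who wants to try a similar comparison locally, a harness along these lines might be a starting point. It is not the benchmark behind the table above: the `drainWithShift`, `drainWithHead`, and `medianMs` helpers are hypothetical, the 16-byte chunk size is arbitrary, and a single drain of this size will not reproduce the absolute timings reported.

```typescript
// Hypothetical harness; not the benchmark used for the numbers in this PR.

// Drain by removing the first element each time (the old approach).
function drainWithShift(chunks: Array<Uint8Array>): number {
  const list = chunks.slice() // work on a copy so the input stays reusable
  let bytes = 0
  while (list.length > 0) {
    bytes += list.shift()!.length
  }
  return bytes
}

// Drain by advancing a head index and clearing consumed slots.
function drainWithHead(chunks: Array<Uint8Array>): number {
  const list: Array<Uint8Array | undefined> = chunks.slice()
  let head = 0
  let bytes = 0
  while (head < list.length) {
    bytes += list[head]!.length
    list[head] = undefined // release the consumed chunk
    head++
  }
  list.length = 0 // fully drained: reset
  return bytes
}

// Run `fn` `runs` times and return the median wall-clock time in ms.
function medianMs(fn: () => void, runs: number): number {
  const samples: Array<number> = []
  for (let i = 0; i < runs; i++) {
    const start = performance.now()
    fn()
    samples.push(performance.now() - start)
  }
  samples.sort((a, b) => a - b)
  const mid = Math.floor(samples.length / 2)
  return samples.length % 2 === 1
    ? samples[mid]!
    : (samples[mid - 1]! + samples[mid]!) / 2
}

for (const n of [200, 1000]) {
  const chunks = Array.from({ length: n }, () => new Uint8Array(16))
  const shiftMs = medianMs(() => drainWithShift(chunks), 12)
  const headMs = medianMs(() => drainWithHead(chunks), 12)
  console.log(`${n} chunks: shift() ${shiftMs} ms, head pointer ${headMs} ms`)
}
```

Results from a harness like this depend heavily on the engine and on array size, so they are best read as a relative comparison on one machine.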

Tests

  • Existing frame-decoder suite passes.
  • Added two tests for the changed paths:
    • a 200-byte CHUNK payload delivered one byte at a time (forces the header slow path + many whole-chunk consumptions + the fully-drained reset),
    • 100-byte frames fed in 7-byte reads that never align with frame boundaries, so the head pointer climbs past the compaction threshold repeatedly (exercises the splice() prefix drop).
test:unit   ✓ 20 passed
test:types  ✓ no errors
eslint      ✓ frame-decoder.ts clean (no new problems vs main)

Notes

  • Independent of the sibling zero-copy PR. Both touch extractFlattened, so whichever merges second will need a trivial rebase.
  • Pre-existing import/order lint errors in src/client/hydrateStart.ts (virtual-module imports) are unrelated to this change and present on main.

The frame decoder dropped consumed chunks from its buffer with
bufferList.shift(), which is O(n). When a single large frame (e.g. a big
RawStream payload) is assembled from many small network reads, the
extract loop calls shift() once per chunk, making reassembly O(n^2).

Track the first un-consumed chunk with a head pointer and advance it in
O(1) instead of shifting. Consumed slots are released for GC, and the
buffer is compacted when fully drained (O(1) reset) or once the consumed
prefix grows past a small threshold (amortized O(1) per chunk).

A micro-benchmark draining 1000 small chunks is ~11x faster.

anonrig commented Jun 21, 2026

Superseded by upstream PR TanStack#7663 (opened against TanStack/router).

anonrig closed this Jun 21, 2026