
feat(tunnel): pipelined polls with adaptive depth, wseq ordering, STUN blocking #1115

Open
yyoyoian-pixel wants to merge 9 commits into therealaleph:main from yyoyoian-pixel:feat/pipeline-tunnel-polls

Conversation


@yyoyoian-pixel (Contributor) commented May 13, 2026

Summary

Pipelined full-tunnel with adaptive pipeline depth, write-sequence ordering on the tunnel-node, and WebRTC TCP fallback.

Pipelining

  • Optimistic start at depth 2 — every session begins with 2 in-flight polls (free, no elevation permit)
  • Adaptive ramp to depth 4 — sessions with sustained download data (>32KB) elevate with permit
  • Fast-path uploads — when pipeline is full, upload data bypasses depth cap (+4 extra ops)
  • Timer-based refill — non-blocking 100ms steps in the select loop, polls after 1s
  • Single-loop architecture — upload reads and reply processing in one select loop for natural back-pressure
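
The depth rules above can be sketched as a small decision function. This is a hypothetical sketch — the constants mirror the numbers in this description, but the names are illustrative, not the actual `tunnel_client.rs` code:

```rust
// Illustrative constants taken from the PR description.
const BASE_DEPTH: usize = 2;        // optimistic start, no permit needed
const ELEVATED_DEPTH: usize = 4;    // requires an elevation permit
const FAST_PATH_EXTRA: usize = 4;   // upload bypass when the pipeline is full
const ELEVATION_THRESHOLD: u64 = 32 * 1024; // sustained-download bytes

/// Target number of in-flight polls for a session (hypothetical helper).
fn target_depth(downloaded: u64, has_permit: bool, upload_pending: bool) -> usize {
    let depth = if downloaded >= ELEVATION_THRESHOLD && has_permit {
        ELEVATED_DEPTH
    } else {
        BASE_DEPTH
    };
    // Fast-path: upload data may exceed the depth cap by a few extra ops.
    if upload_pending { depth + FAST_PATH_EXTRA } else { depth }
}
```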

Write ordering (wseq)

  • Client assigns monotonic wseq to data-bearing ops (not polls)
  • Tunnel-node buffers out-of-order writes per session, flushes in wseq order
  • Backward compatible: old clients without wseq write immediately
  • Prevents TLS corruption from pipelined batches completing out of order
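
The tunnel-node side of this ordering can be sketched as a per-session reorder buffer that flushes the contiguous wseq prefix. A hypothetical sketch — the struct and method names are illustrative, not the actual tunnel-node code:

```rust
use std::collections::BTreeMap;

/// Hypothetical per-session write reorder buffer.
struct WriteBuffer {
    next_wseq: u64,
    pending: BTreeMap<u64, Vec<u8>>,
}

impl WriteBuffer {
    fn new() -> Self {
        Self { next_wseq: 0, pending: BTreeMap::new() }
    }

    /// Returns the payloads that may be written now, in wseq order.
    fn push(&mut self, wseq: Option<u64>, data: Vec<u8>) -> Vec<Vec<u8>> {
        let mut ready = Vec::new();
        match wseq {
            // Backward compatible: old clients without wseq write immediately.
            None => ready.push(data),
            Some(seq) => {
                self.pending.insert(seq, data);
                // Flush the contiguous run starting at next_wseq; a gap
                // (a still-in-flight op) stops the flush.
                while let Some(buf) = self.pending.remove(&self.next_wseq) {
                    ready.push(buf);
                    self.next_wseq += 1;
                }
            }
        }
        ready
    }
}
```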

STUN/TURN blocking

  • block_stun config (default true) with Android UI toggle
  • Rejects STUN/TURN ports (3478/5349/19302) so WebRTC apps (Meet, WhatsApp) instantly fall back to TCP TURN
  • Eliminates 10-30s ICE negotiation timeout
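
The port check itself is trivial; a minimal sketch (function name is illustrative, not the actual `proxy_server.rs` code):

```rust
// Well-known STUN/TURN ports: 3478 (STUN/TURN), 5349 (TURN over TLS),
// 19302 (Google's public STUN servers).
const STUN_TURN_PORTS: [u16; 3] = [3478, 5349, 19302];

/// Hypothetical check: rejecting these ports makes WebRTC's ICE agent
/// give up on UDP candidates immediately and fall back to TCP TURN.
fn should_block(dest_port: u16, block_stun: bool) -> bool {
    block_stun && STUN_TURN_PORTS.contains(&dest_port)
}
```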

Tunnel-node improvements

  • LONGPOLL_DEADLINE 4s (must stay below client batch timeout)
  • Reader buffer 2MB (was 64KB)
  • Drain loop: keeps reading until buffer empty (max 1s), accumulates up to 2MB+ per drain
  • Upload size logging
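
The drain loop can be sketched roughly as below. The buffer sizes and the 1s cap match the PR; the blocking `Read` shape and the names are illustrative (the real tunnel-node code is async and stops on would-block):

```rust
use std::io::Read;
use std::time::{Duration, Instant};

const READER_BUF: usize = 2 * 1024 * 1024;  // 2MB, was 64KB
const DRAIN_CAP: usize = 2 * 1024 * 1024;   // accumulate up to ~2MB per drain
const DRAIN_MAX: Duration = Duration::from_secs(1);

/// Hypothetical drain loop: keep reading until the source is empty,
/// the byte cap is reached, or 1s has elapsed, so one poll reply can
/// carry a full burst of download data.
fn drain<R: Read>(sock: &mut R) -> std::io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut buf = vec![0u8; READER_BUF];
    let start = Instant::now();
    while start.elapsed() < DRAIN_MAX && out.len() < DRAIN_CAP {
        let n = sock.read(&mut buf)?;
        if n == 0 {
            break; // EOF here; would-block in the real non-blocking code
        }
        out.extend_from_slice(&buf[..n]);
    }
    Ok(out)
}
```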

Android

  • Pipeline debug overlay (SYSTEM_ALERT_WINDOW) — temporary, shows session depths and events
  • Tokio worker threads: 4 (was 2)
  • block_stun toggle in Advanced settings

Other

  • Legacy detection removed (was false-triggering)
  • consecutive_empty gate removed from refill (was killing idle sessions)
  • 32KB download threshold for elevation (prevents keep-alive sessions from over-elevating)
  • Unbounded mux channel (prevents upload flood from blocking downloads)

Files changed

  • src/tunnel_client.rs — pipelining, fast-path, wseq, timer refill, single-loop
  • src/domain_fronter.rs — wseq field on BatchOp
  • src/proxy_server.rs — STUN blocking
  • src/config.rs — block_stun config
  • src/android_jni.rs — pipelineDebugJson JNI, worker_threads=4
  • tunnel-node/src/main.rs — wseq ordering, 2MB reader, drain loop, LONGPOLL 4s
  • android/ — ConfigStore, HomeScreen, PipelineDebugOverlay, MhrvVpnService, Native, Manifest

Test plan

  • Pipelining: sessions ramp 2→3→4, downloads overlap
  • Fast-path: uploads bypass full pipeline
  • wseq ordering: tunnel-node logs show in-order writes
  • STUN blocking: Google Meet connects instantly via TCP TURN
  • Video upload: starts immediately, no stall (single-loop)
  • Telegram messaging: messages send with expected delay
  • Debug overlay: shows sessions, depth, events
  • Long-running stability test

🤖 Generated with Claude Code

@github-actions bot added the type: feature label (auto-applied by release-drafter) May 13, 2026
@yyoyoian-pixel changed the title from "feat(tunnel): pipelined polls with adaptive depth" to "feat(tunnel): pipelined polls with adaptive depth, wseq ordering, STUN blocking" May 14, 2026
@yyoyoian-pixel force-pushed the feat/pipeline-tunnel-polls branch 2 times, most recently from af703b1 to 5c010ed on May 14, 2026 21:58
yyoyoian-pixel and others added 9 commits May 15, 2026 00:01
…nt reads

Three improvements to full-tunnel throughput and latency:

1. **Overlapped client reads**: tunnel_loop reads from the client socket
   concurrently with the batch reply wait via tokio::select!, buffering
   upload data for the next op instead of blocking on a fresh read timeout.

2. **Pipelined polls with seq echo**: add a per-op sequence number echoed
   by the tunnel-node so the client can reorder out-of-order replies.
   Sessions with sustained data flow (consecutive_data >= 2) ramp up to
   MAX_INFLIGHT_PER_SESSION polls in flight, with 1s stagger between sends
   so they land in separate batches. Drops to serial on first empty reply.

3. **Adaptive pipeline depth**: idle sessions stay at depth 1 (no extra
   polls). Data-bearing sessions gradually ramp 1→2→3→...→10. At most
   MAX_ELEVATED_PER_DEPLOYMENT (6) sessions per deployment can be elevated
   simultaneously, preventing semaphore exhaustion. Elevation slots are
   released immediately on first empty reply or session close.

Wire protocol: BatchOp and TunnelResponse gain an optional `seq` field.
Fully backward compatible — old tunnel-nodes ignore the field, new clients
fall back to serial (depth 1) when resp.seq is None.

Tunnel-node: LONGPOLL_DEADLINE reduced from 15s to 4s for faster poll
turnaround while keeping persistent connections (Telegram) stable.

Includes bench-pipeline.sh for comparing serial vs pipelined throughput.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…STUN blocking

Pipeline improvements:
- Optimistic start at depth 2 (free, no permit), drop to 1 on 2 consecutive empties
- Elevation permit only for depth 3+ with 32KB download threshold (prevents
  keep-alive sessions like Telegram from over-elevating)
- Fast-path uploads bypass full pipeline with +4 cap and 20ms coalesce
- Data-op preference: 20ms client read check before sending empty polls
- 1s stagger always applied for batch separation
- Client socket close breaks immediately (no waiting for in-flight polls)
- consecutive_data no longer resets on single empties

Android:
- Pipeline debug overlay (SYSTEM_ALERT_WINDOW) with per-session tracking
- Tokio worker threads 4 (was 2) to prevent burst stalls
- STUN/TURN port blocking (3478/5349/19302) for instant WebRTC TCP fallback

Tunnel-node:
- LONGPOLL_DEADLINE 4s (must stay below client batch timeout)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tunnel-node:
- Drain loop: keep reading until buffer empty (max 1s), accumulates
  up to 2MB+ per drain for streaming video (was 100KB)
- Upload size logging for debugging
- 512KB reader buffer (was 64KB)
- LONGPOLL_DEADLINE 4s

Client:
- INFLIGHT_ACTIVE 4 (was 10) to prevent semaphore exhaustion
- Upload loop-read in initial path (1s max, accumulates fat uploads)
- Fast-path 200ms coalesce loop (was single 20ms read)
- 32KB download threshold for elevation (prevents keep-alive sessions
  like Telegram from over-elevating)
- consecutive_data no longer resets on single empties
- block_stun config (default true) with Android UI toggle
- 512KB client read buffer

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Architecture:
- Upload task (spawned): reads client socket → sends MuxMsg::Data with
  wseq directly to mux → sends InflightEntry to download task. Fully
  independent, never blocked by downloads.
- Download task (inline): processes replies, sends refill polls (timer),
  accepts InflightEntry. Never blocked by uploads.
- Lock-free mpsc channels throughout — no Mutex contention.

Write ordering (wseq):
- Client assigns monotonic wseq to data-bearing ops only (not polls).
- Tunnel-node buffers out-of-order writes per session, flushes in wseq
  order. Backward compatible: old clients without wseq write immediately.
- Fixes data corruption from pipelined batches completing out of order.

Upload accumulation:
- Adaptive: 50ms initial window for small messages (low latency).
- If >= 32KB accumulated, extend to 1s / 1MB cap (fat uploads for files).
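
The adaptive window can be sketched as a policy function. The 50ms/1s windows, the 32KB threshold, and the 1MB cap come from this commit message; the small-path byte cap and all names are assumptions:

```rust
use std::time::Duration;

const FAT_THRESHOLD: usize = 32 * 1024;

/// Hypothetical policy: returns (read window, byte cap) for the
/// current accumulation state.
fn accumulation_window(accumulated: usize) -> (Duration, usize) {
    if accumulated >= FAT_THRESHOLD {
        // Fat upload (e.g. a file): extend to a 1s window, 1MB cap.
        (Duration::from_secs(1), 1024 * 1024)
    } else {
        // Small message: short 50ms window for low latency.
        // (Capping at the threshold itself is an assumption.)
        (Duration::from_millis(50), FAT_THRESHOLD)
    }
}
```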

Other:
- Removed consecutive_empty gate on refill (was killing idle sessions).
- Tunnel-node reader buffer 2MB (was 512KB).
- Removed legacy detection (was false-triggering on merged replies).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduces upload chunk size to prevent large video uploads from starving
heartbeat polls in shared batches. Adaptive accumulation:
- 50ms initial window, 10ms per-read gap timeout
- >= 8KB triggers extended 1s window (capped at 256KB)
- Smaller chunks clear batches faster, heartbeats get through

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Mux channel unbounded (was 512) — prevents upload flood from blocking
  download task's poll sends and ack processing
- Pipeline debug functions no-op'd — std::sync::Mutex was blocking tokio
  workers under contention during heavy uploads
- Upload accumulation yields between reads
- Added batch response mismatch logging (r.len vs sent ops)
- Open issue: r.len()=0 from Apps Script during heavy uploads

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Upload semaphore: max 3 unacked data ops per session (TCP-like flow
  control). Permit held in inflight future until reply arrives.
- Suppress refill polls while data ops are in flight — prevents upload
  acks from being delayed behind slow poll responses in pending_writes.
- data_ops_in_flight counter tracks active upload ops per session.
- upload_cap config field (default 3, not yet wired to Android UI).
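
The flow-control behaviour described above can be sketched with a plain counter (the real code holds an async semaphore permit inside the inflight future; the struct and names here are illustrative):

```rust
/// Hypothetical per-session upload window, TCP-like flow control.
struct UploadWindow {
    cap: usize,       // upload_cap, default 3
    in_flight: usize, // data_ops_in_flight
}

impl UploadWindow {
    /// Try to claim a slot for a data op; the slot is held until
    /// the reply (ack) arrives.
    fn try_send(&mut self) -> bool {
        if self.in_flight < self.cap {
            self.in_flight += 1;
            true
        } else {
            false // back-pressure: wait for an ack before sending more
        }
    }

    fn ack(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }

    /// Refill polls are suppressed while data ops are in flight, so
    /// upload acks are not delayed behind slow poll responses.
    fn may_refill(&self) -> bool {
        self.in_flight == 0
    }
}
```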

Root cause of video upload stall: r.len()=0 batch responses from Apps
Script when batches are large (19+ ops). Needs Apps Script investigation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The split upload/download task architecture caused video upload stalls:
upload ack responses were delayed behind slow poll responses in the
pending_writes ordering buffer. The single-loop naturally serializes
uploads with reply processing, giving steady ack delivery.

Single-loop keeps all pipelining benefits (elevated polls, adaptive
depth, fast-path uploads) while avoiding the ordering issue.

Removed dead upload_cap config field.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@yyoyoian-pixel force-pushed the feat/pipeline-tunnel-polls branch from 5c010ed to 377add3 on May 14, 2026 22:01
@yyoyoian-pixel marked this pull request as ready for review May 15, 2026 16:58