fix: reconnect Discord gateway on silent WS disconnect #791
chaodu-agent wants to merge 3 commits
LGTM ✅ — Solid reconnect loop with correct backoff semantics and clean shutdown handling.
Findings
| # | Severity | Finding | Status |
|---|---|---|---|
| 1 | 💭 nit | warn log "1s" hardcoded — should use structured field | ✅ Fixed in cc99649 |
| 2 | 💭 nit | `process::exit(1)` skips Drop/cleanup | Noted: intentional, rationale comment added in cc99649 |
Why not replace process::exit(1) with return Err(...)?
These paths only trigger on fatal config errors (bad token / missing intents) — the bot never successfully connected, so there are no active sessions, no in-flight tasks, and no dispatcher state to drain. return Err(...) would run the cleanup block (wait 5s for Slack, 35s for cron, drain pool…) for no reason.
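To make the trade-off in finding 2 concrete, here is a minimal std-only sketch of the difference between the two exit paths. `CleanupGuard`, `fail_with_err`, and `fail_with_exit` are hypothetical names for illustration and are not taken from this PR; the guard stands in for any state whose cleanup lives in `Drop`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for state that needs cleanup (pool, dispatcher, ...).
struct CleanupGuard {
    cleaned: Arc<AtomicBool>,
}

impl Drop for CleanupGuard {
    fn drop(&mut self) {
        // Runs whenever the guard goes out of scope, including on an early return.
        self.cleaned.store(true, Ordering::SeqCst);
    }
}

/// The `return Err(...)` style: the function returns normally, so `_guard`
/// is dropped before the caller sees the error.
fn fail_with_err(cleaned: Arc<AtomicBool>) -> Result<(), String> {
    let _guard = CleanupGuard { cleaned };
    Err("invalid authentication".to_string())
}

/// The `process::exit(1)` style: the process terminates immediately and
/// `Drop` for `_guard` never runs. Not called here, since it would end the process.
#[allow(dead_code)]
fn fail_with_exit(cleaned: Arc<AtomicBool>) -> ! {
    let _guard = CleanupGuard { cleaned };
    std::process::exit(1)
}
```

Calling `fail_with_err` leaves the flag set to `true`, while `fail_with_exit` would end the process with the flag still `false`. That is acceptable here only because, as argued above, nothing has been set up yet on the fatal paths.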
Good
- `shutdown_task.abort()` prevents listener accumulation across iterations
- Backoff escalates only on errors; clean disconnect resets to 1s
- Handler rebuilt each iteration, so thread caches correctly reset on READY
- `tokio::select!` on every sleep point ensures no zombie loop on shutdown
No blocking issues. Recommend merge.
When serenity's client.start() returns (either Ok or transient error), the Discord adapter now automatically reconnects with exponential backoff instead of silently dying.
- Wrap client build + start in a retry loop
- Fatal errors (bad token, bad intents) still exit immediately
- Transient errors use exponential backoff (1s → 60s max)
- Successful sessions reset backoff to 1s
- Graceful shutdown via shutdown_rx breaks the loop
- Log reconnect attempts at WARN level for observability
Fixes #790
…mulation, F3 backoff logic)
- F1 (🔴): Wrap Client::builder().await in match to retry on transient build failures instead of crashing main with ?
- F2 (🟡): Abort shutdown listener task after client.start() returns to prevent task accumulation across reconnect iterations
- F3 (🟡): Move backoff escalation into Err arm only; Ok path resets to 1s and does not escalate
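The backoff rules from these commit messages (escalate only on errors, reset to 1s after a clean session, cap at 60s, exit on fatal errors) can be sketched as a small state machine. This is an illustration under assumptions, not the PR's code: `Backoff`, `SessionOutcome`, and `NextStep` are hypothetical names, and doubling per failure is assumed because the source only says "exponential backoff (1s → 60s max)".

```rust
use std::time::Duration;

const INITIAL_BACKOFF: Duration = Duration::from_secs(1);
const MAX_BACKOFF: Duration = Duration::from_secs(60);

/// Hypothetical classification of how one client session ended.
#[allow(dead_code)]
enum SessionOutcome {
    /// `client.start()` returned `Ok(())`.
    CleanDisconnect,
    /// A retryable error from building or running the client.
    TransientError,
    /// Bad token or disallowed intents: retrying cannot help.
    Fatal,
}

/// What the reconnect loop should do next.
#[derive(Debug, PartialEq)]
enum NextStep {
    RetryAfter(Duration),
    Exit,
}

struct Backoff {
    current: Duration,
}

impl Backoff {
    fn new() -> Self {
        Backoff { current: INITIAL_BACKOFF }
    }

    /// Returns the next step and updates the stored delay.
    fn on_outcome(&mut self, outcome: SessionOutcome) -> NextStep {
        match outcome {
            SessionOutcome::Fatal => NextStep::Exit,
            SessionOutcome::CleanDisconnect => {
                // Ok path: reset and do not escalate (F3).
                self.current = INITIAL_BACKOFF;
                NextStep::RetryAfter(INITIAL_BACKOFF)
            }
            SessionOutcome::TransientError => {
                // Err path: wait the current delay, then double it (assumed factor), capped at 60s.
                let delay = self.current;
                self.current = self.current.saturating_mul(2).min(MAX_BACKOFF);
                NextStep::RetryAfter(delay)
            }
        }
    }
}
```

In the real loop, each `RetryAfter` delay would presumably be awaited inside a `tokio::select!` alongside the shutdown signal, so that a shutdown interrupts the sleep instead of waiting it out.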
…s::exit rationale
cc99649 to 554180c
LGTM ✅ Solid reconnect loop with proper backoff and graceful shutdown integration.
What This PR Does
Fixes #790: when serenity's
How It Works
Wraps the Discord client lifecycle in a
Findings
Baseline Check
What's Good (🟢)
Summary
Fixes #790 — Discord gateway silently dies after WS disconnect with no reconnect.
Problem
When serenity's `client.start()` returns `Ok(())` (internal reconnect exhausted), the Discord adapter permanently stops receiving events while the container remains "healthy".
Changes
Wraps the Discord client lifecycle in a reconnect loop with exponential backoff:
- `client.start()` returning `Ok(())` or transient errors triggers a reconnect attempt
- `DisallowedGatewayIntents` and `InvalidAuthentication` still call `process::exit(1)`
- `shutdown_rx` signal (SIGINT/SIGTERM)
How it works
Handler is rebuilt each iteration: all shared state (`router`, `dispatcher`) is `Arc`-wrapped, so cloning is cheap. Thread-local caches (`participated_threads`, `multibot_threads`) are fresh per reconnect, which is correct since Discord will re-dispatch the `READY` event.
Testing
Not included (future work)
https://discord.com/channels/1491295327620169908/1491365157010542652/1503355477612957696