fix: add IDB healing mechanism for backing store corruption and connection lost errors#780
Conversation
bd7a14f to
32a6cc8
Compare
| * https://github.com/Expensify/App/issues/87862 | ||
| */ | ||
| function isBackingStoreError(error: unknown): boolean { | ||
| return error instanceof DOMException && error.name === 'UnknownError' && error.message.includes('Internal error opening backing store'); |
There was a problem hiding this comment.
| return error instanceof DOMException && error.name === 'UnknownError' && error.message.includes('Internal error opening backing store'); | |
| return error instanceof Error && error.message.includes('Internal error opening backing store'); |
I think we can simplify to this
| * Detects the Chromium-specific IDB backing store corruption error. | ||
| * Fires when LevelDB files backing IndexedDB are corrupted and Chrome's | ||
| * internal recovery (RepairDB -> delete -> recreate) also fails. | ||
| * https://github.com/Expensify/App/issues/87862 |
There was a problem hiding this comment.
| * https://github.com/Expensify/App/issues/87862 |
I dont think its needed to link issue here
| executeTransaction(txMode, callback) | ||
| .then(resetHealBudget) | ||
| .catch((error) => { | ||
| if (error instanceof DOMException && error.name === 'InvalidStateError') { |
There was a problem hiding this comment.
| if (error instanceof DOMException && error.name === 'InvalidStateError') { | |
| if (error instanceof Error && error.name === 'InvalidStateError') { |
Same
|
3a900d8 to
beb75d5
Compare
|
|
@leshniak let's handle |
beb75d5 to
d1f95a0
Compare
- Probe start: logInfo when tab becomes visible and probe begins - Probe healthy: logInfo confirming connection is healthy - Probe stale: logAlert with error details when stale connection detected - Heal attempts/success/exhaustion/non-recoverable: same as Expensify#780 - Updated test assertions to match new log messages and levels Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fabioh8010
left a comment
There was a problem hiding this comment.
- I checked the E/App PR and description looks broken – also we are still missing videos/evidence there
- Please add the same videos/evidence to this PR description in order for CI to pass.
| function isConnectionLostError(error: unknown): boolean { | ||
| if (!(error instanceof Error)) return false; | ||
| const msg = error.message.toLowerCase(); | ||
| return msg.includes('connection to indexed database server lost') || msg.includes('connection is closing'); |
There was a problem hiding this comment.
Using msg.includes('connection is closing') means we will include other connection closing errors too, e.g.:
- InvalidStateError: Failed to execute 'transaction' on 'IDBDatabase': The database connection is closing. (initially handled in fix the database connection is closing issue. #748)
- UnknownError: Connection is closing because of: IO error:
- UnknownError: Connection is closing because of: Failed to remove blob file.
- UnknownError: Connection is closing.
- UnknownError: Connection is closing because of: Force close delete origin
- UnknownError: Connection is closing because of: Corruption: block checksum mismatch
It's intended, right?
There was a problem hiding this comment.
yes, it's intended - for all unexpected errors from this family
|
Reassure failure is flakiness @leshniak Could you merge this branch with @Julesssss could you review this PR too? Thanks! |
Adds a Dexie-style heal pattern to createStore for Chromium's Internal error opening backing store error (884K errors/month). - isBackingStoreError() detects the Chromium-specific corruption - Shared healAttemptsRemaining counter (3, reset on success) - On backing store error: clear cached connection, retry once - Clear dbp on rejection so retries get fresh indexedDB.open() - 5 new tests: mid-session heal, init heal, budget exhaustion, budget reset, error classification No deleteDatabase(), no provider swap, no UI changes. Scoped to IDBKeyValProvider only -- SQLite provider untouched. Ref: Expensify/App#90636 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Capture dbp reference before attaching reject handler; only clear if dbp hasn't been replaced by a concurrent heal/retry (prevents stale rejection handler from clearing a newer promise) - Add comment documenting concurrent store() budget drain behavior - Fix test formatting Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ose) The heal path clears the cached dbp and reopens via indexedDB.open(), but does not call db.close() on the old IDBDatabase. Updated comments and log messages from 'close + reopen' to 'drop cached connection and reopen' to match what the code actually does. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- isBackingStoreError: use Error instead of DOMException, drop .name check - InvalidStateError catch: same simplification - Remove issue link from JSDoc Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add isConnectionLostError() to detect 'Connection to Indexed Database server lost' and 'Connection is closing' — Safari/WebKit errors that fire when the browser terminates IDB connections for backgrounded tabs. Uses the same heal-and-retry mechanism as backing store corruption: drop cached dbp, retry once with fresh indexedDB.open(), shared budget. Addresses Expensify/App#87864. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extract module-level helpers: isInvalidStateError, isBudgetedHealError, getBudgetedHealErrorLabel. Extract cacheOpenPromise to deduplicate rejected-promise cleanup in getDB and verifyStoreExists. Pure refactor — no behavior change. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Heal attempts: logAlert with error type, action taken, remaining budget - Heal success: logInfo confirming recovery after each error type - Budget exhaustion: explicit logAlert when heal budget drains - Non-recoverable errors: logAlert with error details - Updated test assertions to match new log messages and levels Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Only budget exhaustion and non-recoverable errors should trigger alerts. Heal attempts are handled gracefully and only need informational logging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
9240599 to
3fbb657
Compare
- Probe start: logInfo when tab becomes visible and probe begins - Probe healthy: logInfo confirming connection is healthy - Probe stale: logAlert with error details when stale connection detected - Heal attempts/success/exhaustion/non-recoverable: same as Expensify#780 - Updated test assertions to match new log messages and levels Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
| if (isBudgetedHealError(error) && healAttemptsRemaining > 0) { | ||
| healAttemptsRemaining--; | ||
| const label = getBudgetedHealErrorLabel(error); |
There was a problem hiding this comment.
Is there a reason not to keep the counter local and per-transaction? It seems we'll have one variable for the whole app?
| function isBackingStoreError(error: unknown): boolean { | ||
| return error instanceof Error && error.message.includes('Internal error opening backing store'); | ||
| } |
There was a problem hiding this comment.
🟢 Code Quality: Good
The error detection functions are pure, focused, and readable.
🟡 Issue - Fragile string matching: The heal logic relies on exact error message substrings. Consider adding a fallback or a more robust pattern.
// Current (brittle):
return error instanceof Error && error.message.includes('Internal error opening backing store');
// Suggested: slightly more resilient, though still message-based
const BACKING_STORE_PATTERNS = ['internal error opening backing store', 'backing store'];
// ...
return BACKING_STORE_PATTERNS.some(p => msg.includes(p));Why it matters: If a browser vendor changes the error wording slightly (which happens), the 884K/month errors will continue to crash the app despite having heal logic that should catch them.
There was a problem hiding this comment.
This can end up include other class of errors like QuotaExceededError: Encountered full disk while opening backing store for indexedDB.open..
I agree that Chrome could change the errors some day but we can't really protect ourselves against this, it could change to something completely different from backing store and the suggestion would still fail.
| function getBudgetedHealErrorLabel(error: unknown): 'backing store' | 'connection lost' { | ||
| return isBackingStoreError(error) ? 'backing store' : 'connection lost'; | ||
| } |
There was a problem hiding this comment.
🟡 Issue: getBudgetedHealErrorLabel is fragile to future categories
Similar to here, this assumes that if it's not a backing store error, it MUST be a connection lost error. If a third budgeted heal error category is ever added to isBudgetedHealError, this will silently mislabel it as 'connection lost'.
Suggested fix:
function getBudgetedHealErrorLabel(error: unknown): 'backing store' | 'connection lost' | null {
if (isBackingStoreError(error)) return 'backing store';
if (isConnectionLostError(error)) return 'connection lost';
return null;
} And update the caller:
const label = getBudgetedHealErrorLabel(error);
if (!label) {
throw new Error('Unexpected: isBudgetedHealError matched but no label found');
}There was a problem hiding this comment.
Agree but we shouldn't throw an Error just because the label wasn't found, let's just log something and/or skip the logic if necessary
| function isInvalidStateError(error: unknown): boolean { | ||
| return error instanceof Error && error.name === 'InvalidStateError'; | ||
| } |
There was a problem hiding this comment.
🟡 Potential edge case: InvalidStateError type checking relaxed
The old code checked:
if (error instanceof DOMException && error.name === 'InvalidStateError') {The new code uses:
function isInvalidStateError(error: unknown): boolean {
return error instanceof Error && error.name === 'InvalidStateError';
}This is more permissive (DOMException is not strictly instanceof Error in all envs, but Error is broader). In practice this is unlikely to cause issues since DOMException typically inherits from Error. But if an env throws an InvalidStateError that is not an Error, this check would miss it.
Suggested fix - tighten the check:
function isInvalidStateError(error: unknown): boolean {
return (error instanceof Error || error instanceof DOMException) && (error as Error).name === 'InvalidStateError';
}
ikevin127
left a comment
There was a problem hiding this comment.
🟢 LGTM
☝️ Only had 3 comments posted above that might be of interest before merging.
🧪 Test Coverage: Excellent
The new tests cover all major branches:
- Backing store error → heal mid-session
- Backing store error → heal on init
- Budget exhaustion across multiple operations
- Budget reset after success
- Non-backing-store errors → no heal
- Connection lost error → heal
- Connection lost budget exhaustion
- "Connection is closing" variant
- Shared budget between error types
Test coverage estimate for createStore.ts new logic: ~95%+. All branches are exercised. The one very minor gap: getBudgetedHealErrorLabel returning 'connection lost' for the generic case is not directly unit-tested because the tests always use specific error types.
Details
Adds an IDB healing mechanism in
createStore.ts— the IDB connection manager forIDBKeyValProvider— addressing two sibling storage error classes:UnknownError: Internal error opening backing store for indexedDB.open.— 884K errors/month, 26.3% of all storage errors, Chrome/Edge only (investigation, solution design)UnknownError: Connection to Indexed Database server lost.— 637K errors/month, 19% of all storage errors, 100% WebKit (investigation, solution)Both errors share the same structural problem: a stale cached
dbpis never cleared, so all retries reuse the dead/corrupt connection. This is a Dexie-style heal pattern (PR1398_maxLoop).What it does:
isBackingStoreError()— detects Chromium LevelDB corruption ('Internal error opening backing store')isConnectionLostError()— detects Safari/WebKit connection termination ('connection to indexed database server lost','connection is closing'). WebKit #197050, #201483healAttemptsRemainingcounter (3 attempts, reset on every successful IDB operation)dbp, retryexecuteTransactiononce (forces a freshindexedDB.open())dbp(capture reference, only clear if unchanged)dbpon rejection ingetDB()andverifyStoreExists(fixes pre-existing bug where a cached rejected promise caused infinite failures)Logger.logAlert/logInfoat every step: error detection, heal attempt, successful recovery, budget exhaustion, and non-recoverable errorsWhat it does NOT do (by design per #90636):
deleteDatabase()— proven to also fail when LevelDB files are corruptMemoryOnlyProviderdegradation — cache already absorbs all writes during the sessionstorage/index.tsorOnyxUtils.ts— those are separate issues (#90632, #90633)Linked E/App PR
Expensify/App#91085
Related Issues
Expensify/App#90636
Expensify/App#87862
Expensify/App#87864
Automated Tests
19 tests in
tests/unit/storage/providers/createStoreTest.ts:InvalidStateError retry (5): retry + succeed, retry + propagate, non-InvalidState DOMException skipped, non-DOMException skipped, data integrity after retry
Diagnostic logging (1): logInfo with all diagnostic fields on retry
Onclose/onversionchange handlers (4): log + recover for each
Backing store healing (5): mid-session heal, init-time heal, budget exhaustion, budget reset, error classification (wrong message / QuotaExceeded bypass)
Connection lost healing (4): heal + reopen, budget exhaustion, "connection is closing" variant, shared budget with backing store
All tests pass.
Manual Tests
Each test injects a monkey-patch via the browser console to simulate the error class, then triggers an Onyx write (e.g. navigate to a different chat). No code changes needed.
Test 1 — Backing store error (Chromium path):
[Onyx] IDB heal: backing store error detected — dropping cached connection and reopening (2 attempts left)[Onyx] IDB heal: successfully recovered after backing store errorTest 2 — Connection lost (Safari path):
[Onyx] IDB heal: connection lost error detected — dropping cached connection and reopening (2 attempts left)[Onyx] IDB heal: successfully recovered after connection lost errorTest 3 — Budget exhaustion (permanent failure, then gives up):
[Onyx] IDB heal: backing store error detected — dropping cached connection and reopening (2 attempts left)(heal retry also fails — error propagates)[Onyx] IDB heal: backing store error detected — dropping cached connection and reopening (1 attempts left)(heal retry also fails)[Onyx] IDB heal: backing store error detected — dropping cached connection and reopening (0 attempts left)(heal retry also fails)[Onyx] IDB heal: backing store error — heal budget exhausted, giving up(no heal attempted)IDBDatabase.prototype.transaction = _origTx;or refresh the pageAuthor Checklist
### Related Issuessection above### Linked E/App PRsection above, and verified this change against it (E/App CI passed and manual testing completed)TestssectiontoggleReportand notonIconClick)myBool && <MyComponent />.STYLE.md) were followedAvatar, I verified the components usingAvatarare working as expected)/** comment above it */thisproperly so there are no scoping issues (i.e. foronClick={this.submit}the methodthis.submitshould be bound tothisin the constructor)thisare necessary to be bound (i.e. avoidthis.submit = this.submit.bind(this);ifthis.submitis never passed to a component event handler likeonClick)Avataris modified, I verified thatAvataris working as expected in all cases)mainbranch was merged into this PR after a review, I tested again and verified the outcome was still expected according to theTeststeps.Screenshots/Videos
Android: Native
N/A — IDB is web-only, no native changes.
Android: mWeb Chrome
iOS: Native
N/A — IDB is web-only, no native changes.
iOS: mWeb Safari
MacOS: Chrome / Safari
Healed (simulated error):
healed.mov
Killed connection (simulated error):
killed.mp4
Exhausted (simulated error):
exhausted.mov
Healed (simulated on Safari):
safari_error.mp4
Killed connection (simulated on Safari):
safari_killed.mp4
Exhausted (simulated on Safari):
safari_exhausted.mp4
Healed offline (simulated on Chrome):
offline.mp4