Fix Windows 0xC0000409 crash from CoUninitialize in a thread-local destructor #13207
acarl005 wants to merge 1 commit into
…o CoUninitialize no longer runs in a thread-local destructor, fixing the Windows 0xC0000409 abort on background-executor threads
I'm starting a first review of this pull request. You can view the conversation on Warp. I completed the review and no human review was requested for this pull request. Powered by Oz
Overview
This PR replaces the Windows antivirus COM thread-local guard with a scoped ComGuard so CoUninitialize() runs before the worker thread enters TLS teardown. The diff preserves the intended same-thread COM init/uninit pairing and I did not find correctness or security issues in the changed code.
Concerns
⚠️ [IMPORTANT] This change affects user-visible Windows crash behavior, but the PR description does not include a screenshot or screen recording demonstrating a Windows launch / Agent Mode smoke test end to end, and the manual test checkbox is still unchecked. Please attach visual evidence or a short recording showing the crash no longer reproduces before merging.
Verdict
Found: 0 critical, 1 important, 0 suggestions
Request changes
Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).
Description
On Windows, Warp aborts with 0xC0000409 (FAST_FAIL_FATAL_APP_EXIT, subcode 0x7, i.e. a Rust abort) shortly after launch / entering Agent Mode. Analysis of the four user-provided minidumps (#13135) shows an identical signature in every crash:
- The faulting thread is a background-executor-N tokio worker.
- The caller is ntdll!LdrpCallTlsInitializers, the Windows loader's TLS callback that runs thread_local! destructors at thread teardown.
- The fault offset is the same (warp+0x22718a2) across all dumps and both reported builds.
Root cause
app/src/antivirus/windows.rs initialized COM in a thread_local! (COM_INITIALIZED) whose Drop calls CoUninitialize(). The antivirus scan() runs on a background-executor (tokio) worker via ctx.spawn, so COM is initialized on that worker and the guard is stored in the worker's TLS. When that worker / runtime is later torn down, the destructor runs CoUninitialize() from inside the loader's TLS callback, i.e. under the loader lock, in the same context as DllMain.
That context is what makes it unsound. CoUninitialize() does far more than free a handle: it releases interface proxies and unloads the in-process COM servers that CoCreateInstance(WSCProductList, …) pulled in. In other words, it re-enters the loader (FreeLibrary/LdrUnloadDll) while the TLS callback is already holding the loader lock, which deadlocks or runs another module's detach against partially torn-down state. Microsoft documents these hazards for DllMain, which runs under the loader lock exactly like a Windows TLS callback (and therefore a Rust thread_local! destructor): Dynamic-Link Library Best Practices states you "cannot call any function in DllMain that directly or indirectly tries to acquire the loader lock," and lists both calling LoadLibrary/LoadLibraryEx "either directly or indirectly" and "Initialize COM threads by using CoInitializeEx" among the things you should "never" do; the DllMain reference adds that "calling … COM functions can cause access violation errors, because some functions load other system components." CoUninitialize() hits both: it's a COM call, and it unloads in-process servers via FreeLibrary.
Warp builds with the default panic = "unwind", so a panic on that teardown path unwinds into the extern "system" TLS callback (a non-unwindable boundary) and Rust converts it to __fastfail → 0xC0000409. Background runtimes are created and dropped repeatedly during a session, so the teardown path runs repeatedly, matching the "crashes every ~30s / N times" reports.
Fix
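For context, the pattern being replaced has roughly the following shape. This is a minimal, platform-independent sketch, not the code in app/src/antivirus/windows.rs: TlsComGuard, log, EVENTS, scan_with_tls_guard, and run_worker_and_collect_events are hypothetical names, and the COM calls are stood in for by entries in an in-memory event log. It only illustrates that the Drop of a thread_local! value runs during thread teardown, after the work on that thread has finished.

```rust
use std::sync::Mutex;
use std::thread;

// Hypothetical event log standing in for the real COM calls.
static EVENTS: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

fn log(event: &'static str) {
    EVENTS.lock().unwrap().push(event);
}

/// Hypothetical stand-in for the old thread-local COM guard.
struct TlsComGuard;

impl TlsComGuard {
    fn new() -> Self {
        log("init"); // the real code would initialize COM here
        TlsComGuard
    }
}

impl Drop for TlsComGuard {
    fn drop(&mut self) {
        // The real code called CoUninitialize() here. Because the guard
        // lives in TLS, this runs while the thread is being torn down.
        log("uninit in TLS destructor");
    }
}

thread_local! {
    static COM_INITIALIZED: TlsComGuard = TlsComGuard::new();
}

fn scan_with_tls_guard() {
    // First access lazily initializes the guard on this thread.
    COM_INITIALIZED.with(|_guard| {});
    log("scan finished");
}

/// Runs the scan on a fresh worker thread and returns the recorded events.
pub fn run_worker_and_collect_events() -> Vec<&'static str> {
    EVENTS.lock().unwrap().clear();
    thread::spawn(scan_with_tls_guard)
        .join()
        .expect("worker thread panicked");
    EVENTS.lock().unwrap().clone()
}
```

Calling run_worker_and_collect_events() records the guard's drop last, after the scan itself has finished; on Windows that final step is the one that executes inside the loader's TLS callback.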
Replace the thread_local! with a scoped RAII ComGuard created at the top of scan() and dropped at the end of that synchronous scope. CoUninitialize() now runs on the live worker thread, immediately after the COM objects are released: never in a TLS destructor at thread teardown, never under the loader lock. The guard is declared before the COM interface pointers so it drops last, and scan() has no .await between init and drop, so init/uninit stay on one thread.
To keep that invariant enforced, ComGuard is !Send (via a PhantomData<Rc<()>> marker). If anyone later holds it across an .await on the multi-threaded background runtime, the spawned future becomes !Send and fails to compile at the background_executor().spawn_boxed site, turning the "same-thread CoUninitialize" rule into a compile-time guarantee.
Linked Issue
Fixes #11952
Fixes #13135
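As a hedged illustration of the scoped guard described under Fix, the sketch below shows the drop ordering and the !Send marker. It is not the PR's actual code: ComObject, log, EVENTS, and scan_and_collect_events are hypothetical, the ComGuard and scan() bodies are simplified stand-ins, and the COM calls are replaced by entries in an in-memory event log so the example runs on any platform.

```rust
use std::marker::PhantomData;
use std::rc::Rc;
use std::sync::Mutex;

// Hypothetical event log standing in for the real COM calls.
static EVENTS: Mutex<Vec<&'static str>> = Mutex::new(Vec::new());

fn log(event: &'static str) {
    EVENTS.lock().unwrap().push(event);
}

/// Scoped guard: init on construction, uninit on drop.
/// The PhantomData<Rc<()>> field makes the type !Send, so a future that
/// holds it across an .await cannot be spawned on a multi-threaded runtime.
struct ComGuard {
    _not_send: PhantomData<Rc<()>>,
}

impl ComGuard {
    fn new() -> Self {
        log("CoInitializeEx"); // stand-in for the real call
        ComGuard { _not_send: PhantomData }
    }
}

impl Drop for ComGuard {
    fn drop(&mut self) {
        log("CoUninitialize"); // stand-in for the real call
    }
}

/// Hypothetical stand-in for a COM interface pointer.
struct ComObject;

impl ComObject {
    fn create() -> Self {
        log("create COM object");
        ComObject
    }
}

impl Drop for ComObject {
    fn drop(&mut self) {
        log("release COM object");
    }
}

fn scan() {
    // Declared first, so it is dropped last (locals drop in reverse order).
    let _com = ComGuard::new();
    // Declared after the guard, so it is released before uninit.
    let _products = ComObject::create();
    log("query products");
}

/// Runs the scan on the current thread and returns the recorded events.
pub fn scan_and_collect_events() -> Vec<&'static str> {
    EVENTS.lock().unwrap().clear();
    scan();
    EVENTS.lock().unwrap().clone()
}

// A compile-time check such as the following would be rejected,
// because ComGuard does not implement Send:
//     fn assert_send<T: Send>() {}
//     assert_send::<ComGuard>();
```

In this sketch the uninit step is always the last event and always runs on the thread that called scan(), which is the property the fix relies on.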
Testing
Root cause was established by analyzing the four minidumps from #13135: all four show 0x7 FAST_FAIL_FATAL_APP_EXIT, a faulting background-executor-* thread, the same warp+0x22718a2 fault offset, and ntdll!LdrpCallTlsInitializers as the caller.
- cargo fmt -p warp -- --check passes.
- scan()'s future stays Send (the !Send guard is never held across an .await), so the existing spawn site still compiles.
- ./script/run
Agent Mode
CHANGELOG-BUG-FIX: Fixed a Windows crash (0xC0000409) that could occur shortly after launch, caused by COM being uninitialized from a thread-local destructor during background-thread teardown.
Conversation: https://staging.warp.dev/conversation/f0b8410d-8255-4220-af43-e3b36b185c34
Co-Authored-By: Oz <oz-agent@warp.dev>