Fix Windows 0xC0000409 crash from CoUninitialize in a thread-local destructor #13207

Open: acarl005 wants to merge 1 commit into master from fix/windows-antivirus-com-tls-destructor-crash

acarl005 (Contributor) commented Jun 29, 2026

Description

On Windows, Warp aborts with 0xC0000409 (FAST_FAIL_FATAL_APP_EXIT, subcode 0x7 — i.e. a Rust abort) shortly after launch / entering Agent Mode. Analysis of the four user-provided minidumps (#13135) shows an identical signature in every crash:

  • The faulting thread is always a background-executor-N tokio worker.
  • The aborting frame is called directly by ntdll!LdrpCallTlsInitializers, the Windows loader routine that invokes TLS callbacks; Rust runs thread_local! destructors from such a callback at thread teardown.
  • Same fault offset (warp+0x22718a2) across all dumps and both reported builds.

Root cause

app/src/antivirus/windows.rs initialized COM in a thread_local! (COM_INITIALIZED) whose Drop calls CoUninitialize(). The antivirus scan() runs on a background-executor (tokio) worker via ctx.spawn, so COM is initialized on that worker and the guard is stored in the worker's TLS. When that worker / runtime is later torn down, the destructor runs CoUninitialize() from inside the loader's TLS callback — i.e. under the loader lock, in the same context as DllMain.
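
The teardown timing described above can be reproduced without COM. The following is a minimal, portable sketch, not the code in app/src/antivirus/windows.rs: TlsGuard, scan_like_work, and run_on_worker are hypothetical names, and pushing to a log stands in for the real CoInitializeEx / CoUninitialize calls. It shows that a guard stored in a thread_local! is dropped only when the worker thread exits, well after the work that created it has returned.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};
use std::thread;

/// Shared event log so the spawning thread can inspect the order of events.
type Log = Arc<Mutex<Vec<&'static str>>>;

/// Hypothetical stand-in for the old thread-local COM guard.
/// The real guard would call CoUninitialize() in Drop; here we only log.
struct TlsGuard {
    log: Log,
}

impl Drop for TlsGuard {
    fn drop(&mut self) {
        // Runs from the thread's TLS destructor path at thread exit.
        self.log.lock().unwrap().push("uninit in TLS destructor");
    }
}

thread_local! {
    static GUARD: RefCell<Option<TlsGuard>> = RefCell::new(None);
}

/// Lazily "initializes COM" for the current thread, then does its work.
fn scan_like_work(log: &Log) {
    GUARD.with(|slot| {
        let mut slot = slot.borrow_mut();
        if slot.is_none() {
            log.lock().unwrap().push("init");
            *slot = Some(TlsGuard { log: Arc::clone(log) });
        }
    });
    log.lock().unwrap().push("scan done");
}

/// Runs the work on a fresh thread and returns the recorded event order.
fn run_on_worker() -> Vec<&'static str> {
    let log: Log = Arc::new(Mutex::new(Vec::new()));
    let worker_log = Arc::clone(&log);
    thread::spawn(move || {
        scan_like_work(&worker_log);
        worker_log.lock().unwrap().push("worker body returned");
    })
    .join()
    .unwrap();
    let events = log.lock().unwrap().clone();
    events
}
```

Calling run_on_worker() records the "uninit" event last, after the worker body has returned. On Windows that final step is the one that runs inside the loader's TLS callback.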

That context is what makes it unsound. CoUninitialize() does far more than free a handle: it releases interface proxies and unloads the in-process COM servers that CoCreateInstance(WSCProductList, …) pulled in — i.e. it re-enters the loader (FreeLibrary / LdrUnloadDll) while the TLS callback is already holding the loader lock, which deadlocks or runs another module's detach against partially torn-down state. Microsoft documents these hazards for DllMain, which runs under the loader lock exactly like a Windows TLS callback (and therefore a Rust thread_local! destructor): Dynamic-Link Library Best Practices states you "cannot call any function in DllMain that directly or indirectly tries to acquire the loader lock," and lists both calling LoadLibrary/LoadLibraryEx "either directly or indirectly" and "Initialize COM threads by using CoInitializeEx" among the things you should "never" do; the DllMain reference adds that "calling … COM functions can cause access violation errors, because some functions load other system components." CoUninitialize() hits both: it's a COM call, and it unloads in-process servers via FreeLibrary.

Warp builds with the default panic = "unwind", so a panic on that teardown path unwinds into the extern "system" TLS callback (a non-unwindable boundary) and Rust converts it to a __fastfail, which surfaces as 0xC0000409. Background runtimes are created and dropped repeatedly during a session, so the teardown path runs repeatedly, matching the "crashes every ~30s / N times" reports.

Fix

Replace the thread_local! with a scoped RAII ComGuard created at the top of scan() and dropped at the end of that synchronous scope. CoUninitialize() now runs on the live worker thread, immediately after the COM objects are released — never in a TLS destructor at thread teardown, never under the loader lock. The guard is declared before the COM interface pointers so it drops last, and scan() has no .await between init and drop, so init/uninit stay on one thread.
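
A minimal sketch of the scoped pattern, assuming nothing about the actual diff: the ComGuard and ProductList types below are hypothetical stand-ins that only log, where the real code would call CoInitializeEx, CoCreateInstance, Release, and CoUninitialize. It illustrates why declaring the guard first matters: Rust drops locals in reverse declaration order, so the interface wrapper is released before the guard uninitializes COM.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Single-threaded event log used to observe call order.
type Log = Rc<RefCell<Vec<&'static str>>>;

/// Hypothetical scoped guard. The real one would call CoInitializeEx in
/// `new` and CoUninitialize in `drop`; here both only log.
struct ComGuard {
    log: Log,
}

impl ComGuard {
    fn new(log: &Log) -> Self {
        log.borrow_mut().push("CoInitializeEx");
        ComGuard { log: Rc::clone(log) }
    }
}

impl Drop for ComGuard {
    fn drop(&mut self) {
        self.log.borrow_mut().push("CoUninitialize");
    }
}

/// Hypothetical stand-in for a COM interface pointer (e.g. the product list).
struct ProductList {
    log: Log,
    count: usize,
}

impl ProductList {
    fn create(log: &Log) -> Self {
        log.borrow_mut().push("CoCreateInstance");
        ProductList { log: Rc::clone(log), count: 2 }
    }
}

impl Drop for ProductList {
    fn drop(&mut self) {
        self.log.borrow_mut().push("Release");
    }
}

/// Synchronous scan: the guard is declared first, so it is dropped last.
fn scan(log: &Log) -> usize {
    let _com = ComGuard::new(log);
    let products = ProductList::create(log);
    products.count
}
```

The guard is bound to a named variable (`_com`) rather than `_`: a bare `let _ = ...` would drop it immediately and uninitialize COM before the interface is even created.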

To keep that invariant enforced, ComGuard is !Send (via a PhantomData<Rc<()>> marker). If anyone later holds it across an .await on the multi-threaded background runtime, the spawned future becomes !Send and fails to compile at the background_executor().spawn_boxed site — turning the "same-thread CoUninitialize" rule into a compile-time guarantee.
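
The marker technique can be sketched in isolation. This is an illustrative example under stated assumptions, not the PR's code: require_send is a hypothetical helper that mimics the Send bound a multi-threaded spawn site places on futures, and other_work, scan_ok, and scan_bad are made-up names.

```rust
use std::marker::PhantomData;
use std::rc::Rc;

/// Hypothetical guard: the PhantomData<Rc<()>> field makes the type !Send
/// (and !Sync) without storing any data.
struct ComGuard {
    _not_send: PhantomData<Rc<()>>,
}

impl ComGuard {
    fn new() -> Self {
        // The real guard would call CoInitializeEx here.
        ComGuard { _not_send: PhantomData }
    }
}

/// Compiles only if `T` is Send, similar to the bound a multi-threaded
/// executor places on spawned futures.
fn require_send<T: Send>(_value: &T) {}

async fn other_work() {}

/// The guard lives in an inner block that ends before the await, so the
/// future does not capture it and stays Send.
async fn scan_ok() -> u32 {
    let found = {
        let _com = ComGuard::new();
        3 // placeholder for the synchronous COM queries
    };
    other_work().await;
    found
}

// Uncommenting this and calling `require_send(&scan_bad())` fails to
// compile, because the guard is still alive across the await:
//
// async fn scan_bad() -> u32 {
//     let _com = ComGuard::new();
//     other_work().await;
//     3
// }
```

Because the marker is a zero-sized PhantomData, the guard carries no runtime cost; the check happens entirely at compile time.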

Linked Issue

Fixes #11952
Fixes #13135

Testing

Root cause was established by analyzing the four minidumps from #13135: all four show 0x7 FAST_FAIL_FATAL_APP_EXIT, a faulting background-executor-* thread, the same warp+0x22718a2 fault offset, and ntdll!LdrpCallTlsInitializers as the caller.

  • cargo fmt -p warp -- --check passes.
  • Change is COM-init scoping only; scan()'s future stays Send (the !Send guard is never held across an .await), so the existing spawn site still compiles.
  • [ ] I have manually tested my changes locally with ./script/run

Note: cargo check/cargo clippy and a manual Windows launch were not completed in this session — please run ./script/presubmit and smoke-test on Windows before merging.

Agent Mode

  • Warp Agent Mode - This PR was created via Warp's AI Agent Mode

CHANGELOG-BUG-FIX: Fixed a Windows crash (0xC0000409) that could occur shortly after launch, caused by COM being uninitialized from a thread-local destructor during background-thread teardown.


Conversation: https://staging.warp.dev/conversation/f0b8410d-8255-4220-af43-e3b36b185c34

Co-Authored-By: Oz <oz-agent@warp.dev>

…o CoUninitialize no longer runs in a thread-local destructor, fixing the Windows 0xC0000409 abort on background-executor threads
cla-bot Bot added the cla-signed label Jun 29, 2026

oz-for-oss Bot (Contributor) commented Jun 29, 2026

@acarl005

I'm starting a first review of this pull request.

You can view the conversation on Warp.

I completed the review and no human review was requested for this pull request.

Comment /oz-review on this pull request to retrigger a review (up to 3 times on the same pull request).

Powered by Oz

oz-for-oss Bot (Contributor) left a comment

Overview

This PR replaces the Windows antivirus COM thread-local guard with a scoped ComGuard so CoUninitialize() runs before the worker thread enters TLS teardown. The diff preserves the intended same-thread COM init/uninit pairing and I did not find correctness or security issues in the changed code.

Concerns

  • ⚠️ [IMPORTANT] This change affects user-visible Windows crash behavior, but the PR description does not include a screenshot or screen recording demonstrating a Windows launch / Agent Mode smoke test end to end, and the manual test checkbox is still unchecked. Please attach visual evidence or a short recording showing the crash no longer reproduces before merging.

Verdict

Found: 0 critical, 1 important, 0 suggestions

Request changes
