Fix AIO callback from_api completion lifetime #13151

Open
bneradt wants to merge 1 commit into apache:master from bneradt:fix-aio-callback-lifetime-apache

@bneradt bneradt commented May 9, 2026

The AIOCallback io_complete handler used a member variable, from_api, to
determine whether to delete itself. The problem is that if the completion
handler had already deleted `this` by the time it returned, the
subsequent read of from_api was, by definition, a use after free. When
built with ASan, this produced a use-after-free report.

Concretely, we were seeing this in docs during cache stripe
initialization after an unclean shutdown. If the on-disk cache
directory was dirty, startup recovery scanned the data area, cleared
directory entries for the uncertain range, and wrote the repaired
directory back out. The temporary AIO callbacks for that recovery write
live in StripeInitInfo. When the recovery write completion was
delivered, StripeSM::handle_recover_write_dir() deleted StripeInitInfo,
which destroyed the AIOCallback object whose AIOCallback::io_complete()
frame was still returning. At that point, the use of from_api was a use
after free.
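
The ownership chain described above can be illustrated with a minimal sketch. All names here (SketchCallback, SketchInitInfo, handle_write_done) are hypothetical stand-ins, not the real ATS classes; the sketch only assumes that the callback is embedded by value in an owner that the completion handler deletes.

```cpp
#include <cassert>

// Hypothetical stand-in for an AIO callback; records its own destruction.
struct SketchCallback {
  bool *destroyed_flag = nullptr;
  ~SketchCallback()
  {
    if (destroyed_flag) {
      *destroyed_flag = true;
    }
  }
};

// Hypothetical stand-in for the recovery state that embeds the callback by value.
struct SketchInitInfo {
  SketchCallback recovery_write;
};

// Plays the role of the completion handler: it releases the owner, which
// also runs the destructor of the embedded callback.
void
handle_write_done(SketchInitInfo *info)
{
  delete info;
}

// Returns true if the callback was already destroyed when the handler returned.
bool
callback_destroyed_by_handler()
{
  bool destroyed = false;
  auto *info     = new SketchInitInfo;

  info->recovery_write.destroyed_flag = &destroyed;
  handle_write_done(info);
  return destroyed;
}
```

Under these assumptions `callback_destroyed_by_handler()` returns true, so any code that runs after the handler and still reads a member of the callback would be touching freed memory.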

This snapshots the API-owned callback flag before dispatching the
completion and uses that local value for the post-callback cleanup.
This also adds a focused regression test for completion handlers that
release the callback owner before AIOCallback::io_complete() returns.
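
A minimal sketch of the snapshot approach, under stated assumptions: SketchCallback, SketchOwner, and the helper functions are hypothetical and only mirror the shape of the fix, not the actual AIO.cc code. The flag (and everything else needed for the dispatch) is copied to locals first, so nothing reads `this` after the handler runs.

```cpp
#include <cassert>

// Hypothetical stand-in for an AIO callback; not the real AIOCallback.
struct SketchCallback {
  bool from_api            = false;   // true when the callback object is API-owned
  void (*handler)(void *)  = nullptr; // completion handler; may destroy *this
  void *ctx                = nullptr;

  // Returns true if the callback deleted itself after dispatch.
  bool
  io_complete()
  {
    // Snapshot everything needed from *this before dispatching.
    const bool api_owned = from_api;
    void (*fn)(void *)   = handler;
    void *arg            = ctx;

    fn(arg); // may destroy *this; no member access is allowed after this line

    if (api_owned) {
      delete this; // the API-owned case is assumed to still be alive here
      return true;
    }
    return false;
  }
};

int owner_destroyed = 0;

// Hypothetical owner that embeds the callback by value.
struct SketchOwner {
  SketchCallback cb;
  ~SketchOwner() { ++owner_destroyed; }
};

void release_owner(void *ctx) { delete static_cast<SketchOwner *>(ctx); }
void noop(void *) {}

// The handler deletes the owner (and the embedded callback) during dispatch.
bool
complete_owner_embedded()
{
  auto *owner       = new SketchOwner;
  owner->cb.handler = release_owner;
  owner->cb.ctx     = owner;
  return owner->cb.io_complete();
}

// The callback is API-owned and cleans itself up after dispatch.
bool
complete_api_owned()
{
  auto *cb     = new SketchCallback;
  cb->from_api = true;
  cb->handler  = noop;
  return cb->io_complete();
}
```

In this sketch `complete_owner_embedded()` returns false without reading freed memory, and `complete_api_owned()` returns true after the callback deletes itself. Building such a check with ASan enabled is one way to confirm that no member is read after the handler returns.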

Introduced in #13027

@bneradt bneradt added this to the 11.0.0 milestone May 9, 2026
@bneradt bneradt self-assigned this May 9, 2026
@bneradt bneradt force-pushed the fix-aio-callback-lifetime-apache branch 2 times, most recently from 71db5be to 4142430 Compare May 9, 2026 18:02
@bneradt bneradt force-pushed the fix-aio-callback-lifetime-apache branch from 4142430 to e29f654 Compare May 9, 2026 18:05
@bneradt bneradt changed the title Fix AIO callback completion lifetime Fix AIO callback from_api completion lifetime May 9, 2026
@bneradt bneradt requested review from bryancall and Copilot May 9, 2026 18:05

Copilot AI left a comment

Pull request overview

Fixes a use-after-free in AIOCallback::io_complete() when the completion handler deletes the callback owner (and thus destroys the AIOCallback) before io_complete() finishes returning. This is in the iocore AIO subsystem and impacts both internal cache recovery codepaths and API-originated AIO completions.

Changes:

  • Snapshot from_api into a local before dispatching the completion event, avoiding post-callback member access after potential self-destruction.
  • Add a focused Catch2 unit test that deletes the completion owner inside the completion handler.
  • Wire the new unit test into the src/iocore/aio CMake test targets.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/iocore/aio/AIO.cc | Avoids UAF by copying from_api prior to invoking the completion handler. |
| include/iocore/aio/AIO.h | Adds a clarifying comment describing from_api ownership semantics. |
| src/iocore/aio/CMakeLists.txt | Adds test_AIOCallback unit test target and registers it with Catch2 test runner. |
| src/iocore/aio/unit_tests/test_AIOCallback.cc | New regression test for owner deletion during io_complete() completion dispatch. |

@zwoop zwoop self-requested a review May 9, 2026 19:24