Fix AIO callback from_api completion lifetime#13151
Open
bneradt wants to merge 1 commit intoapache:masterfrom
Open
Fix AIO callback from_api completion lifetime#13151bneradt wants to merge 1 commit intoapache:masterfrom
bneradt wants to merge 1 commit intoapache:masterfrom
Conversation
71db5be to
4142430
Compare
The AIOCallback io_complete handler used a member variable, from_api, to determine whether to delete itself. The problem is that in the situation where in the course of processing the function `this` was already deleted, the use of the from_api variable was, by definition, a use after free. If built under ASan, this resulted in a use-after-free assertion. Concretely, we were seeing this in docs during cache stripe initialization after an unclean shutdown. If the on-disk cache directory was dirty, startup recovery scanned the data area, cleared directory entries for the uncertain range, and wrote the repaired directory back out. The temporary AIO callbacks for that recovery wrote live in StripeInitInfo. When the recovery write completion was delivered, StripeSM::handle_recover_write_dir() deleted StripeInitInfo, which destroyed the AIOCallback object whose AIOCallback::io_complete() frame was still returning. At that point, the use of from_api was a use after free. This snapshots the API-owned callback flag before dispatching the completion and uses that local value for the post-callback cleanup. This also adds a focused regression test for completion handlers that release the callback owner before AIOCallback::io_complete() returns. Introduced in apache#13027
4142430 to
e29f654
Compare
Contributor
There was a problem hiding this comment.
Pull request overview
Fixes a use-after-free in AIOCallback::io_complete() when the completion handler deletes the callback owner (and thus destroys the AIOCallback) before io_complete() finishes returning. This is in the iocore AIO subsystem and impacts both internal cache recovery codepaths and API-originated AIO completions.
Changes:
- Snapshot
from_apiinto a local before dispatching the completion event, avoiding post-callback member access after potential self-destruction. - Add a focused Catch2 unit test that deletes the completion owner inside the completion handler.
- Wire the new unit test into the
src/iocore/aioCMake test targets.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/iocore/aio/AIO.cc | Avoids UAF by copying from_api prior to invoking the completion handler. |
| include/iocore/aio/AIO.h | Adds a clarifying comment describing from_api ownership semantics. |
| src/iocore/aio/CMakeLists.txt | Adds test_AIOCallback unit test target and registers it with Catch2 test runner. |
| src/iocore/aio/unit_tests/test_AIOCallback.cc | New regression test for owner deletion during io_complete() completion dispatch. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The AIOCallback io_complete handler used a member variable, from_api, to
determine whether to delete itself. The problem is that in the situation
where in the course of processing the function
thiswas alreadydeleted, the use of the from_api variable was, by definition, a use
after free. If built under ASan, this resulted in a use-after-free
assertion.
Concretely, we were seeing this in docs during cache stripe
initialization after an unclean shutdown. If the on-disk cache
directory was dirty, startup recovery scanned the data area, cleared
directory entries for the uncertain range, and wrote the repaired
directory back out. The temporary AIO callbacks for that recovery wrote
live in StripeInitInfo. When the recovery write completion was
delivered, StripeSM::handle_recover_write_dir() deleted StripeInitInfo,
which destroyed the AIOCallback object whose AIOCallback::io_complete()
frame was still returning. At that point, the use of from_api was a use
after free.
This snapshots the API-owned callback flag before dispatching the
completion and uses that local value for the post-callback cleanup.
This also adds a focused regression test for completion handlers that
release the callback owner before AIOCallback::io_complete() returns.
Introduced in #13027