Track spill read-back memory in SMJ#22103
Open
SubhamSinghal wants to merge 1 commit into
Open
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Follow-up to #21962.
Rationale for this change
After #21962, the memory pool accurately tracks residual
join_arraysmemory that remains after aBufferedBatchisspilled to disk. However, when spilled batches are read back from disk during output materialization in
materialize_right_columns, the deserialized data temporarily exists in memory without any pool reservation.The pool thinks these batches cost 0 bytes during read-back. Under memory pressure (the reason they were spilled), other
operators see stale headroom and may over-allocate, risking OOM.
What changes are included in this PR?
Changed
materialize_right_columnsfrom&selfto&mut selfand addedgrow/shrinkat the exact points where spilled data is read from disk:Path A (single source spilled):
grow(size_estimation)immediately beforefetch_right_columns_by_idxsshrink(size_estimation)immediately afterPath B (multi-source interleave):
size_estimationfor all spilled sourcesgrow(total)beforesource_dataloadingshrink(total)after interleave completesUses unconditional
grow()because the data must be read to produce output — there is no fallback. Same rationale as#21962: if memory physically exists, the pool must reflect it.
Are these changes tested?
Yes — two new tests:
spill_read_back_memory_accounting: multiple buffered batches for same key (multi-source Path B) — verifiespeak_mem_used >= size_estimationandpool.reserved() == 0at endspill_read_back_single_source: distinct keys with one batch per group (single-source Path A) — same assertionsAre there any user-facing changes?
No.