arena ToT: einsum sub-World fence guard + (outer-Contraction, inner-Hadamard) view-cell case #550
Merged
evaleev merged 2 commits on May 21, 2026
Add an inline RAII guard `FenceSubWorldsOnExit` to the generalized-contraction path of einsum, declared right after the `worlds` vector so it destructs *before* `worlds` (LIFO) and *after* AB/C. On normal exit this is a final harmless drain; on exception unwind it drains any `lazy_sync_children` tasks that ~DistArray scheduled via lazy_deleter on sub-World taskqs before those sub-Worlds are torn down. Without this, those tasks survive into the global ThreadPool past ~World, then trip ~WorldObject's `World::exists(&world)` assertion when an enclosing scope's fence runs them, masking the real exception with a cryptic abort.

One fence per sub-World suffices because lazy_deleter now bypasses lazy_sync when invoked from `do_cleanup` (gated by `world.gop.is_in_do_cleanup()`): the deferred-cleanup path performs direct deletes rather than scheduling cross-rank tasks. The remaining tasks this fence has to drain come only from non-deferred ~DistArray calls (e.g. AB during exception unwind), and all participating ranks of a sub-World reach this RAII guard in lockstep so their lazy_sync handshakes match up.
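The destruction order this commit relies on can be reproduced without MADNESS or TiledArray. The sketch below is only an illustration under stated assumptions: `MockWorld`, `MockArray`, `FenceOnExit`, and `run` are hypothetical stand-ins, not the real `World`, `DistArray`, or `FenceSubWorldsOnExit`, and the "fence" merely zeroes a counter. It shows that a guard declared after the worlds container runs after the arrays are released and before the worlds are destroyed, on the exception path as well as the normal one.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

using Log = std::vector<std::string>;

// Hypothetical stand-in for a sub-World: it only counts pending tasks.
struct MockWorld {
  explicit MockWorld(Log* l) : log(l) {}
  void fence() { pending = 0; log->push_back("fence"); }
  ~MockWorld() {
    log->push_back(pending == 0 ? "world destroyed clean"
                                : "world destroyed with pending tasks");
  }
  int pending = 0;
  Log* log;
};

// Hypothetical stand-in for an array whose destructor schedules cleanup work.
struct MockArray {
  MockArray(MockWorld* w, Log* l) : world(w), log(l) {}
  ~MockArray() { ++world->pending; log->push_back("array released"); }
  MockWorld* world;
  Log* log;
};

// Guard: fences every world when it goes out of scope.
struct FenceOnExit {
  explicit FenceOnExit(std::vector<std::unique_ptr<MockWorld>>& w) : worlds(w) {}
  FenceOnExit(const FenceOnExit&) = delete;
  FenceOnExit& operator=(const FenceOnExit&) = delete;
  ~FenceOnExit() {
    for (auto& w : worlds)
      if (w) w->fence();
  }
  std::vector<std::unique_ptr<MockWorld>>& worlds;
};

// Records the order of events, with or without the guard.
Log run(bool use_guard, bool throw_midway) {
  Log log;
  try {
    std::vector<std::unique_ptr<MockWorld>> worlds;  // destroyed last
    std::optional<FenceOnExit> guard;                // destroyed second
    if (use_guard) guard.emplace(worlds);
    worlds.push_back(std::make_unique<MockWorld>(&log));
    MockArray ab(worlds.back().get(), &log);         // destroyed first
    if (throw_midway) throw std::runtime_error("inner tile op failed");
  } catch (const std::runtime_error&) {
    log.push_back("exception caught");
  }
  return log;
}
```

With the guard, `run(true, true)` records the array release, then the fence, then a clean world teardown, and only then the catch handler. Without it, the world is destroyed while the counter is still non-zero, which is the mock analogue of the abort described above.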
Add the (outer Contraction, inner Hadamard) case to init_inner_tile_op's view-cell branch. Mirrors the owning-tile path in init_inner_tile_op_owning_: arena_plan_ uses the `left_range` plan to shape each result cell from a non-empty left inner cell, and the per-cell op accumulates `r += l * rr` -- or `r += (l * rr) * factor_` when scaled -- via fused_hadamard_inplace into the pre-shaped view cell. No value-returning per-cell op is needed, so this works for view cells (e.g. ArenaTensor); non-identity inner result permutation is rejected (the owning fallback that materializes a permuted return cell cannot run for views). Previously this case threw "nested non-contraction product on view inner tiles is not yet supported", aborting expressions such as `C(i_3,i_4;a<...>) = A(i_3;a<...>) * B(i_4;a<...>)` over ArenaTensor inner cells -- the typical sub-product inside einsum's generalized contraction loop for ToTxToT with Hadamard outer-Hadamard inner shapes.
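A minimal sketch of the accumulate-into-a-pre-shaped-view idea, assuming a flat `double` arena. The names `CellView`, `Arena`, `shape_like_left`, and `fused_hadamard_accumulate` are hypothetical illustrations and are not TiledArray's `ArenaTensor`, `arena_plan_`, or `fused_hadamard_inplace`. The point is that the per-cell op returns nothing: the result cell is carved out of the arena with the extent of the left cell, and the op only writes `r += (l * rr) * factor` into it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Non-owning view of a contiguous inner cell.
struct CellView {
  double* data = nullptr;
  std::size_t size = 0;
  bool empty() const { return size == 0; }
};

// Toy bump allocator that hands out zero-initialized views.
struct Arena {
  explicit Arena(std::size_t capacity) : storage(capacity, 0.0) {}
  CellView carve(std::size_t n) {
    if (used + n > storage.size()) throw std::length_error("arena exhausted");
    CellView v{storage.data() + used, n};
    used += n;
    return v;
  }
  std::vector<double> storage;
  std::size_t used = 0;
};

// Shape the result cell from the left operand's extent.
CellView shape_like_left(Arena& arena, const CellView& left) {
  return arena.carve(left.size);
}

// In-place fused op: r[i] += (l[i] * rr[i]) * factor. Nothing is returned,
// so the op is usable when the result cell is a view.
void fused_hadamard_accumulate(CellView r, const CellView& l,
                               const CellView& rr, double factor = 1.0) {
  if (l.empty() || rr.empty()) return;  // an empty operand contributes nothing
  if (r.size != l.size || l.size != rr.size)
    throw std::invalid_argument("inner cell extents differ");
  for (std::size_t i = 0; i < r.size; ++i)
    r.data[i] += (l.data[i] * rr.data[i]) * factor;
}
```

Because the result cell is shaped once and then only accumulated into, repeated contributions from the outer contraction index can reuse the same view. A permuted result would need a freshly materialized cell, which is consistent with the commit rejecting non-identity inner permutations for views.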
Base automatically changed from `evaleev/feature/lazy-deleter-skip-sync-in-do-cleanup` to `master` on May 21, 2026 04:16
Summary
Stacked on top of #551 (lazy_deleter fast path).
- `einsum`: add a single-fence RAII `FenceSubWorldsOnExit` guard so any tasks scheduled by non-deferred `~DistArray` calls (e.g. AB during exception unwind) are drained before sub-Worlds are torn down.
- `cont_engine`: add the `(outer Contraction, inner Hadamard)` view-cell case for `ArenaTensor`-backed ToT inner tiles, via the existing arena fast path.

Important
The PR base is set to `evaleev/feature/lazy-deleter-skip-sync-in-do-cleanup` (#551). Once #551 merges, this PR will auto-retarget `master`.

Why
Triaged from an abort while running a CSV-CCk traced expression over `ArenaTensor` ToT operands in MPQC. Cascade:
1. `C(i_3,i_4;a<...>) = A(i_3;a<...>) * B(i_4;a<...>)` -- the typical sub-product inside `einsum`'s generalized-contraction loop after ToT operands are reduced per Hadamard tile -- hit the view-cell branch of `init_inner_tile_op` with `(outer Contraction, inner Hadamard)`. That combination had no handler and threw `"nested non-contraction product on view inner tiles is not yet supported"`.
2. The exception unwound through `einsum`'s per-`h` iteration. The temporary `worlds` vector destructed during stack unwind, tearing down sub-Worlds while `lazy_sync_children` tasks scheduled by `~DistArray`'s `lazy_deleter` were still in the global ThreadPool's queue. `~WorldObject` then asserted `World::exists(&world)` and aborted, masking the real TA exception.

#551 eliminates the deferred-cleanup half of (2) at the source (no `lazy_sync` task is ever scheduled in that path). This PR adds the missing view-cell case so (1) does not throw in the first place, and keeps a thin RAII guard so the non-deferred path during exception unwind also leaves the sub-Worlds clean.

What
- `einsum` RAII guard: `FenceSubWorldsOnExit` is declared right after the `worlds` vector. LIFO destruction means AB/C destruct first (releasing pimpls; the non-deferred path goes through `lazy_sync` and enqueues tasks on sub-Worlds), then the guard runs (draining those tasks via a single fence per sub-World), then `worlds` destructs (empty taskqs). A single fence per sub-World is sufficient because #551 makes the deferred-cleanup `~DistArray` path skip `lazy_sync` (no post-`do_cleanup` task to drain); only tasks from non-deferred destructors remain. All participating ranks of a sub-World reach this RAII guard in lockstep at function exit, so their `lazy_sync` handshakes match up.
- `cont_engine` view-cell case: `init_inner_tile_op` now handles `(outer Contraction, inner Hadamard)` for view inner cells. It mirrors `init_inner_tile_op_owning_`: `arena_plan_` with `ArenaInnerShapeKind::left_range` and a per-cell `make_fused_hadamard_lambda` / `make_fused_hadamard_scaled_lambda` op that accumulates `r += l * rr` (optionally scaled) into pre-shaped view cells. Non-identity inner result permutation is rejected explicitly (the owning fallback that materializes a permuted return cell cannot run for views).

Test plan
- The MPQC run that previously aborted in `~WorldObject` during stack unwind now runs end-to-end, with the real TA expression evaluating cleanly (exit 0).