[fix](be) Avoid finalized pipeline task submit crash #64953
/review
run buildall
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
I reviewed the PR changes across the pipeline task lifecycle, hybrid scheduler submit path, close/finalize ordering, wake-up/requeue flow, revokable task path, task queue behavior, and the added BE unit coverage.
No inline issues found. The new submit gate is checked before HybridTaskScheduler::submit() calls is_blockable(), while close(close_sink=true) and finalize() close the gate before terminal cleanup can clear operator/sink state. The normal scheduler completion path still performs close() and finalize() from _do_work(), and duplicate/stale queued tasks either wait on set_running() or observe finalization before executing. The added tests cover both close-before-finalize and finalized submit attempts through the real HybridTaskScheduler path.
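The stale-task behavior described above can be illustrated with a minimal sketch. It is not the Doris implementation: `StaleAwareTask`, `try_set_running()`, `WorkResult`, and `do_work()` are hypothetical names, and where the review says a duplicate task waits on `set_running()`, this simplified version returns `Busy` instead of waiting.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a pipeline task; not the real PipelineTask.
class StaleAwareTask {
public:
    // Claims exclusive execution. Returns false if another worker holds the claim.
    bool try_set_running() {
        bool expected = false;
        return _running.compare_exchange_strong(expected, true);
    }
    void clear_running() { _running.store(false); }

    void finalize() { _finalized.store(true); }
    bool finalized() const { return _finalized.load(); }

    void execute() { _executions.fetch_add(1); }
    int executions() const { return _executions.load(); }

private:
    std::atomic<bool> _running{false};
    std::atomic<bool> _finalized{false};
    std::atomic<int> _executions{0};
};

enum class WorkResult { Executed, SkippedFinalized, Busy };

// Hypothetical worker step: a stale queue entry checks for finalization
// after claiming the task and before touching any task state.
inline WorkResult do_work(StaleAwareTask& task) {
    if (!task.try_set_running()) {
        return WorkResult::Busy;
    }
    WorkResult result = WorkResult::SkippedFinalized;
    if (!task.finalized()) {
        task.execute();
        result = WorkResult::Executed;
    }
    task.clear_running();
    return result;
}
```

In this sketch, a queue entry that is popped after `finalize()` returns `SkippedFinalized` and never calls `execute()`.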
Critical checkpoints:
- Required AGENTS: none listed for this PR.
- Repository code-review skill: no repo-local `SKILL.md` was present in this checkout, so I followed the review prompt and shared-ledger rules directly.
- Existing inline review context: empty; no duplicate threads to avoid.
- User focus: no additional user-provided focus.
- Changed-file sweep: all four changed files reviewed; no unresolved suspicious point remains.
Subagent conclusions:
- `optimizer-rewrite`: `OPT-NONE`; no optimizer/rewrite, semantic-equivalence, or parallel join/aggregate finding.
- `tests-session-config`: `TSC-NONE`; no regression-test, session/config, compatibility, or style finding.
- Convergence round 1 ended with both live subagents replying `NO_NEW_VALUABLE_FINDINGS` for the same ledger/comment set with no proposed inline comments.
Validation:
- `git diff --check` passed for the exact PR base/head range and changed files.
- `clang-format --dry-run --Werror` passed on the four changed files with the available clang-format 18.1.3.
- BE unit/build validation was not run because this checkout does not include `thirdparty/installed`.
### What problem does this PR solve?

Issue Number: None

Related PR: apache#64899

Problem Summary: A runtime filter dependency can wake a pipeline task while another thread is closing or finalizing the same task. HybridTaskScheduler::submit() synchronously calls PipelineTask::is_blockable() before enqueueing the task, and is_blockable() reads _sink and _operators. After close/finalize starts releasing terminal task resources, a late submit can therefore dereference cleared task resources and crash.

This change adds a small submit gate on PipelineTask. Terminal close/finalize closes that gate under a dedicated lock, and HybridTaskScheduler::submit() checks the gate under the same lock before calling is_blockable(). The lock only covers the blockable check and does not extend to the actual queue submit.

(cherry picked from commit 1dd3207)

### Release note

None

### Check List (For Author)

- Test: Unit Test / Static check
    - build-support/clang-format.sh
    - build-support/check-format.sh
    - git diff --check
    - Reused the compile command from be/ut_build_ASAN/compile_commands.json to run -fsyntax-only for be/test/exec/pipeline/pipeline_task_test.cpp.
    - Attempted: ./run-be-ut.sh --run --filter=PipelineTaskTest.TEST_CLOSED_TASK_REJECTS_HYBRID_SUBMIT_BEFORE_FINALIZE:PipelineTaskTest.TEST_FINALIZED_TASK_REJECTS_HYBRID_SUBMIT. It was interrupted after confirming the changed pipeline_task.cpp and task_scheduler.cpp objects compiled, because branch-4.1 BE UT uses one large doris_be_test binary and the filtered run expanded into a near full UT build.
    - Attempted: ninja -C be/ut_build_ASAN -j1 test/CMakeFiles/doris_be_test.dir/exec/pipeline/pipeline_task_test.cpp.o. It was stopped after it began rebuilding a large OpenBLAS dependency chain; the test-file compile error was verified instead with the compile_commands.json -fsyntax-only command above.
- Behavior changed: No
- Does this need documentation: No
c3fd220 to d8b8f35
run buildall
BE Regression && UT Coverage Report: Increment line coverage; Increment coverage report
skip buildall
What problem does this PR solve?
Issue Number: None
Related PR: #64899
Problem Summary: A runtime filter dependency can wake a pipeline task while another thread is closing or finalizing the same task. HybridTaskScheduler::submit() synchronously calls PipelineTask::is_blockable() before enqueueing the task, and is_blockable() reads _sink and _operators. After close/finalize starts releasing terminal task resources, a late submit can therefore dereference cleared task resources and crash.
This change adds a small submit gate on PipelineTask. Terminal close/finalize closes that gate under a dedicated lock, and HybridTaskScheduler::submit() checks the gate under the same lock before calling is_blockable(). The lock only covers the blockable check and does not extend to the actual queue submit.
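The gate described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in this PR: `GatedTask`, `GatedScheduler`, `blockable_if_open()`, and the `int` used as a stand-in sink are all hypothetical, and the real PipelineTask and HybridTaskScheduler carry far more state.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical stand-in for PipelineTask; every name here is illustrative.
class GatedTask {
public:
    GatedTask() : _sink(std::make_unique<int>(1)) {}

    // Runs the blockable check only while the submit gate is open.
    // Returns std::nullopt once finalize() has closed the gate.
    std::optional<bool> blockable_if_open() {
        std::lock_guard<std::mutex> guard(_gate_lock);
        if (_gate_closed) {
            return std::nullopt;
        }
        return is_blockable();
    }

    // Terminal path: close the gate under the lock, then release resources.
    void finalize() {
        {
            std::lock_guard<std::mutex> guard(_gate_lock);
            _gate_closed = true;
        }
        _sink.reset();
    }

private:
    // Dereferences _sink, so it must never run after finalize().
    bool is_blockable() const { return *_sink > 0; }

    std::mutex _gate_lock;
    bool _gate_closed = false;
    std::unique_ptr<int> _sink;
};

// Hypothetical stand-in for the scheduler's submit path.
class GatedScheduler {
public:
    // Returns false when the task was rejected at the gate.
    bool submit(const std::shared_ptr<GatedTask>& task) {
        std::optional<bool> blockable = task->blockable_if_open();
        if (!blockable.has_value()) {
            return false;
        }
        // The gate lock is already released; enqueueing happens outside it.
        (*blockable ? _blocking_queue : _simple_queue).push_back(task);
        return true;
    }

    std::size_t queued() const { return _blocking_queue.size() + _simple_queue.size(); }

private:
    std::vector<std::shared_ptr<GatedTask>> _blocking_queue;
    std::vector<std::shared_ptr<GatedTask>> _simple_queue;
};
```

Because a submit that already holds the lock finishes its check before `finalize()` can close the gate, and every later submit sees the gate closed, the check never runs against released state in this sketch. Keeping the enqueue outside the lock mirrors the statement that the lock covers only the blockable check.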
(cherry picked from commit 1dd3207)
Release note
None
Check List (For Author)