Skip to content

[fix](be) Avoid finalized pipeline task submit crash#64953

Merged
yiguolei merged 1 commit into
apache:branch-4.1from
BiteTheDDDDt:codex/pick-64899-branch-4.1
Jun 30, 2026
Merged

[fix](be) Avoid finalized pipeline task submit crash#64953
yiguolei merged 1 commit into
apache:branch-4.1from
BiteTheDDDDt:codex/pick-64899-branch-4.1

Conversation

@BiteTheDDDDt

@BiteTheDDDDt BiteTheDDDDt commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

What problem does this PR solve?

Issue Number: None

Related PR: #64899

Problem Summary: A runtime filter dependency can wake a pipeline task while another thread is closing or finalizing the same task. HybridTaskScheduler::submit() synchronously calls PipelineTask::is_blockable() before enqueueing the task, and is_blockable() reads _sink and _operators. After close/finalize starts releasing terminal task resources, a late submit can therefore dereference cleared task resources and crash.

This change adds a small submit gate on PipelineTask. Terminal close/finalize closes that gate under a dedicated lock, and HybridTaskScheduler::submit() checks the gate under the same lock before calling is_blockable(). The lock only covers the blockable check and does not extend to the actual queue submit.

(cherry picked from commit 1dd3207)

Release note

None

Check List (For Author)

  • Test: Unit Test / Static check
    • build-support/clang-format.sh
    • build-support/check-format.sh
    • git diff --check
    • Reused the compile command from be/ut_build_ASAN/compile_commands.json to run -fsyntax-only for be/test/exec/pipeline/pipeline_task_test.cpp.
    • Attempted: ./run-be-ut.sh --run --filter=PipelineTaskTest.TEST_CLOSED_TASK_REJECTS_HYBRID_SUBMIT_BEFORE_FINALIZE:PipelineTaskTest.TEST_FINALIZED_TASK_REJECTS_HYBRID_SUBMIT. It was interrupted after confirming the changed pipeline_task.cpp and task_scheduler.cpp objects compiled, because branch-4.1 BE UT uses one large doris_be_test binary and the filtered run expanded into a near full UT build.
    • Attempted: ninja -C be/ut_build_ASAN -j1 test/CMakeFiles/doris_be_test.dir/exec/pipeline/pipeline_task_test.cpp.o. It was stopped after it began rebuilding a large OpenBLAS dependency chain; the test-file compile error was verified instead with the compile_commands.json -fsyntax-only command above.
  • Behavior changed: No
  • Does this need documentation: No

@hello-stephen

Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt BiteTheDDDDt marked this pull request as ready for review June 29, 2026 09:51
@BiteTheDDDDt BiteTheDDDDt requested a review from yiguolei as a code owner June 29, 2026 09:51
@BiteTheDDDDt

Copy link
Copy Markdown
Contributor Author

/review

@BiteTheDDDDt

Copy link
Copy Markdown
Contributor Author

run buildall

yiguolei
yiguolei previously approved these changes Jun 29, 2026
@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Jun 29, 2026
@github-actions

Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions

Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the PR changes across the pipeline task lifecycle, hybrid scheduler submit path, close/finalize ordering, wake-up/requeue flow, revokable task path, task queue behavior, and the added BE unit coverage.

No inline issues found. The new submit gate is checked before HybridTaskScheduler::submit() calls is_blockable(), while close(close_sink=true) and finalize() close the gate before terminal cleanup can clear operator/sink state. The normal scheduler completion path still performs close() and finalize() from _do_work(), and duplicate/stale queued tasks either wait on set_running() or observe finalization before executing. The added tests cover both close-before-finalize and finalized submit attempts through the real HybridTaskScheduler path.

Critical checkpoints:

  • Required AGENTS: none listed for this PR.
  • Repository code-review skill: no repo-local SKILL.md was present in this checkout, so I followed the review prompt and shared-ledger rules directly.
  • Existing inline review context: empty; no duplicate threads to avoid.
  • User focus: no additional user-provided focus.
  • Changed-file sweep: all four changed files reviewed; no unresolved suspicious point remains.

Subagent conclusions:

  • optimizer-rewrite: OPT-NONE; no optimizer/rewrite, semantic-equivalence, or parallel join/aggregate finding.
  • tests-session-config: TSC-NONE; no regression-test, session/config, compatibility, or style finding.
  • Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same ledger/comment set with no proposed inline comments.

Validation:

  • git diff --check passed for the exact PR base/head range and changed files.
  • clang-format --dry-run --Werror passed on the four changed files with the available clang-format 18.1.3.
  • BE unit/build validation was not run because this checkout does not include thirdparty/installed.

### What problem does this PR solve?

Issue Number: None

Related PR: apache#64899

Problem Summary: A runtime filter dependency can wake a pipeline task while another thread is closing or finalizing the same task. HybridTaskScheduler::submit() synchronously calls PipelineTask::is_blockable() before enqueueing the task, and is_blockable() reads _sink and _operators. After close/finalize starts releasing terminal task resources, a late submit can therefore dereference cleared task resources and crash.

This change adds a small submit gate on PipelineTask. Terminal close/finalize closes that gate under a dedicated lock, and HybridTaskScheduler::submit() checks the gate under the same lock before calling is_blockable(). The lock only covers the blockable check and does not extend to the actual queue submit.

(cherry picked from commit 1dd3207)

### Release note

None

### Check List (For Author)

- Test: Unit Test / Static check
    - build-support/clang-format.sh
    - build-support/check-format.sh
    - git diff --check
    - Reused the compile command from be/ut_build_ASAN/compile_commands.json to run -fsyntax-only for be/test/exec/pipeline/pipeline_task_test.cpp.
    - Attempted: ./run-be-ut.sh --run --filter=PipelineTaskTest.TEST_CLOSED_TASK_REJECTS_HYBRID_SUBMIT_BEFORE_FINALIZE:PipelineTaskTest.TEST_FINALIZED_TASK_REJECTS_HYBRID_SUBMIT. It was interrupted after confirming the changed pipeline_task.cpp and task_scheduler.cpp objects compiled, because branch-4.1 BE UT uses one large doris_be_test binary and the filtered run expanded into a near full UT build.
    - Attempted: ninja -C be/ut_build_ASAN -j1 test/CMakeFiles/doris_be_test.dir/exec/pipeline/pipeline_task_test.cpp.o. It was stopped after it began rebuilding a large OpenBLAS dependency chain; the test-file compile error was verified instead with the compile_commands.json -fsyntax-only command above.
- Behavior changed: No
- Does this need documentation: No
@BiteTheDDDDt BiteTheDDDDt force-pushed the codex/pick-64899-branch-4.1 branch from c3fd220 to d8b8f35 Compare June 29, 2026 10:19
@github-actions github-actions Bot removed the approved Indicates a PR has been approved by one committer. label Jun 29, 2026
@yiguolei

Copy link
Copy Markdown
Contributor

run buildall

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (17/17) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.72% (27466/37259)
Line Coverage 57.42% (297924/518882)
Region Coverage 55.10% (250268/454217)
Branch Coverage 56.48% (108252/191673)

1 similar comment
@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (17/17) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.72% (27466/37259)
Line Coverage 57.42% (297924/518882)
Region Coverage 55.10% (250268/454217)
Branch Coverage 56.48% (108252/191673)

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (17/17) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.72% (27467/37259)
Line Coverage 57.42% (297923/518882)
Region Coverage 55.10% (250275/454217)
Branch Coverage 56.47% (108246/191673)

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (17/17) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.72% (27467/37259)
Line Coverage 57.42% (297954/518882)
Region Coverage 55.09% (250229/454217)
Branch Coverage 56.48% (108265/191673)

@hello-stephen

Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (17/17) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.71% (27465/37259)
Line Coverage 57.41% (297906/518882)
Region Coverage 55.06% (250085/454217)
Branch Coverage 56.47% (108245/191673)

@yiguolei

Copy link
Copy Markdown
Contributor

skip buildall

@yiguolei yiguolei merged commit 0c2f579 into apache:branch-4.1 Jun 30, 2026
33 of 35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants