fix(scheduler): Fix flaky TestQueueConcurrency deadlock #7496

Merged
yeya24 merged 1 commit into cortexproject:master from yeya24:fix/flaky-test-queue-concurrency
May 11, 2026
@yeya24 yeya24 commented May 9, 2026

What this PR does

Fix flaky TestQueueConcurrency that could deadlock due to goroutines blocking on dequeueRequest with an empty channel.

Root Cause

FIFORequestQueue.dequeueRequest performs a blocking channel receive (r := <-f.queue). In the test, the goroutines with cnt=5,15,25 (odd multiples of 5) call dequeueRequest unconditionally. If they run before any enqueue goroutine has populated the channel, they block forever and WaitGroup.Wait() never returns, so the test hangs until the 30-minute timeout.

Fix

Guard the dequeueRequest call with queue.length() > 0 to prevent blocking on an empty channel.
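
The guard can be sketched as follows. This is an illustration under assumptions, not the actual diff: fifoQueue, length, dequeue, and guardedDequeue are hypothetical names, with length assumed to report the number of buffered items.

```go
package main

import "fmt"

// fifoQueue is a hypothetical stand-in for FIFORequestQueue.
type fifoQueue struct {
	queue chan int
}

func newFIFOQueue(size int) *fifoQueue {
	return &fifoQueue{queue: make(chan int, size)}
}

func (f *fifoQueue) enqueue(v int) { f.queue <- v }

// length returns the number of items currently buffered in the channel.
func (f *fifoQueue) length() int { return len(f.queue) }

// dequeue is a blocking receive, like r := <-f.queue.
func (f *fifoQueue) dequeue() int { return <-f.queue }

// guardedDequeue calls the blocking dequeue only when the queue reports at
// least one item; otherwise it returns immediately with ok == false.
func guardedDequeue(f *fifoQueue) (v int, ok bool) {
	if f.length() > 0 {
		return f.dequeue(), true
	}
	return 0, false
}

func main() {
	q := newFIFOQueue(4)

	_, ok := guardedDequeue(q) // empty queue: returns without blocking
	fmt.Println("dequeued from empty queue:", ok)

	q.enqueue(7)
	v, ok := guardedDequeue(q)
	fmt.Println("dequeued after enqueue:", v, ok)
}
```

In this sketch the length check and the receive are separate steps, so the guard avoids blocking only as long as no other consumer drains the queue between them. A select with a default case is the usual Go idiom when a strictly non-blocking receive is needed.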

How it was tested

  • go test -run TestQueueConcurrency -count=10 — 10/10 pass
  • go test -race -run TestQueueConcurrency -count=10 — clean
  • Full package test suite passes

The test could deadlock because goroutines calling dequeueRequest on a
FIFORequestQueue would block indefinitely on an empty channel. This
happened when odd multiples of 5 (cnt=5,15,25) raced ahead of the
enqueue goroutines, causing the WaitGroup to never complete.

Fix by checking queue.length() > 0 before attempting to dequeue,
preventing the blocking channel receive on an empty queue.

Signed-off-by: Ben Ye <benye@amazon.com>

@SungJin1212 SungJin1212 left a comment

thanks

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label May 10, 2026
@yeya24 yeya24 merged commit c40bb7c into cortexproject:master May 11, 2026
37 checks passed
@yeya24 yeya24 deleted the fix/flaky-test-queue-concurrency branch May 11, 2026 04:54

Labels

lgtm (This PR has been approved by a maintainer), size/XS, type/flaky-test

2 participants