Skip to content

fix(video): check encoder started up before trying to flush frames on destruction#5257

Open
Kishi85 wants to merge 1 commit into
LizardByte:masterfrom
Kishi85:remove-video-frame-flushing-on-destruction
Open

fix(video): check encoder started up before trying to flush frames on destruction#5257
Kishi85 wants to merge 1 commit into
LizardByte:masterfrom
Kishi85:remove-video-frame-flushing-on-destruction

Conversation

@Kishi85

@Kishi85 Kishi85 commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

Description

This PR aims to fix #4943 by removing the frame flushing on ~avcodec_encode_session_t() for cases where the encoder has not stared up yet. Checking this is done by checking if avcodec_ctx->frame_num > avcodec_ctx->delay. This check was specifically chosen because of the following comment in the problematic section that was segfaulting before:
https://github.com/FFmpeg/FFmpeg/blob/15504610b0dc12c56e5e9f94ff06c873382368f5/libavcodec/hw_base_encode.c#L508:

        // Fix timestamps if we hit end-of-stream before the initial decode
        // delay has elapsed.
        if (ctx->input_order <= ctx->decode_delay)
            ctx->dts_pts_diff = ctx->pic_end->pts - ctx->first_pts;

This code sequence is expecting ctx->pic_end != NULL but if the encoder has not started up and seen enough frames (delay is usually 0 from my tests so far, so we're checking for at least one frame) then this expectation will not be fulfilled causing a segfault.

Screenshot

Issues Fixed or Closed

Roadmap Issues

Type of Change

  • feat: New feature (non-breaking change which adds functionality)
  • fix: Bug fix (non-breaking change which fixes an issue)
  • docs: Documentation only changes
  • style: Changes that do not affect the meaning of the code (white-space, formatting, missing semicolons, etc.)
  • refactor: Code change that neither fixes a bug nor adds a feature
  • perf: Code change that improves performance
  • test: Adding missing tests or correcting existing tests
  • build: Changes that affect the build system or external dependencies
  • ci: Changes to CI configuration files and scripts
  • chore: Other changes that don't modify src or test files
  • revert: Reverts a previous commit
  • BREAKING CHANGE: Introduces a breaking change (can be combined with any type above)

Checklist

  • Code follows the style guidelines of this project
  • Code has been self-reviewed
  • Code has been commented, particularly in hard-to-understand areas
  • Code docstring/documentation-blocks for new or existing methods/components have been added or updated
  • Unit tests have been added or updated for any new or modified functionality

AI Usage

  • None: No AI tools were used in creating this PR
  • Light: AI provided minor assistance (formatting, simple suggestions)
  • Moderate: AI helped with code generation or debugging specific parts
  • Heavy: AI generated most or all of the code changes

@Kishi85 Kishi85 changed the title fix(video): remove frame flushing on destruction to avoid segfault due to racecondition fix(video): remove frame flushing to avoid segfault due to racecondition Jun 6, 2026
@Kishi85 Kishi85 force-pushed the remove-video-frame-flushing-on-destruction branch from ec0fcc7 to 4ccc01b Compare June 6, 2026 08:14
@Kishi85 Kishi85 changed the title fix(video): remove frame flushing to avoid segfault due to racecondition fix(video): remove explicit frame flushing to avoid segfault due to racecondition Jun 6, 2026
@ReenigneArcher ReenigneArcher requested a review from cgutman June 6, 2026 23:04
@codecov

codecov Bot commented Jun 6, 2026

Copy link
Copy Markdown

Bundle Report

Bundle size has no change ✅

@cgutman

cgutman commented Jun 21, 2026

Copy link
Copy Markdown
Collaborator

This flushing was a workaround to address a memory leak each time streaming is started and stopped on AMD AMF encoders, so please test that to make sure you aren't reintroducing that.

I agree that it shouldn't be necessary, but it also shouldn't be crashing either. Flushing is a totally valid thing to do here. The crash probably means there's a deeper use-after-free or race condition that we're just hiding by removing the flushing code.

@Kishi85

Kishi85 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

This flushing was a workaround to address a memory leak each time streaming is started and stopped on AMD AMF encoders, so please test that to make sure you aren't reintroducing that.

I agree that it shouldn't be necessary, but it also shouldn't be crashing either. Flushing is a totally valid thing to do here. The crash probably means there's a deeper use-after-free or race condition that we're just hiding by removing the flushing code.

I'm also thinking that this is some kind of race condition that occurs when the hw context is torn down (by closing the capture instance as the specific problem here is accessing ctx->pic_end which is NULL as noted in #4943).

What would be interesting is to know if we have a way to ensure that this context is still valid when the frame flushing is validated (as that is where it segfaults) or if the fix can be done solely in FFmpeg.

@Kishi85 Kishi85 force-pushed the remove-video-frame-flushing-on-destruction branch from 4ccc01b to 3731283 Compare June 22, 2026 18:27
@Kishi85 Kishi85 changed the title fix(video): remove explicit frame flushing to avoid segfault due to racecondition fix(video): check encoder started up before trying to flush frames on destruction Jun 22, 2026
@sonarqubecloud

Copy link
Copy Markdown

@Kishi85

Kishi85 commented Jun 22, 2026

Copy link
Copy Markdown
Contributor Author

I think I've found a better solution and also finally the root cause for this issue which is due to the encoder never having seen a single frame (avcodec_ctx->frame_num being 0) when trying to flush frames. This does also explain why ctx->pic_end is NULL when the segfault hits as there was never a frame that could have be compared to.

So my updated solution is to check if the encoder has started up before trying to flush frames and otherwise skip this step. Checking if the encoder has started up is done by checking if avcodec_ctx->frame_num > avcodec_ctx->delay. This check was specifically chosen because of the following comment in the problematic section that was segfaulting before:
https://github.com/FFmpeg/FFmpeg/blob/15504610b0dc12c56e5e9f94ff06c873382368f5/libavcodec/hw_base_encode.c#L508:

        // Fix timestamps if we hit end-of-stream before the initial decode
        // delay has elapsed.

This seems to fix the issue without having to remove the frame flushing fully leaving the workaround for AMD AMF intact, except for when it has not seen enough frames when the encoder is starting up but the delay seems to be 0 for the encoders I was able to test here (so it's just checking if the encoder has seen a frame at all in order to do the flushing).

@cgutman Does this look like a better solution to you?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Coredump with SIGSEGV when switching displays

2 participants