Fix const char* return marshaling under --noCustomStringMarshal (heap corruption, regresses #34)#356
Open
kastwey wants to merge 2 commits into
Open
Conversation
GetReturnType emitted [return: MarshalAs(UnmanagedType.LPUTF8Str)] for const char* returns when --noCustomStringMarshal was set. On a return value LPUTF8Str makes the CLR call CoTaskMemFree on the returned pointer after copying the string. FFmpeg returns pointers to static/borrowed memory (av_version_info, avcodec_get_name, av_get_pix_fmt_name, AVClass names, etc.), so freeing it corrupts the native heap and causes AccessViolationException. This regressed the fix from issue Ruslan-B#34. const char* returns now always use ConstCharPtrMarshaler regardless of the flag; --noCustomStringMarshal only affects parameters, where the CLR owns the marshalled buffer. The now-unused ReturnMarshalAsLPUTF8Str constant is removed. Also decode const char* returns as UTF-8 (PtrToStringUTF8) instead of the system ANSI code page (PtrToStringAnsi), with a manual UTF-8 fallback for netstandard2.0 where PtrToStringUTF8 is unavailable. Fixes Ruslan-B#355
…2.1 paths New multi-target (net9.0;net48) MSTest project exercises both compile-time branches of ConstCharPtrMarshaler.MarshalNativeToManaged: net9.0 resolves the netstandard2.1 build (Marshal.PtrToStringUTF8) and net48 resolves the netstandard2.0 build (manual UTF-8 decode). Covers null, empty, ASCII, multi-byte UTF-8 round-trips, and verifies the marshaler never frees the borrowed native memory.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #355.
Summary
With
--noCustomStringMarshal, the generator emits[return: MarshalAs(UnmanagedType.LPUTF8Str)]onconst char*-returning functions. On a return valueLPUTF8Strmakes the CLR callCoTaskMemFreeon the pointer after copying the string — but FFmpeg returns pointers to static/borrowed memory the caller must not free (av_version_info,avcodec_get_name,av_get_pix_fmt_name,avcodec_configuration, AVClass names, …). The runtime frees memory it doesn't own → native heap corruption, surfacing later as a non-deterministicAccessViolationException. This regresses the fix from #34, but only under the flag.The flag is correct for parameters (the CLR owns the buffer it allocated, so freeing is right). It must simply never govern return marshaling.
Answers to the questions on the issue
Are you a user? Yes. We run production systems that regenerate FFmpeg.AutoGen bindings with
--noCustomStringMarshaland ship them in a service doing heavy encoding/decoding around the clock. We hit this crash in the field; reverting the ~35 affected return attributes back toConstCharPtrMarshaler(native binaries unchanged) eliminated it.What is the driver? A real production bug, not a style preference.
const char*returns under--noCustomStringMarshalcause latent native heap corruption that manifests as intermittentAccessViolationExceptionunder load — extremely hard to diagnose because the crash lands far from the offending free. The fix restores the exact behavior that #34 already established for the default path.How much impact on the codebase? Minimal and isolated:
FunctionProcessor.GetReturnType— theconst char*case now always usesReturnMarshalAsConstCharPtrinstead of branching onNoCustomStringMarshal; the now-unusedReturnMarshalAsLPUTF8Strconstant is removed. No change to the default (non-flag) output — that path already producedConstCharPtrMarshaler, so existing users see byte-identical generation. Only the--noCustomStringMarshalreturn attributes change.ConstCharPtrMarshaler.MarshalNativeToManaged: decode as UTF-8 (PtrToStringUTF8) instead of ANSI (PtrToStringAnsi), with a manual UTF-8 fallback fornetstandard2.0wherePtrToStringUTF8is unavailable. For ASCII (the common case for these APIs) the result is byte-identical; for non-ASCII it fixes mojibake and makes return decoding consistent with the parameter path, which already uses UTF-8 under the flag. Cleanup stays a no-op, so the borrowed pointer is never freed.FFmpeg.AutoGen.Abstractions.TestMSTest project (net9.0;net48) covering both compile-time branches: null, empty, ASCII, multi-byte UTF-8 round-trips, and a check that the marshaler never frees the borrowed memory.No public API changes, no behavioral change to parameter marshaling, no change to the default generated output.
Changes
LPUTF8Stronconst char*returns; always useConstCharPtrMarshaler.const char*returns as UTF-8 instead of ANSI, with anetstandard2.0fallback.ConstCharPtrMarshalercovering both target frameworks.Testing
New MSTest project passes on
net9.0(netstandard2.1 path →Marshal.PtrToStringUTF8) andnet48(netstandard2.0 path → manual UTF-8 decode).