Fix const char* return marshaling under --noCustomStringMarshal (heap corruption, regresses #34) #356

Open
kastwey wants to merge 2 commits into Ruslan-B:main from kastwey:fix/const-char-ptr-return-marshal

kastwey commented Jun 24, 2026

Fixes #355.

Summary

With --noCustomStringMarshal, the generator emits [return: MarshalAs(UnmanagedType.LPUTF8Str)] on const char*-returning functions. On a return value, LPUTF8Str makes the CLR call CoTaskMemFree on the pointer after copying the string. FFmpeg, however, returns pointers to static/borrowed memory that the caller must not free (av_version_info, avcodec_get_name, av_get_pix_fmt_name, avcodec_configuration, AVClass names, …). The runtime therefore frees memory it doesn't own, which corrupts the native heap and surfaces later as a non-deterministic AccessViolationException. This regresses the fix from #34, but only under the flag.

The flag is correct for parameters (the CLR owns the buffer it allocated, so freeing is right). It must simply never govern return marshaling.
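
To make the difference concrete, here is a minimal sketch of the two return declarations side by side. It is not the generator's actual output: the library name "avutil", the class NativeSketch, and the stand-in BorrowedUtf8Marshaler are illustrative assumptions, and the sketch assumes a runtime where Marshal.PtrToStringUTF8 is available.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the project's ConstCharPtrMarshaler (not the real class).
public sealed class BorrowedUtf8Marshaler : ICustomMarshaler
{
    private static readonly BorrowedUtf8Marshaler Instance = new BorrowedUtf8Marshaler();

    // The runtime looks this static factory up by name.
    public static ICustomMarshaler GetInstance(string cookie) => Instance;

    // Copy the native string into a managed one; the pointer itself is left alone.
    public object MarshalNativeToManaged(IntPtr pNativeData) =>
        Marshal.PtrToStringUTF8(pNativeData);

    // Return-only marshaler in this sketch, so the managed-to-native direction is unsupported.
    public IntPtr MarshalManagedToNative(object managedObj) =>
        throw new NotSupportedException();

    // No-op on purpose: the native library still owns the memory.
    public void CleanUpNativeData(IntPtr pNativeData) { }

    public void CleanUpManagedData(object managedObj) { }

    public int GetNativeDataSize() => -1;
}

public static class NativeSketch
{
    // Problematic shape: after copying the string, the runtime frees the
    // returned pointer, which is wrong for static/borrowed memory.
    // "avutil" is a placeholder library name.
    [DllImport("avutil", EntryPoint = "av_version_info")]
    [return: MarshalAs(UnmanagedType.LPUTF8Str)]
    public static extern string VersionInfoFreedByRuntime();

    // Safe shape: the custom marshaler copies the string and never frees the pointer.
    [DllImport("avutil", EntryPoint = "av_version_info")]
    [return: MarshalAs(UnmanagedType.CustomMarshaler,
        MarshalTypeRef = typeof(BorrowedUtf8Marshaler))]
    public static extern string VersionInfoBorrowed();
}
```

Both declarations bind to the same native entry point; only the return attribute decides whether the runtime tries to free the pointer.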

Answers to the questions on the issue

Are you a user? Yes. We run production systems that regenerate FFmpeg.AutoGen bindings with --noCustomStringMarshal and ship them in a service doing heavy encoding/decoding around the clock. We hit this crash in the field; reverting the ~35 affected return attributes back to ConstCharPtrMarshaler (native binaries unchanged) eliminated it.

What is the driver? A real production bug, not a style preference. const char* returns under --noCustomStringMarshal cause latent native heap corruption that manifests as intermittent AccessViolationException under load — extremely hard to diagnose because the crash lands far from the offending free. The fix restores the exact behavior that #34 already established for the default path.

How much impact on the codebase? Minimal and isolated:

  • Generator: FunctionProcessor.GetReturnType — the const char* case now always uses ReturnMarshalAsConstCharPtr instead of branching on NoCustomStringMarshal; the now-unused ReturnMarshalAsLPUTF8Str constant is removed. No change to the default (non-flag) output — that path already produced ConstCharPtrMarshaler, so existing users see byte-identical generation. Only the --noCustomStringMarshal return attributes change.
  • ConstCharPtrMarshaler.MarshalNativeToManaged: decode as UTF-8 (PtrToStringUTF8) instead of ANSI (PtrToStringAnsi), with a manual UTF-8 fallback for netstandard2.0 where PtrToStringUTF8 is unavailable. For ASCII (the common case for these APIs) the result is byte-identical; for non-ASCII it fixes mojibake and makes return decoding consistent with the parameter path, which already uses UTF-8 under the flag. Cleanup stays a no-op, so the borrowed pointer is never freed.
  • New FFmpeg.AutoGen.Abstractions.Test MSTest project (net9.0;net48) covering both compile-time branches: null, empty, ASCII, multi-byte UTF-8 round-trips, and a check that the marshaler never frees the borrowed memory.

No public API changes, no behavioral change to parameter marshaling, no change to the default generated output.
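
The UTF-8 decode with a netstandard2.0 fallback described above could look roughly like the following sketch. The class name Utf8PointerDecoder and the preprocessor symbols are assumptions chosen for illustration; the PR's actual conditional-compilation condition and helper names may differ.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

// Hypothetical helper, not the PR's code: decodes a borrowed, NUL-terminated
// UTF-8 string without ever freeing the pointer.
public static class Utf8PointerDecoder
{
    public static string Decode(IntPtr ptr)
    {
        if (ptr == IntPtr.Zero) return null;
#if NETSTANDARD2_1_OR_GREATER || NETCOREAPP
        // Built-in decoder where the target framework provides it.
        return Marshal.PtrToStringUTF8(ptr);
#else
        // Targets such as netstandard2.0 lack PtrToStringUTF8.
        return DecodeManually(ptr);
#endif
    }

    public static string DecodeManually(IntPtr ptr)
    {
        if (ptr == IntPtr.Zero) return null;

        // Scan for the terminating NUL to find the byte length.
        var length = 0;
        while (Marshal.ReadByte(ptr, length) != 0) length++;
        if (length == 0) return string.Empty;

        // Copy the bytes out and decode; the native buffer is only read.
        var buffer = new byte[length];
        Marshal.Copy(ptr, buffer, 0, length);
        return Encoding.UTF8.GetString(buffer);
    }
}
```

For pure ASCII input both branches return the same string that PtrToStringAnsi would; the difference only shows up for multi-byte sequences, which an ANSI decode would split into separate characters.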

Changes

  1. (required, fixes the crash) Stop emitting LPUTF8Str on const char* returns; always use ConstCharPtrMarshaler.
  2. (recommended) Decode const char* returns as UTF-8 instead of ANSI, with a netstandard2.0 fallback.
  3. (tests) Add unit tests for ConstCharPtrMarshaler covering both target frameworks.

Testing

New MSTest project passes on net9.0 (netstandard2.1 path → Marshal.PtrToStringUTF8) and net48 (netstandard2.0 path → manual UTF-8 decode).

kastwey added 2 commits June 24, 2026 10:23
GetReturnType emitted [return: MarshalAs(UnmanagedType.LPUTF8Str)] for
const char* returns when --noCustomStringMarshal was set. On a return
value LPUTF8Str makes the CLR call CoTaskMemFree on the returned pointer
after copying the string. FFmpeg returns pointers to static/borrowed
memory (av_version_info, avcodec_get_name, av_get_pix_fmt_name, AVClass
names, etc.), so freeing it corrupts the native heap and causes
AccessViolationException. This regressed the fix from issue Ruslan-B#34.

const char* returns now always use ConstCharPtrMarshaler regardless of
the flag; --noCustomStringMarshal only affects parameters, where the CLR
owns the marshalled buffer. The now-unused ReturnMarshalAsLPUTF8Str
constant is removed.

Also decode const char* returns as UTF-8 (PtrToStringUTF8) instead of the
system ANSI code page (PtrToStringAnsi), with a manual UTF-8 fallback for
netstandard2.0 where PtrToStringUTF8 is unavailable.

Fixes Ruslan-B#355
…2.1 paths

New multi-target (net9.0;net48) MSTest project exercises both compile-time branches of ConstCharPtrMarshaler.MarshalNativeToManaged: net9.0 resolves the netstandard2.1 build (Marshal.PtrToStringUTF8) and net48 resolves the netstandard2.0 build (manual UTF-8 decode). Covers null, empty, ASCII, multi-byte UTF-8 round-trips, and verifies the marshaler never frees the borrowed native memory.

Development

Successfully merging this pull request may close these issues.

noCustomStringMarshal emits LPUTF8Str on const char* return values → native heap corruption (regresses #34)
