Feat/gemma4 adapters #1385
Adds a text-only adapter covering both Gemma4ForConditionalGeneration (E2B/E4B/31B/26B-A4B) and Gemma4UnifiedForConditionalGeneration (12B), addressing TransformerLensOrg#1297. Gemma 4 layers are heterogeneous: KV-shared layers drop the k/v projections, K==V layers drop v_proj, and per-layer-embedding and MoE submodules appear only on some variants. All of these are mapped as optional and delegated to HF. Unlike Gemma 1-3, Gemma4RMSNorm has no (1 + weight) offset. Also adds DelegatedAttentionBlockBridge, which drops the split-QKV fork aliases (as MLABlockBridge does) so that hook-alias resolution stays clean when attention is delegated wholesale to HF. google/gemma-4-E2B-it passes verification (P1 100%, P2 100%, P4 94.7%).

- New adapter, four-place registration, and gemma4/gemma4_unified model_type mappings
- 10 checkpoints added to the model registry
- Unit and integration tests (logit parity vs. HF on all three structural variants)
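The RMSNorm difference is easy to get wrong when loading or folding weights, so here is a minimal plain-Python sketch of the two conventions. It assumes the Gemma 1-3 style of normalizing by the root mean square and then scaling by (1 + weight); the function names are hypothetical and this is not the adapter's actual code.

```python
import math


def rms_normalize(x, eps=1e-6):
    """Scale a vector to unit root-mean-square (no learned weight applied)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms for v in x]


def gemma3_style_rms_norm(x, weight, eps=1e-6):
    # Gemma 1-3 convention: the stored weight is an offset, applied as (1 + weight).
    return [v * (1.0 + w) for v, w in zip(rms_normalize(x, eps), weight)]


def gemma4_style_rms_norm(x, weight, eps=1e-6):
    # Gemma 4 convention (per the PR description): the weight is applied directly.
    return [v * w for v, w in zip(rms_normalize(x, eps), weight)]
```

With an all-zero weight, the Gemma-3-style norm returns the plain normalized vector while the Gemma-4-style norm returns zeros, so treating a Gemma 4 norm as if it carried the offset would shift every scale by exactly one.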
jlarson4 left a comment
Hey @huseyincavusbi, glad to finally see this come through. I have a couple of comments below; take a look when you have a moment and let me know what you think.
Additionally, @punishell has recently opened #1377, which is a parallel implementation of Gemma4. I'd like to include bits of both your implementations where it makes sense and is relevant. They came up with a very straightforward solution for the KV-cache issue that might be useful to you if you want to try rebasing your work onto theirs as an extension point. I think there may be a way to use their DelegatedAttentionBlockBridge in combination with the work you put into adding Gemma4 support to position_embeddings_attention to provide even better overall support.
There are more moving parts here than anticipated, so if you have questions, please feel free to ask.
Force-pushed from 7eed605 to a6bdd62.
…Bridge expects positional 'vision_features' but Gemma4's Gemma4MultimodalEmbedder.forward() takes 'inputs_embeds' kwarg
…ma4's boi_token is a marker, image_token is the expandable placeholder
…elegation), P4=98.7% from real-model run
Awesome, thank you @huseyincavusbi! I will review this today and let you know if I have any additional comments.
@huseyincavusbi A couple of things we probably want to look into before we merge this:
There are a couple of other benchmark-related bugs that I will file separately after this PR is merged, but none of them are blocking. Thank you for providing the logs for your verification runs; they were crucial in putting together this review. Being able to see the 31B model output was invaluable. EDIT: I should also clarify that I will resolve the merge conflict before I merge the PR once your final changes are in, so don't worry about it for the time being.
Description
This PR adds TransformerBridge support for the Gemma 4 model family (E2B, E4B, 26B-A4B, and 31B) through a single unified Gemma4ArchitectureAdapter.
Key Implementation Details
- gemma4.py: Dynamically handles all four variants by evaluating configuration flags at initialization:
  - enable_moe_block=True (specifically for the 26B variant).
  - num_kv_shared_layers > 0 (for E2B/E4B).
  - hidden_size_per_layer_input > 0.
- position_embeddings_attention.py: Applies V norm after the reshape (Gemma 4 is the first architecture featuring per-head value normalization). When the K/V submodules are omitted, it delegates KV-shared layers to Hugging Face's original attention implementation. Computed KV states are cached in shared_kv_states after RoPE so that structurally shared layers can reuse them.
- bridge.py: Introduces a use_native_generate opt-in flag. This works around an issue in the current Hugging Face transformers dev version where eager attention causes a KV-cache dimension mismatch during generation. Setting this flag (scoped strictly to this adapter) delegates generation to HF's native generate(), which uses SDPA.
- main_benchmark.py: Fixes pad_token_id assignment when eos_token_id is a list (Gemma4 uses [1, 106]) by taking the first element.

Verification & Performance
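Two of the configuration-level changes above can be sketched in a few lines. This is a hypothetical illustration, not the adapter's or the benchmark's code: the flag names come from the list above, but reading them with getattr defaults, the returned component labels, and the assumption that padding falls back to EOS only when no pad id is set are all choices made for this sketch.

```python
from types import SimpleNamespace


def optional_gemma4_components(text_config):
    """Hypothetical helper: decide which optional submodules a variant needs."""
    components = set()
    if getattr(text_config, "enable_moe_block", False):
        components.add("moe")  # 26B-A4B only, per the list above
    if (getattr(text_config, "num_kv_shared_layers", 0) or 0) > 0:
        components.add("kv_shared_layers")  # E2B / E4B
    if (getattr(text_config, "hidden_size_per_layer_input", 0) or 0) > 0:
        components.add("per_layer_input")
    return components


def resolve_pad_token_id(pad_token_id, eos_token_id):
    """Fall back to EOS for padding, taking the first id when EOS is a list."""
    if pad_token_id is not None:
        return pad_token_id
    if isinstance(eos_token_id, (list, tuple)):
        return eos_token_id[0] if eos_token_id else None
    return eos_token_id


# Illustrative config with made-up values (not a real checkpoint's settings).
cfg = SimpleNamespace(num_kv_shared_layers=4, hidden_size_per_layer_input=8)
print(sorted(optional_gemma4_components(cfg)))
print(resolve_pad_token_id(None, [1, 106]))
```

Keying optional mappings off config flags, rather than off model names, keeps a single adapter valid for every variant and for any future checkpoint that mixes the same features.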
All models have been validated.
Fixes #1297