Feat/gemma4 adapters by huseyincavusbi · Pull Request #1385 · TransformerLensOrg/TransformerLens

huseyincavusbi · 2026-06-13T12:07:24Z

Description

This PR adds TransformerBridge support for the Gemma 4 model family (E2B, E4B, 26B-A4B, and 31B) through a single unified Gemma4ArchitectureAdapter.

Key Implementation Details

Unified Adapter (gemma4.py): Dynamically handles all 4 variants by evaluating initialization configuration flags:
- MoE Blocks: Submodules conditionally spin up only when enable_moe_block=True (specifically for the 26B variant).
- KV-Sharing: K/V projection submodules are omitted on KV-shared layers when num_kv_shared_layers > 0 (for E2B/E4B).
- PLE Embeddings: Surfaced dynamically when hidden_size_per_layer_input > 0.
- Weight Processing: Maps and converts Gemma 4's joint QKV layout, dual RoPE configurations, alternating sliding/full attention mechanisms, logit softcapping, and RMSNorm.
- Includes 45 dedicated unit tests verifying config attributes, MoE behavior, and weight conversions.
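As a rough illustration of the flag checks listed above, here is a minimal sketch. The `Gemma4ConfigSketch` dataclass and the `optional_features` helper are hypothetical names made up for this example; only the three flag names come from this PR, and the real adapter in gemma4.py may structure the logic differently.

```python
from dataclasses import dataclass


@dataclass
class Gemma4ConfigSketch:
    """Hypothetical stand-in for the HF config; holds only the flags named in this PR."""

    enable_moe_block: bool = False
    num_kv_shared_layers: int = 0
    hidden_size_per_layer_input: int = 0


def optional_features(cfg) -> dict[str, bool]:
    """Report which optional submodule groups a variant would need mapped."""
    return {
        # MoE submodules exist only when the config enables them.
        "moe": bool(getattr(cfg, "enable_moe_block", False)),
        # KV-shared layers have no k/v projections of their own.
        "kv_sharing": getattr(cfg, "num_kv_shared_layers", 0) > 0,
        # Per-layer embeddings (PLE) exist only with a positive PLE width.
        "per_layer_embeddings": getattr(cfg, "hidden_size_per_layer_input", 0) > 0,
    }
```

For example, `optional_features(Gemma4ConfigSketch(enable_moe_block=True))` reports only the MoE group as present. The default values above are illustrative assumptions, not the values shipped in any Gemma 4 checkpoint.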
Shared-Library Updates (3 files, fully opt-in, zero regressions on existing adapter tests):
1. position_embeddings_attention.py: Applies V norm post-reshape (Gemma 4 is the first architecture featuring per-head value normalization). Handles KV-sharing delegation to Hugging Face's original attention implementation when K/V submodules are omitted. Caches computed KV states in shared_kv_states post-RoPE for structural layer reuse.
2. bridge.py: Introduces a use_native_generate opt-in flag. This bypasses a current Hugging Face transformers dev-version issue where eager attention causes a KV-cache dimension mismatch during generation. Setting this flag (scoped strictly to this adapter) delegates processing to HF's native generate() utilizing SDPA.
3. main_benchmark.py: Fixes pad_token_id assignment when eos_token_id is a list (Gemma4 uses [1, 106]), taking the first element.
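The pad_token_id fix in item 3 can be sketched as follows. The helper name `resolve_pad_token_id` is hypothetical and the actual change in main_benchmark.py may be written inline; the sketch only shows the "take the first element when eos_token_id is a list" behavior described above.

```python
from typing import Optional, Sequence, Union


def resolve_pad_token_id(
    eos_token_id: Union[int, Sequence[int], None],
) -> Optional[int]:
    """Pick a single pad token id from an eos_token_id that may be a list."""
    if eos_token_id is None:
        return None
    if isinstance(eos_token_id, (list, tuple)):
        # Gemma4 reports several EOS ids; use the first one for padding.
        return eos_token_id[0] if eos_token_id else None
    return eos_token_id
```

With Gemma4's `[1, 106]`, `resolve_pad_token_id([1, 106])` returns `1`; a plain integer passes through unchanged.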

Verification & Performance

All models have been validated.

Fixes #1297

Type of change

New feature (non-breaking change which adds functionality)

Checklist:

I have commented my code, particularly in hard-to-understand areas
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

Adds a text-only adapter covering both Gemma4ForConditionalGeneration (E2B/E4B/31B/26B-A4B) and Gemma4UnifiedForConditionalGeneration (12B), addressing TransformerLensOrg#1297.

Gemma 4 layers are heterogeneous: KV-shared layers drop k/v projections, K==V layers drop v_proj, and per-layer-embedding / MoE submodules appear only on some variants; all are mapped as optional and delegated to HF. Unlike Gemma 1-3, Gemma4RMSNorm has no (1+weight) offset.

Adds DelegatedAttentionBlockBridge (drops the split-QKV fork aliases, as MLABlockBridge does) so hook-alias resolution stays clean when attention is delegated wholesale to HF.

google/gemma-4-E2B-it passes verification (P1 100%, P2 100%, P4 94.7%).

- New adapter + four-place registration + gemma4/gemma4_unified model_type mappings
- 10 checkpoints added to the model registry
- Unit + integration tests (logit parity vs HF on all three structural variants)
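To make the RMSNorm remark concrete, here is a small pure-Python sketch contrasting the two scaling conventions. The function name `rms_norm`, the `offset` switch, and the `eps` default are assumptions for illustration; this is not the HF or TransformerLens implementation.

```python
import math


def rms_norm(x, weight, eps=1e-6, offset=False):
    """RMS-normalize x, then scale by weight (or by 1 + weight when offset=True)."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    # offset=True is the (1 + weight) convention attributed to Gemma 1-3 above;
    # offset=False is the plain scaling attributed to Gemma4RMSNorm.
    return [v * inv_rms * ((1.0 + w) if offset else w) for v, w in zip(x, weight)]
```

One consequence worth noting: a zero weight vector yields the unscaled normalized input under the offset convention but an all-zero output under plain scaling, so a weight conversion written for one convention presumably should not be reused unchanged for the other.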

jlarson4

Hey @huseyincavusbi, glad to finally see this come through. I have a couple of comments below; take a look when you have a moment and let me know what you think.

Additionally, @punishell has recently opened #1377, which is a parallel implementation of Gemma4. I'd like to include bits of both your implementations where it makes sense and is relevant. They came up with a very straightforward solution for the KV-cache issue that might be of use to you, if you want to try rebasing your work onto theirs as an extension point. I am thinking there may be a way to use their DelegatedAttentionBlockBridge in combination with the work you spent on adding Gemma4 support to position_embeddings_attention to provide even better overall support.

There are more moving parts here than anticipated; if you have questions, please feel free to ask.

huseyincavusbi · 2026-06-22T14:27:07Z

Hi @jlarson4. Thanks for the review. Rebased onto #1377's DelegatedAttentionBlockBridge as suggested, dropped use_native_generate, and added multimodal vision support. All Gemma4 variants are verified: E2B/E4B on P1+P4+P7, and 26B/31B on P1+P4+P7 with --no-hf-reference.

jlarson4 · 2026-06-22T14:33:41Z

Awesome, thank you @huseyincavusbi! I will review this today and let you know if I have any additional comments.

jlarson4 · 2026-06-22T16:22:29Z

@huseyincavusbi A couple things we probably want to look into before we merge this:

- The verification run is showing degenerate generation in P4 and P7. We are getting loops such as "expensive and expensive" / "image image image" on simple prompts. Can you run HF-native generation on the same 31B model to see if it produces coherent text? If HF is coherent and the Bridge isn't, that's a real Bridge bug in the delegated path (RoPE / logit-softcapping / attention / generation config). If both loop, it's decoding config. Either we discover a bug that needs fixing, or we add to the verification note that there is a decoding issue.
- There is an unrelated bug in verify_models.py that it would be very helpful for you to fix. At line 1119 of verify_models.py, update the guard to torch.backends.mps.is_available() instead of hasattr(torch.mps, "empty_cache"), mirroring the torch.cuda.is_available() branch above. As-is, it crashes every CUDA run after scoring, making it look like the run failed.
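A minimal sketch of the guard change requested in the second item, assuming PyTorch is installed. The function name `free_accelerator_cache` and its return value are hypothetical, and the surrounding code at line 1119 of verify_models.py is not shown in this thread, so this only illustrates the availability check rather than the actual patch.

```python
import torch


def free_accelerator_cache() -> str:
    """Release cached accelerator memory and report which backend was used."""
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return "cuda"
    # Check that MPS is actually usable on this machine. Testing only for the
    # existence of torch.mps.empty_cache (a hasattr check) can pass on hosts
    # with no MPS device, and the subsequent MPS call then fails.
    if torch.backends.mps.is_available():
        torch.mps.empty_cache()
        return "mps"
    return "cpu"
```

The key point is that an attribute check says nothing about whether the device exists, while `torch.backends.mps.is_available()` does, which mirrors how the CUDA branch is guarded.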

There are a couple other benchmark-related bugs that I will file separately after this PR is merged, but none that will be blocking.

Thank you for providing the logs for your verification runs; they were crucial in putting together this review. Being able to see the 31B model output was invaluable.

EDIT: I should also clarify that I will resolve the merge conflict before I merge the PR once your final changes are in; don't worry about it for the time being.

huseyincavusbi marked this pull request as draft June 14, 2026 10:49

jlarson4 changed the base branch from main to dev June 15, 2026 15:47

jlarson4 reviewed Jun 15, 2026

Comment thread tests/unit/model_bridge/test_gemma4.py Outdated

Comment thread transformer_lens/model_bridge/bridge.py Outdated

Comment thread transformer_lens/model_bridge/bridge.py Outdated

huseyincavusbi added 3 commits June 20, 2026 19:49

fix: handle list eos_token_id when setting pad_token_id

3680bba

fix: add Gemma4ForConditionalGeneration to MULTIMODAL_ARCHITECTURES

299bda7

feat: add multimodal vision support to Gemma4 adapter

a6bdd62

huseyincavusbi force-pushed the feat/gemma4-adapters branch from 7eed605 to a6bdd62 June 20, 2026 16:52

huseyincavusbi added 5 commits June 22, 2026 12:43

fix: use GeneralizedComponent for vision projector — VisionProjection…

4c3bad3

…Bridge expects positional 'vision_features' but Gemma4's Gemma4MultimodalEmbedder.forward() takes 'inputs_embeds' kwarg

fix: check image_token before boi_token in multimodal benchmark — Gem…

d469a05

…ma4's boi_token is a marker, image_token is the expandable placeholder

chore: update registry with Gemma4 verification results

7f4b68b

chore: fix E2B-it registry — P1=50% (component benchmark fails with d…

b055509

…elegation), P4=98.7% from real-model run

chore: fix E2B phase2_score — set to null (Phase 2 not run on real mo…

3ff81b3

…del)

huseyincavusbi marked this pull request as ready for review June 22, 2026 14:26

fix: guard MPS synchronize with torch.backends.mps.is_available() ins…

1de51aa

…tead of hasattr

huseyincavusbi wants to merge 10 commits into TransformerLensOrg:dev from huseyincavusbi:feat/gemma4-adapters


3 participants
