Add GLM-5 DSA TransformerBridge adapter by zeotrix · Pull Request #1429 · TransformerLensOrg/TransformerLens

zeotrix · 2026-06-22T11:53:05Z

Adds TransformerBridge support for `GlmMoeDsaForCausalLM` / GLM-5 DSA.

This adds a GLM-MoE-DSA architecture adapter, including a dedicated DSA attention bridge for MLA-style latent attention plus learned top-k sparse-token selection. The adapter is registered in the required architecture factory and model-registry surfaces, and `tiny-random/glm-5.1` is added and verified in the Bridge registry.
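
For readers unfamiliar with the sparse-selection half of DSA, the idea can be sketched in a few lines. This is a minimal, framework-free illustration and not the adapter's code: the function name `topk_sparse_attention_weights`, the separate selector scores, and the causal handling are assumptions made for the example, and the MLA latent projections are left out entirely.

```python
import math


def topk_sparse_attention_weights(attn_scores, index_scores, k):
    """Softmax over attention scores restricted to each query's top-k keys.

    attn_scores[q][j]  : raw attention score of query q on key j
    index_scores[q][j] : selector score used only to rank keys (hypothetical)
    Keys with j > q are excluded (causal), then the k best-ranked keys are kept.
    """
    weights = []
    for q, (a_row, i_row) in enumerate(zip(attn_scores, index_scores)):
        causal = range(q + 1)  # keys visible to query q
        keep = sorted(causal, key=lambda j: i_row[j], reverse=True)[:k]
        top = max(a_row[j] for j in keep)  # subtract the max for stability
        exps = {j: math.exp(a_row[j] - top) for j in keep}
        total = sum(exps.values())
        # Keys that were not selected receive exactly zero weight.
        weights.append([exps.get(j, 0.0) / total for j in range(len(a_row))])
    return weights
```

Each query attends to at most `k` keys, so the softmax cost per query is bounded by `k` rather than by the sequence length; how the selector scores are learned is outside the scope of this sketch.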

Fixes #1406

Type of change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Screenshots

Not applicable.

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

Validation

Ran:

.venv/bin/pycln --all . --exclude "__init__.py"
.venv/bin/isort --check-only transformer_lens/model_bridge/generalized_components/glm_moe_dsa_attention.py transformer_lens/model_bridge/supported_architectures/glm_moe_dsa.py tests/unit/model_bridge/supported_architectures/test_glm_moe_dsa_adapter.py tests/integration/model_bridge/test_glm_moe_dsa_adapter.py
.venv/bin/mypy .
.venv/bin/python -m pytest tests/unit/model_bridge/supported_architectures/test_glm_moe_dsa_adapter.py tests/integration/model_bridge/test_glm_moe_dsa_adapter.py tests/unit/tools/test_model_registry.py -k 'TestGlmMoeDsa or TestGlmMoeDsaBridge or TestRegistrySyncedWithFactory' -q

Result: 11 passed.

Model registry verification:

.venv/bin/python -m transformer_lens.tools.model_registry.verify_models --model tiny-random/glm-5.1

The benchmark reported:

VERIFIED: P1=100.0%, P2=100.0%, P3=85.0%, P4=70.0%

Notes:

- `make format` and `uv run mypy .` are blocked locally by a resolver/build issue in the default jupyter dependency group: `jupyter-ydoc -> y-py==0.6.2`.
- The direct `.venv` tools above pass.
- `verify_models` updated `supported_models.json` and `verification_history.json`, then exited nonzero during cleanup due to an existing `torch.mps.synchronize()` call on a machine without an MPS backend.
- `black` hung locally even when run only on the touched Python files, so I interrupted it.
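
The nonzero exit in the `verify_models` note comes from calling `torch.mps.synchronize()` on a machine with no MPS backend. One possible guard is sketched below. It is not the code in `verify_models` and is not part of this PR; the helper name `sync_mps_if_available` is hypothetical. It relies only on `torch.backends.mps.is_available()` and `torch.mps.synchronize()` from PyTorch 2.x, and it imports torch lazily so the snippet also runs where torch is not installed.

```python
def sync_mps_if_available() -> bool:
    """Synchronize the MPS device only when one is usable.

    Returns True if a synchronize call was made, False otherwise.
    """
    try:
        import torch  # optional here so the sketch runs without torch
    except ImportError:
        return False
    if not torch.backends.mps.is_available():
        # No MPS backend (e.g. Linux or a CPU-only build): nothing to sync.
        return False
    torch.mps.synchronize()
    return True
```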

…ensOrg#1316)

* Add Direct Logit Attribution tool for TransformerBridge
* Resolve review feedback and add Direct Logit Attribution tests

Resolved review feedback from @jlarson4, added tests covering reconstruction invariants on a distilgpt2 bridge in compatibility mode, arguments, asserting sum(scores) == logit_diff - (b_U[correct] - b_U[wrong]) against the model's real logits, plus labels/shape and batch-averaging checks.

Added additional hardening:
- Fix a latent direction-shape bug: replace the fragile answer_tokens.numel()==1 branch with a robust reshape so single-prompt, single-token inputs are handled correctly
- Detect hybrid blocks via bridge.layer_types() instead of substring matching named_modules(), the codebase's own semantic mechanism
- Import get_act_name from transformer_lens.utilities to avoid the transformer_lens.utils DeprecationWarning; drop the invalid return_type kwarg to run_with_cache
- Register the analysis subpackage in tools/__init__.py

Closes TransformerLensOrg#1263.

…merLensOrg#1369)

* Add Direct Logit Attribution tool (TransformerLensOrg#1263)

  Add transformer_lens/tools/analysis/direct_logit_attribution.py, a single-call DLA analysis that decomposes a logit (or logit difference) into per-component, per-layer (logit-lens), or per-head contributions. Wraps the existing ActivationCache primitives (decompose_resid / accumulated_resid / stack_head_results / logit_attrs) and works with both HookedTransformer and TransformerBridge, since they share the cache API. Returns a DirectLogitAttribution dataclass (attribution tensor + aligned labels, plus a top(k) helper). Adds integration tests asserting the exact DLA correctness invariant on both systems: the complete decomposition reconstructs the model's real logit up to the unembedding bias b_U.

  Closes TransformerLensOrg#1263

* Resolving conflicts between 1316 and 1369
* format fixes

---------

Co-authored-by: Azra Bano <azrabano23@gmail.com>
Co-authored-by: Jonah Larson <jonahalarson@comcast.net>
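
The invariant those tests assert, sum(scores) == logit_diff - (b_U[correct] - b_U[wrong]), can be checked by hand on a toy model. The sketch below is an illustration only and does not use the `direct_logit_attribution.py` API: the function name, the made-up weights, and the absence of a final LayerNorm (which keeps the logit linear in the residual stream) are all assumptions for the example.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def toy_direct_logit_attribution(components, W_U, correct, wrong):
    """Per-component contribution to logit[correct] - logit[wrong].

    components : list of d_model vectors whose sum is the final residual stream
    W_U        : d_model x d_vocab unembedding matrix (list of rows)
    Assumes no final LayerNorm, so each contribution is a plain dot product.
    """
    direction = [row[correct] - row[wrong] for row in W_U]
    return [dot(c, direction) for c in components]


# Toy model: 3 components, d_model = 2, d_vocab = 3 (all values made up).
components = [[1.0, 0.0], [0.5, -1.0], [0.0, 2.0]]
W_U = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
b_U = [0.25, -0.5, 0.0]
correct, wrong = 0, 1

# The "real" logits come from the summed residual stream plus the bias.
resid = [sum(c[i] for c in components) for i in range(2)]
logits = [dot(resid, [row[t] for row in W_U]) + b_U[t] for t in range(3)]
logit_diff = logits[correct] - logits[wrong]

scores = toy_direct_logit_attribution(components, W_U, correct, wrong)
bias_term = b_U[correct] - b_U[wrong]
assert abs(sum(scores) - (logit_diff - bias_term)) < 1e-12
```

The bias difference is subtracted because b_U is added after the residual stream is projected, so no individual component accounts for it.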

The Restricted Loss section called loss_fn(all_logits, labels), but all_logits had been rearranged earlier into a (p, p, d_vocab) grid for the logit periodicity analysis. loss_fn's 3-D branch assumes (batch, pos, d_vocab) and takes logits[:, -1], producing a (p, p) tensor that crashes the gather against the p*p labels (TransformerLensOrg#543). Use original_logits instead, which is recomputed just above and is the same full-dataset loss the cell intends to print. Also clear the stored RuntimeError output from the cell.

Breaking: removes the public eps_attr constructor argument and the config.eps_attr attribute. The field was never read (its consumer was deleted when NormalizationBridge moved to direct HF delegation), so no model behavior changes, but it is an API removal.

…ransformerLensOrg#1390) The adapter already conditionally omitted ln2 from the block submodules when use_parallel_residual=True, but still wrapped them in a plain BlockBridge, which rejects the attn+mlp+no-ln2 shape. Switched to conditional block_cls (ParallelBlockBridge for the parallel branch, BlockBridge for sequential), mirroring the dual-mode pattern in falcon.py.
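
The dual-mode pattern described above can be sketched as follows. The two bridge classes here are empty stand-ins, not the real `BlockBridge` and `ParallelBlockBridge`, and `Cfg` and `select_block` are hypothetical names; only the `use_parallel_residual` flag and the ln2 omission come from the commit message.

```python
from dataclasses import dataclass


class BlockBridge:
    """Stand-in for the sequential block bridge (expects ln2)."""


class ParallelBlockBridge:
    """Stand-in for the parallel block bridge (attn and mlp share ln1)."""


@dataclass
class Cfg:
    """Hypothetical config carrying only the flag used here."""
    use_parallel_residual: bool


def select_block(cfg):
    """Return (block class, submodule names) for the given residual layout."""
    submodules = ["ln1", "attn", "mlp"]
    if cfg.use_parallel_residual:
        return ParallelBlockBridge, submodules
    # Sequential blocks normalize again before the MLP, so ln2 is included.
    submodules.insert(2, "ln2")
    return BlockBridge, submodules
```

Choosing the class and the submodule list in the same branch keeps the two from drifting apart, which is the mismatch the commit fixes.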

26 tests covering: component mapping (slots, bridge types, HF paths, submodule structure), anti-drift config flags (final_rms, uses_rms_norm, gated_mlp), weight conversion key set and rearrange patterns, GQA propagation to K/V only, and setup_component_testing rotary-emb wiring.

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: Fix broken line graphs
  - Fixes incorrect induction loss graph at end of notebook (values were incorrect, loss appeared to be going up with training instead of down!)
  - Fixes x/y axes showing "index"/"value" instead of configured names
* fix: broken/outdated links
* doc: Reference ARENA chapter directly instead of an older equivalent that forwards there
* chore: Align text in code with text in markdown
* doc: Update texts on supported models and architectures
  - Updated counts
  - Updated link to list of all models supported on v3
  - "HookedTransformer.from_pretrained" instances -> "TransformerBridge.boot_transformers", the up-to-date recommended method which makes the wider variety accessible
  - "consistent(-ish) architecture" -> "consistent(-ish) interface": with v3, the consistent interface proxies the non-consistent underlying architectures
  - "Transformer architecture" title -> "HookedTransformer architecture", mention deprecation
  - Mention overview of models last updated in 2023
  - Drop stale reference to phantom "table for hyper-parameters"
* doc: Add note on boot_transformers incompatibility with checkpoints
* doc: Document purpose of enabling compatibility mode
* doc: Drop minor comment on default_prepend_bos, only relevant for legacy HookedTransformer
* doc: Remove stale gotcha, hooks are now properly removed even in case of an error in a hook
* fix: Mismatching values in code vs. descriptions
* chore: Replace deprecated circuitvis attention visualization with newer one
* chore: Replace deprecated model names aliases
* chore: Remove deprecated prepend_bos argument
* chore: Update hook name to v3
* chore: Full OV graph title no longer the same as the preceding very similar OV graph
* doc: Clearer spaces in tokens
* doc: Bracket tokens styled as code to prevent Colab from collapsing spaces
  Previously Colab showed both tokens as identical, squashing the space
* chore: Try the model out right after calls to do so
* doc: approximate number can be approximate (consistent with another preceding one)
* chore: Remove installing old node version which was the newest at the time
  Now actually gives a deprecation warning and waits for 10 seconds
* chore: Remove leftover cell
* chore: fix bad closing tag
* doc: fix typos
* doc: More precise wording, setting a value, not adding it
* doc: move floating function names into complete sentence

25 tests across 4 classes covering component mapping, config flags, weight conversions, and GQA head-count propagation.
- TestMistralComponentMapping (12 tests): top-level keys, bridge types, HF module paths, block submodules, attn flags, QKVO paths, MLP paths. Includes explicit guard that attn uses AttentionBridge, not PositionEmbeddingsAttentionBridge.
- TestMistralAdapterConfig (4 tests): final_rms=False, uses_rms_norm, gated_mlp, attn_only (the anti-drift flags).
- TestMistralWeightConversions (5 tests): exactly 4 QKVO weight keys, split-heads and merge-heads rearrange patterns, no bias/norm entries.
- TestMistralGQASupport (4 tests): K/V use n_key_value_heads, Q/O unchanged, fallback to n_heads when n_key_value_heads is unset.

…ansformerLensOrg#1399)

Adds focused test suites for three architecture adapters per the proposal in issue TransformerLensOrg#1302.

tests/unit/model_bridge/supported_architectures/test_phi3_adapter.py
- Component mapping structure (bridge types and HF module paths)
- Weight conversion key set and source keys for fused qkv/gate_up
- _SizedSplitConversion numerical correctness (Q/K/V GQA splits)
- Config flags (RMS norm, rotary, gated MLP, supports_fold_ln=False)
- preprocess_weights LN folding into QKV and gate/up projections

tests/unit/model_bridge/supported_architectures/test_granite_adapter.py
- Component mapping for dense GraniteArchitectureAdapter
- Weight conversion key set (standard QKVO rearrangements)
- Config flags (final_rms=True, default_prepend_bos=False, GQA heads)
- GraniteMoeArchitectureAdapter: MoE bridge replaces dense MLP, all other components and config flags match dense Granite

Closes part of TransformerLensOrg#1302.

danra and others added 28 commits June 8, 2026 09:14

Merge pull request TransformerLensOrg#1370 from danra/patch-1

9deb6bf

Fix broken link in README

Merge remote-tracking branch 'origin/main' into dev

75095b1

Add stop_strings and stopping_criteria support to TransformerBridge.g…

a5f1193

…enerate (TransformerLensOrg#1374)

Remove extra checks from Phi adapter setup_component_testing (Transfo…

d37642d

…rmerLensOrg#1373)

Add phi tests (TransformerLensOrg#1372)

35ab438

* Add Phi adapter tests
* Add comment about setup component test
* Delete redundant config literal tests

Fixed SVD interpreter test (TransformerLensOrg#1375)

34dc38a

* Fixed SVD interpreter test
* Format SVD interpreter fixture test

Fix typos and narrow a bare except (TransformerLensOrg#1380)

036a861

Add unit tests for NeoArchitectureAdapter (TransformerLensOrg#1381)

8c395ee

Add unit tests for NeoxArchitectureAdapter (TransformerLensOrg#1382)

d6896df

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Add Olmo2 architecture adapter tests (TransformerLensOrg#1387)

e49f78c

* Add Olmo2 architecture adapter tests
* Drop test_attn_output_shape per the unit-test guide (shared bridge contract)

Add Qwen Adapter unit tests (TransformerLensOrg#1388)

d603a1d

* Fixed SVD interpreter test
* Format SVD interpreter fixture test
* Add qwen adapter unit test
* Retrigger CI (unrelated HF 429 error failing the build)

Add unit tests for OpenElmArchitectureAdapter (TransformerLensOrg#1383)

5962cd2

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

Add unit tests for LlavaOnevisionArchitectureAdapter (TransformerLens…

b682377

…Org#1384)

Add StableLM architecture adapter tests (TransformerLensOrg#1393)

09f2eba

Remove torch cap, so that newer versions of python can still resolve …

56c3d91

…properly (TransformerLensOrg#1394)

Updating Agentic Workflows (TransformerLensOrg#1395)

6a17449

Drop round-trip and output-shape tests per the unit-test guide (Trans…

8691fa3

…formerLensOrg#1397)

Direct path patch demo (TransformerLensOrg#1398)

cae9d46

* chore: Plot helper allows customizing graph before showing it
* feat: Direct path patching in exploratory analysis demo, resolves TransformerLensOrg#111
* doc: fix head index in prose

Add GLM-5 DSA TransformerBridge adapter

f9733a2

zeotrix closed this Jun 22, 2026

zeotrix wants to merge 28 commits into TransformerLensOrg:main from zeotrix:fix/issue-1406

11 participants
