[feat] Add GGUF conversion and inference support for BitNet embedding 270m (Gemma3) by isHuangXin · Pull Request #562 · microsoft/BitNet

isHuangXin · 2026-05-21T06:58:14Z

- Add convert-bitnet-embedding-270m-to-gguf.py for Gemma3-based 270m models
- Support f32, f16, and I2_S ternary quantization output types
- Add AVX512BW SIMD paths for I2_S dot product in ggml-bitnet-mad.cpp
- Add immintrin.h include and bitnet-lut-kernels.h guard in ggml-bitnet-lut.cpp
- Add documentation for Gemma3 GGUF conversion implementation
- Update llama.cpp submodule with Gemma3 architecture support

…nversion
- Add GGUF conversion tool for bitnet-embeddings-0.6b (safetensors -> F16/I2_S GGUF)
- Add Qwen3 architecture support in llama.cpp submodule with per-projection RMSNorm
- Add I2_S ternary quantization (2-bit packed -1/0/+1) for lossless precision
- Add f16 norm weight support for correct embedding inference
- Add AVX512BW SIMD paths for I2_S kernel (~2x throughput on AVX512-capable CPUs)
- Guard bitnet-lut-kernels.h include with TL1/TL2 preprocessor checks
- Update llama.cpp submodule to dev-bitnet-embedding-0.6b branch
- Document F16 (from multilingual-e5-0.6b) and I2_S (from bitnet-embeddings-0.6b) conversion process
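To illustrate why 2-bit packing of ternary weights can be lossless, here is a minimal Python sketch. It is not the I2_S layout used by this PR: the code mapping (-1 -> 0, 0 -> 1, +1 -> 2), the order of values within a byte, and the helper names `pack_ternary` / `unpack_ternary` are all assumptions made for illustration, and any scale factor stored alongside the weights is omitted.

```python
from typing import List


def pack_ternary(values: List[int]) -> bytes:
    """Pack ternary values (-1, 0, +1) four per byte, 2 bits each.

    Assumed (hypothetical) encoding: code = value + 1, so -1 -> 0, 0 -> 1,
    +1 -> 2; code 3 is unused. The real I2_S layout may differ.
    """
    if len(values) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(values), 4):
        byte = 0
        for j, v in enumerate(values[i:i + 4]):
            if v not in (-1, 0, 1):
                raise ValueError(f"not a ternary value: {v}")
            # Place the j-th value in bits [2j, 2j+1] of the byte.
            byte |= (v + 1) << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack_ternary(packed: bytes) -> List[int]:
    """Inverse of pack_ternary: recover the ternary values exactly."""
    values = []
    for byte in packed:
        for j in range(4):
            values.append(((byte >> (2 * j)) & 0b11) - 1)
    return values


weights = [-1, 0, 1, 1, 0, 0, -1, 1]
packed = pack_ternary(weights)
assert len(packed) == len(weights) // 4
assert unpack_ternary(packed) == weights  # round trip is exact
```

Because each weight already takes one of only three values, the round trip recovers it exactly; the packing itself introduces no rounding error.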

isHuangXin · 2026-05-21T06:58:56Z

@microsoft-github-policy-service agree

… 270m (Gemma3)
- Add convert-bitnet-embedding-270m-to-gguf.py for Gemma3-based 270m models
- Support f32, f16, and I2_S ternary quantization output types
- Add AVX512BW SIMD paths for I2_S dot product in ggml-bitnet-mad.cpp
- Add immintrin.h include and bitnet-lut-kernels.h guard in ggml-bitnet-lut.cpp
- Add documentation for Gemma3 GGUF conversion implementation
- Update llama.cpp submodule with Gemma3 architecture support

…edding models
- Merge convert-bitnet-embedding-270m-to-gguf.py into convert-bitnet-embedding-to-gguf.py with auto-detection of model architecture (qwen3/gemma3_text) from config.json
- Merge separate Qwen3 and Gemma3 conversion docs into a single bitnet-embeddings-gguf-conversion.md
- Remove redundant per-architecture scripts and docs
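The architecture auto-detection described in this commit could look roughly like the sketch below. It is an assumption, not the code in convert-bitnet-embedding-to-gguf.py: it supposes the script reads the `model_type` key that Hugging Face `config.json` files normally carry, and the function name `detect_architecture` is hypothetical.

```python
import json
from pathlib import Path

# Architectures named in the commit message.
SUPPORTED_ARCHITECTURES = {"qwen3", "gemma3_text"}


def detect_architecture(model_dir: str) -> str:
    """Return the model architecture declared in <model_dir>/config.json.

    Assumes the architecture is stored under the "model_type" key; the
    actual conversion script may inspect different fields.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    model_type = config.get("model_type")
    if model_type not in SUPPORTED_ARCHITECTURES:
        raise ValueError(
            f"unsupported model_type {model_type!r}; "
            f"expected one of {sorted(SUPPORTED_ARCHITECTURES)}"
        )
    return model_type
```

Failing loudly on an unknown `model_type` keeps a single merged script from silently applying the wrong tensor-name mapping to an unsupported checkpoint.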

isHuangXin changed the title ~~[feat] Add GGUF conversion and inference support for BitNet embedding…~~ [feat] Add GGUF conversion and inference support for BitNet embedding 270m (Gemma3) May 21, 2026

isHuangXin added 2 commits May 24, 2026 17:10

isHuangXin force-pushed the dev-bitnet-embedding-270m branch from 86e63a1 to 5720fc7 Compare May 24, 2026 10:26

isHuangXin wants to merge 3 commits into microsoft:main from isHuangXin:dev-bitnet-embedding-270m
