[feat] Add GGUF conversion and inference support for BitNet embedding 270m (Gemma3) #562
Open
isHuangXin wants to merge 3 commits into
…nversion
- Add GGUF conversion tool for bitnet-embeddings-0.6b (safetensors -> F16/I2_S GGUF)
- Add Qwen3 architecture support in llama.cpp submodule with per-projection RMSNorm
- Add I2_S ternary quantization (2-bit packed -1/0/+1) for lossless precision
- Add f16 norm weight support for correct embedding inference
- Add AVX512BW SIMD paths for I2_S kernel (~2x throughput on AVX512-capable CPUs)
- Guard bitnet-lut-kernels.h include with TL1/TL2 preprocessor checks
- Update llama.cpp submodule to dev-bitnet-embedding-0.6b branch
- Document F16 (from multilingual-e5-0.6b) and I2_S (from bitnet-embeddings-0.6b) conversion process
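To make the "2-bit packed -1/0/+1" idea concrete, here is a minimal sketch of packing ternary weights four to a byte and unpacking them again. The code assignment (-1 -> 0, 0 -> 1, +1 -> 2), the low-bits-first ordering, and the function names `pack_ternary` / `unpack_ternary` are assumptions made for illustration; the actual I2_S block layout and scale handling in this PR may differ.

```python
from typing import List

# Hypothetical 2-bit codes for ternary weights; the real I2_S encoding may differ.
_CODE = {-1: 0, 0: 1, 1: 2}
_DECODE = {code: weight for weight, code in _CODE.items()}


def pack_ternary(weights: List[int]) -> bytes:
    """Pack ternary weights (-1, 0, +1) four to a byte, lowest bits first."""
    out = bytearray((len(weights) + 3) // 4)
    for i, w in enumerate(weights):
        out[i // 4] |= _CODE[w] << (2 * (i % 4))
    return bytes(out)


def unpack_ternary(packed: bytes, count: int) -> List[int]:
    """Recover the first `count` ternary weights from the packed bytes."""
    return [
        _DECODE[(packed[i // 4] >> (2 * (i % 4))) & 0b11]
        for i in range(count)
    ]


if __name__ == "__main__":
    weights = [-1, 0, 1, 1, 0, -1, -1]
    packed = pack_ternary(weights)
    # Weights that are already ternary survive the round trip unchanged.
    assert unpack_ternary(packed, len(weights)) == weights
    print(len(weights), "weights packed into", len(packed), "bytes")
```

Because the source weights are already ternary, the round trip loses nothing, which is presumably what "lossless precision" refers to here; any per-tensor scale would have to be stored separately.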
Author: @microsoft-github-policy-service agree
… 270m (Gemma3)
- Add convert-bitnet-embedding-270m-to-gguf.py for Gemma3-based 270m models
- Support f32, f16, and I2_S ternary quantization output types
- Add AVX512BW SIMD paths for I2_S dot product in ggml-bitnet-mad.cpp
- Add immintrin.h include and bitnet-lut-kernels.h guard in ggml-bitnet-lut.cpp
- Add documentation for Gemma3 GGUF conversion implementation
- Update llama.cpp submodule with Gemma3 architecture support
…edding models
- Merge convert-bitnet-embedding-270m-to-gguf.py into convert-bitnet-embedding-to-gguf.py with auto-detection of model architecture (qwen3/gemma3_text) from config.json
- Merge separate Qwen3 and Gemma3 conversion docs into a single bitnet-embeddings-gguf-conversion.md
- Remove redundant per-architecture scripts and docs
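The architecture auto-detection described above could look roughly like the sketch below. It assumes the merged script reads the standard Hugging Face `model_type` field from config.json; the helper name `detect_architecture` and the error message are hypothetical and not taken from the PR.

```python
import json
from pathlib import Path

# Architectures named in the commit message; anything else is rejected.
SUPPORTED_MODEL_TYPES = ("qwen3", "gemma3_text")


def detect_architecture(model_dir: str) -> str:
    """Return the model architecture declared in <model_dir>/config.json.

    Assumption: the architecture is identified by the "model_type" key.
    """
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    model_type = config.get("model_type")
    if model_type not in SUPPORTED_MODEL_TYPES:
        raise ValueError(
            f"Unsupported model_type {model_type!r}; "
            f"expected one of {SUPPORTED_MODEL_TYPES}"
        )
    return model_type
```

A single entry point can then branch on the returned string to pick the Qwen3 or Gemma3 tensor-name mapping, which is what makes the separate per-architecture scripts redundant.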
86e63a1 to 5720fc7