[None][fix] Skip calibration scalars in initialize_dummy_weights #13879
shikicloud wants to merge 2 commits into NVIDIA:main
Conversation
/bot run
Force-pushed from 64df161 to 5696c3a
/bot help
GitHub Bot Help

Provides a user-friendly way for developers to interact with a Jenkins server. See details below for each supported subcommand.

run: Launch build/test pipelines. All previously running jobs will be killed.
kill: Kill all running builds associated with the pull request.
skip: Skip testing for the latest commit on the pull request.
reuse-pipeline: Reuse a previous pipeline to validate the current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.
/bot run
/ok-to-test
/bot run
Signed-off-by: Shiki Wu <shikiw@nvidia.com>
Force-pushed from 00c90d1 to f25063c
…h_quant_config Signed-off-by: Shiki Wu <shikiw@nvidia.com>
Force-pushed from f25063c to c110c51
/bot run --disable-fail-fast
PR_Github #47369 [ run ] triggered by Bot. Commit:
Description
`initialize_dummy_weights` randomizes every floating-point Parameter in `state_dict()`, which incorrectly includes 0-D calibration scalars (`input_scale`, `kv_scales`, `alpha`, etc.) that `create_weights` initializes to identity. The seed is reseeded per-Parameter, so every 0-D scalar gets the same value (-2e-4 with seed=0).

For HF FineGrainedFP8 + `load_format="dummy"` + IPC `update_weights`, no calibrated `input_scale` is shipped to overwrite the random value, and attention's FP8-output-quantization path reads it as a scaling factor, saturating FP8 output to noise.

Fix: skip these calibration scalars in the dummy randomization loop.
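A minimal sketch of the skip logic described above. `FakeParam`, the suffix list, and the randomization range are illustrative stand-ins, not the real TensorRT-LLM code: the actual implementation operates on `torch.nn.Parameter` objects, and the exact set of calibration-scalar names may differ from the examples in the description.

```python
# Sketch: skip 0-D calibration scalars during dummy-weight randomization.
import random

# Suffixes taken from the PR description; illustrative, not exhaustive.
CALIBRATION_SUFFIXES = ("input_scale", "kv_scales", "alpha")

class FakeParam:
    """Hypothetical stand-in for torch.nn.Parameter."""
    def __init__(self, shape, value=1.0):
        self.shape = shape
        self.value = value
    def dim(self):
        return len(self.shape)

def initialize_dummy_weights(state_dict, seed=0):
    for name, param in state_dict.items():
        # Calibration scalars are 0-D and initialized to identity by
        # create_weights; randomizing them corrupts FP8 scaling factors,
        # so the fix leaves them untouched.
        if param.dim() == 0 and name.endswith(CALIBRATION_SUFFIXES):
            continue
        # Per-parameter reseed, mirroring the behavior the description
        # calls out (every randomized scalar would get the same value).
        random.seed(seed)
        param.value = random.uniform(-1e-3, 1e-3)

state_dict = {
    "layer.weight": FakeParam((4, 4)),
    "layer.input_scale": FakeParam(()),  # 0-D calibration scalar
}
initialize_dummy_weights(state_dict)
assert state_dict["layer.input_scale"].value == 1.0  # left at identity
```

The key check is the pair of conditions: being 0-D alone is not enough (some legitimate weights could be scalars), so the name suffix is tested as well.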
Test Coverage
`test_llm_update_weights.py::test_llm_update_weights_with_quant_config` (8B-FP8 & 30B-A3B-FP8). Before: top-20 ≈14% / ≈47%. After: ≈94% / ≈95%.