[None][fix] Skip calibration scalars in initialize_dummy_weights by shikicloud · Pull Request #13879 · NVIDIA/TensorRT-LLM

shikicloud · 2026-05-08T03:34:43Z

Description

`initialize_dummy_weights` randomizes every floating-point Parameter in `state_dict()`. That incorrectly includes the 0-D calibration scalars (`input_scale`, `kv_scales`, `alpha`, etc.) that `create_weights` initializes to identity. Because the random generator is reseeded for each Parameter, every 0-D scalar receives the same value (-2e-4 with seed=0).
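
The reseeding effect can be illustrated with a minimal sketch. It uses Python's standard `random` module as a stand-in for the real tensor generator; the parameter names, the `dummy_value` helper, and the sampling range are assumptions made for illustration and are not taken from the TensorRT-LLM code.

```python
import random

# Hypothetical names for three 0-D calibration scalars.
names = ["attn.input_scale", "attn.kv_scales", "mlp.alpha"]


def dummy_value(seed: int) -> float:
    """Draw one dummy value from a generator that is freshly seeded each call."""
    rng = random.Random(seed)  # reseeding here mimics the per-Parameter reseed
    return rng.uniform(-1e-3, 1e-3)  # range is an assumption for illustration


# Reseeding per parameter: every scalar takes the first draw of the same stream.
reseeded = {name: dummy_value(seed=0) for name in names}

# One shared generator: each scalar takes a different draw.
shared_rng = random.Random(0)
shared = {name: shared_rng.uniform(-1e-3, 1e-3) for name in names}

print(len(set(reseeded.values())))  # number of distinct values when reseeding
print(len(set(shared.values())))    # number of distinct values with one stream
```

With a fresh seed per parameter, a single-element tensor always consumes the first sample of an identical stream, which is consistent with all scalars ending up with one shared value.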

With HF FineGrainedFP8 + `load_format="dummy"` + IPC `update_weights`, no calibrated `input_scale` is shipped to overwrite the random value. Attention's FP8-output-quantization path then reads the random value as a scaling factor, which saturates the FP8 output and reduces it to noise.
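
A rough sketch of why a tiny random scale saturates the output follows. It assumes the common convention of dividing by the scale and clamping to the FP8 E4M3 finite range of ±448; the actual attention kernel may apply the scale differently, and `quantize_to_fp8_range` is a hypothetical helper that does not round to the FP8 grid.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3


def quantize_to_fp8_range(x: float, scale: float) -> float:
    """Divide by the scale and clamp to the E4M3 range (assumed convention)."""
    scaled = x / scale
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, scaled))


activations = [0.5, -1.25, 3.0]

# Identity scale, as set by create_weights: values pass through unchanged.
identity = [quantize_to_fp8_range(x, 1.0) for x in activations]

# Randomized scale of -2e-4: every value is pushed to the clamp limit,
# and the negative sign flips it as well.
randomized = [quantize_to_fp8_range(x, -2e-4) for x in activations]

print(identity)
print(randomized)
```

Under these assumptions, every activation collapses to ±448, so the quantized output keeps almost no information about the input.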

Fix: skip these calibration scalars in the dummy randomization loop.

Test Coverage

test_llm_update_weights.py::test_llm_update_weights_with_quant_config (8B-FP8 & 30B-A3B-FP8).

Before: top-20 ≈14% / ≈47%. After: ≈94% / ≈95%.

Summary by CodeRabbit

Bug Fixes
- Fixed FP8 attention accuracy by preventing unintended randomization of calibration and scaling parameters during dummy weight initialization when calibrated values are unavailable from checkpoints.
- Improved KV cache scaling preservation during model loading.

coderabbitai · 2026-05-08T03:36:41Z

Walkthrough

The PR modifies initialize_dummy_weights() to selectively skip random initialization for FP8 calibration and KV cache scaling parameters. Parameters with specific suffix patterns (input scales, KV scales, and attention alphas) are now preserved instead of being overwritten with random dummy values.

Changes

FP8 Calibration Parameter Filtering

| Layer / File(s) | Summary |
|---|---|
| Parameter Name-Based Filtering `tensorrt_llm/_torch/pyexecutor/model_loader.py` | `initialize_dummy_weights()` iterates over named state_dict entries and skips dummy initialization for parameters ending with FP8 calibration suffixes (input_scale, inv_input_scale, kv_scale, inv_kv_scale, alpha_scale, scalar_alpha). |
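
The name-based filtering summarized above can be sketched as follows. This is a minimal illustration and not the code from the PR: the suffix tuple is copied from the summary, while `should_skip_dummy_init`, `initialize_dummy`, the plain-dict state, and the parameter names are hypothetical stand-ins for the real `state_dict()` loop over tensors.

```python
# Suffixes copied from the walkthrough summary; the PR's exact list may differ.
CALIBRATION_SUFFIXES = (
    "input_scale",
    "inv_input_scale",
    "kv_scale",
    "inv_kv_scale",
    "alpha_scale",
    "scalar_alpha",
)


def should_skip_dummy_init(param_name: str) -> bool:
    """Return True when the name ends with one of the calibration suffixes."""
    return param_name.endswith(CALIBRATION_SUFFIXES)


def initialize_dummy(state: dict, make_dummy) -> dict:
    """Overwrite every entry except calibration scalars (hypothetical helper)."""
    for name in state:
        if should_skip_dummy_init(name):
            continue  # keep the identity value set at weight creation
        state[name] = make_dummy(name)
    return state


# Plain floats stand in for tensors; names are made up for the example.
state = {
    "layers.0.attn.input_scale": 1.0,
    "layers.0.attn.kv_scale": 1.0,
    "layers.0.mlp.weight": 0.0,
}
initialize_dummy(state, make_dummy=lambda name: 0.123)
print(state)
```

Matching on the name suffix keeps the filter independent of tensor shape, so a 0-D weight that is not a calibration scalar would still be randomized.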

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The description covers the bug explanation, root cause, fix, and test coverage with quantitative results, but is missing a PR Checklist section. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title clearly summarizes the main change: skipping calibration scalars in the initialize_dummy_weights function, which is the core fix described in the PR objectives. |


shikicloud · 2026-05-08T03:37:54Z

/bot run

shikicloud · 2026-05-08T03:58:34Z

/bot help

github-actions · 2026-05-08T03:58:44Z

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user-friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental) --high-priority]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline, or from the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will always be ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Supports wildcard * for pattern matching (e.g., "*PerfSanity*" matches all stages containing PerfSanity). Examples: "A10-PyTorch-1, xxx", "PerfSanity". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only supports [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Supports wildcard * for pattern matching. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx", --extra-stage "Post-Merge".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purposes. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

--high-priority (OPTIONAL) : Run the pipeline with high priority. This option is restricted to authorized users only and will route the job to a high-priority queue.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

shikicloud · 2026-05-08T03:59:05Z

/bot run

shuyixiong · 2026-05-08T04:02:51Z

/ok-to-test

shikicloud · 2026-05-08T04:04:14Z

/bot run

shikicloud · 2026-05-08T04:37:16Z

/bot run

shuyixiong · 2026-05-08T09:11:08Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-05-08T09:17:31Z

PR_Github #47369 [ run ] triggered by Bot. Commit: c110c51 Link to invocation

shikicloud requested a review from a team as a code owner May 8, 2026 03:34

shikicloud requested a review from byshiue May 8, 2026 03:34

github-actions Bot assigned shikicloud May 8, 2026

shikicloud changed the title ~~[None][fix] Skip calibration scalars in initialize_dummy_weights~~ [None][fix] Skip calibration scalars in initialize_dummy_weights for dynamic kv compression May 8, 2026

shikicloud force-pushed the fix/skip-calibration-scalars-in-dummy-init branch from 64df161 to 5696c3a Compare May 8, 2026 03:43

shikicloud changed the title ~~[None][fix] Skip calibration scalars in initialize_dummy_weights for dynamic kv compression~~ [None][fix] Skip calibration scalars in initialize_dummy_weights May 8, 2026

Funatiq approved these changes May 8, 2026

[None][fix] Skip calibration scalars in initialize_dummy_weights

36cd5af

Signed-off-by: Shiki Wu <shikiw@nvidia.com>

shikicloud force-pushed the fix/skip-calibration-scalars-in-dummy-init branch from 00c90d1 to f25063c Compare May 8, 2026 08:45

[None][test] Add FP8 KV cache coverage in test_llm_update_weights_wit…

c110c51

…h_quant_config Signed-off-by: Shiki Wu <shikiw@nvidia.com>

shikicloud force-pushed the fix/skip-calibration-scalars-in-dummy-init branch from f25063c to c110c51 Compare May 8, 2026 08:50

4 participants