Skip to content

fixi(nvlink): safe handling of legacy/missing chassis_serial field in…#2871

Open
rtamma-nv wants to merge 1 commit into
NVIDIA:mainfrom
rtamma-nv:fix/nvlink-chassis-serial
Open

fixi(nvlink): safe handling of legacy/missing chassis_serial field in…#2871
rtamma-nv wants to merge 1 commit into
NVIDIA:mainfrom
rtamma-nv:fix/nvlink-chassis-serial

Conversation

@rtamma-nv

Copy link
Copy Markdown
Contributor

… persisted nvlink_info

Make sure that nvlink_info can be safely deserialized even if chassis_serial field in persisted json blob is missing or empty. Also, make partition monitor backfill chassis_serial in nvlink_info when the field is empty

Related issues

#2857

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • [X ] Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

Testing

  • [ X] Unit tests added/updated
  • Integration tests added/updated
  • [ X] Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

… persisted nvlink_info

Signed-off-by: Roopesh Tamma <rtamma@nvidia.com>
@rtamma-nv rtamma-nv requested a review from a team as a code owner June 25, 2026 02:01
@rtamma-nv rtamma-nv requested a review from tmcroberts97 June 25, 2026 02:01
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 71ea89f2-4a1a-4a0d-b7e0-c33d09a75038

📥 Commits

Reviewing files that changed from the base of the PR and between e7781e0 and 71aec46.

📒 Files selected for processing (2)
  • crates/api-model/src/hardware_info.rs
  • crates/nvlink-manager/src/lib.rs

Summary by CodeRabbit

  • Bug Fixes
    • Improved compatibility with older hardware data by handling missing chassis serial values during import.
    • Existing hardware records are now refreshed when key identifying fields are missing or blank, helping ensure data stays up to date.
    • Added coverage for legacy hardware payloads to confirm the expected fallback behavior.

Walkthrough

The PR makes legacy nvlink_info rows deserialize without chassis_serial and updates nvlink-manager to repopulate existing records when chassis_serial is blank or domain_uuid is nil.

Changes

Legacy NVLink compatibility

Layer / File(s) Summary
Deserialize legacy NVLink info
crates/api-model/src/hardware_info.rs
MachineNvLinkInfo.chassis_serial now defaults during deserialization, and a regression test covers legacy nvlink_info JSON without chassis_serial plus legacy GPU naming.
Refresh blank chassis serial
crates/nvlink-manager/src/lib.rs
populate_machine_nvlink_info_if_needed now refreshes existing records when chassis_serial is blank or domain_uuid is nil, and its doc comment matches that behavior.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title is specific and aligned with the main change: legacy or missing chassis_serial handling in nvlink.
Description check ✅ Passed The description matches the changeset by covering safe deserialization and chassis_serial backfill behavior.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@github-actions

Copy link
Copy Markdown

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
boot-artifacts-aarch64 3 0 0 3 0 0
boot-artifacts-x86_64 3 0 0 3 0 0
forge-admin-cli-x86_64 265 6 24 98 7 130
machine-validation-runner 717 32 188 267 36 194
machine_validation 717 32 188 267 36 194
machine_validation-aarch64 717 32 188 267 36 194
nvmetal-carbide 717 32 188 267 36 194
TOTAL 3139 134 776 1172 151 906

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

Comment thread crates/api-model/src/hardware_info.rs
@rtamma-nv rtamma-nv enabled auto-merge (squash) June 25, 2026 06:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants