docs(quick-start): add site controller node DPU provisioning requirement #2859

Open
shayan1995 wants to merge 2 commits into NVIDIA:main from shayan1995:snamaghi/doc-site-controller-dpu-prereq

@shayan1995

If site controller nodes have DPUs, they must be flashed and configured
before K8s cluster setup. NICo does not provision its own nodes' DPUs.
Adds pointer to NVIDIA DOCA BlueField Firmware Bundle download archive.

Related issues

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

The quick-start guide Step 2 ("Prepare the Kubernetes Cluster") had no mention of what to do if site controller nodes are equipped with DPUs. This is a silent failure mode: without pre-provisioned DPUs, the Kubernetes cluster and NICo networking topology may not come up correctly, but the guide gave operators no indication that DPU setup was required before cluster bootstrap.

Adds a "Site controller node DPU requirements" subsection that clarifies:

  • NICo manages DPUs on downstream bare-metal hosts after ingestion — it does not provision the site controller nodes' own DPUs.
  • If site controller nodes have DPUs, firmware flashing, operating mode configuration, and ARM OS reachability must all be completed before Kubernetes cluster setup.
  • Points operators to the NVIDIA DOCA BlueField Firmware Bundle download archive for firmware flashing instructions and supported firmware versions.
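
As a rough illustration of the ARM OS reachability prerequisite, the sketch below loops over DPU addresses and reports whether each one answers a non-interactive SSH probe. The `check_dpus` function, the `DPU_PROBE` override, and the host names in the usage comment are hypothetical and are not part of NICo or DOCA. It does not verify firmware version or operating mode.

```sh
# Probe command used for each DPU. The default is a non-interactive SSH
# attempt with a short timeout; override DPU_PROBE to dry-run without hardware.
DPU_PROBE=${DPU_PROBE:-"ssh -o BatchMode=yes -o ConnectTimeout=5"}

# check_dpus HOST...
# Prints one status line per host and returns non-zero if any probe fails.
check_dpus() {
  failed=0
  for host in "$@"; do
    if $DPU_PROBE "$host" true >/dev/null 2>&1; then
      echo "ok   $host"
    else
      echo "FAIL $host"
      failed=1
    fi
  done
  return "$failed"
}

# Example (hypothetical DPU OOB addresses):
# check_dpus dpu-oob-1.example.net dpu-oob-2.example.net
```

A non-zero return from `check_dpus` would be a signal to stop before starting Kubernetes cluster setup.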

@shayan1995 shayan1995 requested a review from a team as a code owner June 24, 2026 20:48
@shayan1995 shayan1995 requested a review from Coco-Ben June 24, 2026 20:48
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Walkthrough

The quick-start guide adds DPU provisioning prerequisites for site controller nodes before Kubernetes setup. The NICo container build guide adds instructions to verify that make images-all produces 14 deployable images and explains how to interpret missing or intermediate images.

Changes

Site controller DPU prerequisites

| Layer / File(s) | Summary |
|---|---|
| Pre-Kubernetes DPU requirements (`docs/getting-started/quick-start.md`) | Adds a Step 2 subsection for DPU-equipped site controller nodes with firmware, operating mode, and ARM OS reachability prerequisites, plus a firmware reference link. |

NICo build verification

| Layer / File(s) | Summary |
|---|---|
| Build output verification (`docs/manuals/building_nico_containers.md`) | Adds a verification section after `make images-all` with a 14-image table, a count command, and notes for missing or intermediate images. |
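
The count check summarized above could be wrapped in a small helper so that a short count fails loudly instead of being read off by eye. This is only a sketch: `check_image_count` is a hypothetical name, and the expected value of 14 is taken from this PR's description rather than from the Makefile.

```sh
# check_image_count EXPECTED
# Reads one image repository name per line on stdin and compares the
# number of lines with EXPECTED.
check_image_count() {
  expected=$1
  actual=$(wc -l | tr -d '[:space:]')
  if [ "$actual" -eq "$expected" ]; then
    echo "ok: $actual images"
  else
    echo "mismatch: found $actual, expected $expected"
    return 1
  fi
}

# Example (requires Docker and the same variables used for the build):
# docker images --filter "reference=${IMAGE_REGISTRY}/*:${IMAGE_TAG}" \
#   --format "{{.Repository}}" | check_image_count 14
```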

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Possibly related PRs

Suggested labels

medium risk

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The description accurately matches the docs update and its DPU prerequisite guidance. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title accurately summarizes the main quick-start doc change about DPU provisioning requirements for site controller nodes. |


@github-actions

🔍 Container Scan Summary

No Grype artifacts were found to aggregate.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/getting-started/quick-start.md`:
- Around line 44-46: The quick-start markdown text references both the NVIDIA
DOCA documentation and the BlueField Firmware Bundle archive, but only the
archive link is present. Update the paragraph near the referenced sentence by
either adding the missing DOCA documentation link alongside the existing archive
link or rewriting the sentence to mention only the archive; use the surrounding
quick-start section content to keep the wording accurate and the links working.
- Line 42: The quick-start text is ambiguous about which DPU access path to use,
so update the sentence to name the DPU OOB/SSH endpoint explicitly instead of
referring generically to a management interface. Keep the wording aligned with
the surrounding docs terminology in quick-start.md, and make sure readers can
clearly distinguish the DPU OOB/RJ45 SSH path from the BMC/Redfish path.

In `@docs/manuals/building_nico_containers.md`:
- Around line 71-75: The verification snippet after make images-all is not
self-contained because it relies on IMAGE_REGISTRY and IMAGE_TAG being
pre-exported, which can make the docker images filter check the wrong reference
or an empty one. Update the documentation in the building_nico_containers guide
so the verification command derives or sets the same default values used by the
Makefile before running docker images, and apply the same fix to the later
related snippet. Keep the example copy-paste safe and ensure the image count
check works without requiring prior shell setup.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d8964236-7ba2-4fb0-bbbe-fd0a54f7f446

📥 Commits

Reviewing files that changed from the base of the PR and between ecf44bd and 3f80dfe.

📒 Files selected for processing (2)
  • docs/getting-started/quick-start.md
  • docs/manuals/building_nico_containers.md


- Flash the DPU firmware to a supported version using the BlueField Firmware Bundle.
- Configure the DPU operating mode (DPU mode or NIC mode) to match your site controller networking topology. See the [network prerequisites](prerequisites/network.md) for the supported topologies.
- Ensure the DPU ARM OS is booted and reachable via its management interface.

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Name the DPU endpoint explicitly.

“Management interface” is ambiguous here; the existing docs distinguish the DPU OOB/RJ45 SSH path from the separate BMC/Redfish path, so readers may validate the wrong interface. Please call out the DPU OOB/SSH endpoint directly.

As per path instructions, review Markdown for correctness and clarity.

Source: Path instructions

Comment on lines +44 to +46
Refer to the NVIDIA DOCA documentation and the BlueField Firmware Bundle download archive for firmware flashing instructions and supported firmware versions:

[https://developer.nvidia.com/doca-2-9-2-lts-ovs-doca-download-archive?deployment_platform=BlueField&deployment_package=BF-FW-Bundle](https://developer.nvidia.com/doca-2-9-2-lts-ovs-doca-download-archive?deployment_platform=BlueField&deployment_package=BF-FW-Bundle)

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Add the missing DOCA link or rewrite this sentence.

This paragraph says “NVIDIA DOCA documentation and the BlueField Firmware Bundle download archive,” but only the archive URL is present. Either add the actual DOCA docs link or remove that claim.

As per path instructions, review Markdown for correctness and working links.

Source: Path instructions

Comment on lines +71 to +75
After `make images-all` completes, verify that all 14 deployable images were produced:

```sh
docker images --filter "reference=${IMAGE_REGISTRY}/*:${IMAGE_TAG}" \
  --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"
```

🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

Make the verification commands self-contained.

make images-all uses the Makefile defaults when IMAGE_REGISTRY/IMAGE_TAG are unset, but these shell snippets only work if the reader has already exported matching values. As written, the copy-paste check can silently filter on an empty reference and report the wrong count. As per path instructions, review Markdown for correctness, clarity, spelling, grammar, working links, and whether commands/examples are realistic and safe.

♻️ Suggested fix
-docker images --filter "reference=${IMAGE_REGISTRY}/*:${IMAGE_TAG}" \
+docker images --filter "reference=${IMAGE_REGISTRY:-localhost:5000}/*:${IMAGE_TAG:-latest}" \
   --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}"
@@
-docker images --filter "reference=${IMAGE_REGISTRY}/*:${IMAGE_TAG}" \
+docker images --filter "reference=${IMAGE_REGISTRY:-localhost:5000}/*:${IMAGE_TAG:-latest}" \
   --format "{{.Repository}}" | wc -l

Also applies to: 97-100
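
To make the failure mode concrete, the sketch below builds both filter strings without calling Docker. The variables are set to empty strings to mimic a shell where they were never exported (`:-` treats empty and unset the same way). The defaults `localhost:5000` and `latest` are copied from the suggested fix above and are assumed, not confirmed, to match the Makefile.

```sh
# Simulate a shell where the build variables were never exported.
IMAGE_REGISTRY=''
IMAGE_TAG=''

# Original form: both expansions are empty, so the pattern loses its
# registry and tag.
bare="reference=${IMAGE_REGISTRY}/*:${IMAGE_TAG}"

# Suggested form: fall back to assumed defaults when the variable is
# unset or empty.
safe="reference=${IMAGE_REGISTRY:-localhost:5000}/*:${IMAGE_TAG:-latest}"

echo "$bare"   # reference=/*:
echo "$safe"   # reference=localhost:5000/*:latest
```

If the Makefile defaults differ from the values assumed here, the fallback values in the documented commands would need to change with them.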

Source: Path instructions

Documents the expected 14 deployable images, a quick wc -l count check,
what to infer from a lower count, and which 3 intermediates are local-only.
If site controller nodes have DPUs, they must be flashed and configured
before K8s cluster setup. NICo does not provision its own nodes' DPUs.
Adds pointer to NVIDIA DOCA BlueField Firmware Bundle download archive.
@shayan1995 shayan1995 force-pushed the snamaghi/doc-site-controller-dpu-prereq branch from 3f80dfe to 0846595 Compare June 24, 2026 22:15
@shayan1995 shayan1995 changed the title from "Snamaghi/doc site controller dpu prereq" to "docs(quick-start): add site controller node DPU provisioning requirement" Jun 24, 2026
2 participants