fix(admin-cli): cap paged chunk size to the server's max_find_by_ids by chet · Pull Request #2874 · NVIDIA/infra-controller

chet · 2026-06-25T02:21:20Z

The admin-cli's paged `get_all_*` wrappers chunk ids by the CLI's `page_size`, but the server's `*ByIds` RPCs reject any request whose id count exceeds `runtime_config.max_find_by_ids`. Since the two limits are configured independently, a deployment with `page_size` above the server cap would fail every paged list command with `InvalidArgument`. This routes all the wrappers through one cap, so a page never exceeds what the server accepts.

- A new `ApiClient::effective_chunk_size` reads `max_find_by_ids` from the `RuntimeConfig` the `version` RPC already exposes and returns `min(page_size, cap)`; a zero/unset cap falls back to `page_size`, since chunking by zero would panic.
- All 21 paged `.chunks(page_size)` sites (machines, instances, racks, switches, power-shelves, segments, VPCs, DPAs, partitions, keysets, NSGs, explored hosts/devices, ...) now chunk by the capped size.
- The cap arithmetic is a pure `cap_chunk_size` helper with a unit test.
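
For readers skimming the description, the cap arithmetic can be sketched roughly as below. This is a minimal illustration, not the code merged in this PR: the `usize` parameter types and the `paged_ids` helper are assumptions made for the example. Only the name `cap_chunk_size`, the `min(page_size, cap)` rule, and the zero/unset fallback come from the description.

```rust
/// Sketch of the pure cap helper (parameter types are assumed).
/// A `cap` of 0 is treated as "no server limit", so `page_size` is used as-is.
fn cap_chunk_size(page_size: usize, cap: usize) -> usize {
    if cap == 0 {
        page_size
    } else {
        page_size.min(cap)
    }
}

/// Hypothetical helper for illustration: splits `ids` into pages that never
/// exceed the capped size. Note that `slice::chunks` panics if the chunk size
/// is 0, which would still happen here if `page_size` itself were 0.
fn paged_ids(ids: &[u32], page_size: usize, cap: usize) -> Vec<Vec<u32>> {
    ids.chunks(cap_chunk_size(page_size, cap))
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

With ten ids, a `page_size` of 8, and a server cap of 4, `paged_ids` yields three pages of at most four ids each, so no single `*ByIds` request would exceed the cap.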

Surfaced by CodeRabbit on PR #2833.

Tests added!

This supports #2872

The admin-cli's paged `get_all_*` wrappers chunk ids by the CLI's `page_size`, but the server's `*ByIds` RPCs reject any request whose id count exceeds `runtime_config.max_find_by_ids`. Since the two limits are configured independently, a deployment with `page_size` above the server cap would fail every paged list command with `InvalidArgument`. This routes all the wrappers through one cap, so a page never exceeds what the server accepts.

- A new `ApiClient::effective_chunk_size` reads `max_find_by_ids` from the `RuntimeConfig` the `version` RPC already exposes and returns `min(page_size, cap)`; a zero/unset cap falls back to `page_size`, since chunking by zero would panic.
- All 21 paged `.chunks(page_size)` sites (machines, instances, racks, switches, power-shelves, segments, VPCs, DPAs, partitions, keysets, NSGs, explored hosts/devices, ...) now chunk by the capped size.
- The cap arithmetic is a pure `cap_chunk_size` helper with a unit test.

Surfaced by CodeRabbit on PR NVIDIA#2833.

Tests added!

This supports NVIDIA#2872

Signed-off-by: Chet Nichols III <chetn@nvidia.com>

chet · 2026-06-25T02:21:36Z

@coderabbitai full_review please, thanks!

coderabbitai · 2026-06-25T02:21:45Z

Summary by CodeRabbit

Bug Fixes
- Improved list loading across multiple admin views so pagination respects server-side request limits.
- Prevented failures when the server reports no applicable limit, keeping large result sets working reliably.
- Made remediation listings more stable and consistent when retrieving results across multiple pages.

Walkthrough

ApiClient now reads the server max_find_by_ids runtime setting before chunking ID-based list requests. Several list retrieval paths use the capped chunk size, and remediation pagination now folds chunked fetches into one result.

Changes

Server-cap-aware pagination

Layer / File(s)	Summary
Cap helper and runtime lookup `crates/admin-cli/src/rpc.rs`	Adds `cap_chunk_size`, derives an effective chunk size from `version(true)`, and covers capped and uncapped cases in unit tests.
General list pagination `crates/admin-cli/src/rpc.rs`	Updates the main `get_all_*` ID-chunking paths to use the server-cap-aware chunk size before fetching each page.
Specialized pagination `crates/admin-cli/src/rpc.rs`	Applies the same capped chunk sizing to site exploration, explored host/device, extension service, and DPF state retrieval.
Remediation pagination `crates/admin-cli/src/rpc.rs`	Refactors `get_all_remediations` to chunk with the effective size, fetch each chunk, and accumulate the results with mapped API errors.

Sequence Diagram(s)

sequenceDiagram
  participant ApiClient
  participant API_server
  participant get_all_remediations
  ApiClient->>API_server: version(true)
  API_server-->>ApiClient: runtime_config.max_find_by_ids
  ApiClient->>get_all_remediations: chunk IDs with effective_chunk_size(page_size)
  get_all_remediations-->>ApiClient: accumulated results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Cap CLI page-size chunking to the server's max_find_by_ids across paged commands #2872 — This PR implements the same server-cap-aware paging behavior via effective_chunk_size and max_find_by_ids.

Possibly related PRs

NVIDIA/infra-controller#2824 — Both PRs modify the same crates/admin-cli/src/rpc.rs pagination paths for multiple get_all_* methods.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 74.07% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and accurately summarizes the main change: capping paged chunk sizes to the server limit.
Description check	✅ Passed	The description is directly related to the change and explains the chunk-size capping fix and its motivation.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai · 2026-06-25T02:21:51Z

@chet Understood — I’ll redo the full review of #2874 with attention to correctness, edge cases, and maintainability across the chunk-size capping changes and accompanying tests.

🐇

✅ Action performed

Full review finished.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/admin-cli/src/rpc.rs`:
- Around line 107-115: The central chunk-size path in effective_chunk_size
currently allows a zero page_size to flow through cap_chunk_size, which can
later panic when used with .chunks(...). Update effective_chunk_size in rpc.rs
to reject page_size == 0 up front (or normalize it to a minimum of 1) before
applying the cap from version(true) and runtime_config.max_find_by_ids, so all
callers of this helper are guaranteed a safe nonzero chunk size.
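
One way to read this suggestion is to normalize the page size before applying the cap. The sketch below is hypothetical: `safe_chunk_size` is an illustrative name, the `usize` types are assumed, and the `version(true)` lookup that the real `effective_chunk_size` performs is omitted.

```rust
/// Hypothetical variant of the cap arithmetic that can never return 0.
/// `page_size` is raised to at least 1 first; a `cap` of 0 still means
/// "no server limit".
fn safe_chunk_size(page_size: usize, cap: usize) -> usize {
    let page_size = page_size.max(1);
    if cap == 0 {
        page_size
    } else {
        page_size.min(cap)
    }
}
```

Because the result is always at least 1, passing it to `.chunks(...)` cannot hit the zero-size panic. Rejecting `page_size == 0` with an error instead of normalizing it is the other option the comment mentions.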

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f905e79a-7a08-45bb-a8ca-4a087753cb77

📥 Commits

Reviewing files that changed from the base of the PR and between e7781e0 and a631c1c.

📒 Files selected for processing (1)

crates/admin-cli/src/rpc.rs

coderabbitai

🧹 Nitpick comments (2)

crates/admin-cli/src/rpc.rs (2)

2428-2443: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Keep remediation page fetches buffered.

This chain uses .then(...), so remediation chunks are fetched serially, unlike the surrounding paged helpers that use .buffered(PAGED_LIST_FETCH_CONCURRENCY). Switching to map(...).buffered(...).try_fold(...) preserves the established concurrency for large result sets.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/admin-cli/src/rpc.rs` around lines 2428 - 2443, The remediation paging
logic in the `remediations` stream is fetching chunks serially because it uses
`then(...)` inside the `stream::iter` chain. Update the `remediations` pipeline
to match the other paged helpers by using `map(...)` followed by
`.buffered(PAGED_LIST_FETCH_CONCURRENCY)` before `try_fold(...)`, while keeping
the existing `find_remediations_by_ids` error mapping to
`CarbideCliError::ApiInvocationError`. This preserves buffered concurrent
fetches for large result sets.

2580-2590: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Make this helper test table-driven.

This pure input/output helper already has multiple cases; use a small case table so future cap edge cases can be added without duplicating assertion structure.

Suggested refactor

 #[test]
 fn cap_chunk_size_respects_server_limit() {
-    // A zero/unset cap means no server limit -- use page_size as-is.
-    assert_eq!(cap_chunk_size(100, 0), 100);
-    // A smaller cap wins, so a page never exceeds what the *ByIds RPCs accept.
-    assert_eq!(cap_chunk_size(100, 40), 40);
-    // A larger cap leaves page_size untouched.
-    assert_eq!(cap_chunk_size(100, 500), 100);
-    // Equal is a no-op.
-    assert_eq!(cap_chunk_size(100, 100), 100);
+    let cases = [
+        ("zero_or_unset_cap", 100, 0, 100),
+        ("smaller_cap_wins", 100, 40, 40),
+        ("larger_cap_leaves_page_size", 100, 500, 100),
+        ("equal_cap_is_noop", 100, 100, 100),
+    ];
+
+    for (name, page_size, cap, expected) in cases {
+        assert_eq!(cap_chunk_size(page_size, cap), expected, "{name}");
+    }
 }

As per coding guidelines, "Use table-driven test style when writing tests in Rust."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/admin-cli/src/rpc.rs` around lines 2580 - 2590, Refactor the
cap_chunk_size_respects_server_limit test into a table-driven style so each
input/output case is expressed as a row instead of repeated assertions. Keep the
same coverage for cap_chunk_size by moving the current zero, smaller, larger,
and equal cap scenarios into a small cases table and iterating over it with one
assertion block, making it easy to add more edge cases later.

Source: Coding guidelines

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3b51039b-64db-42b4-bba1-4e60882a42c4

📥 Commits

Reviewing files that changed from the base of the PR and between e7781e0 and a631c1c.

📒 Files selected for processing (1)

crates/admin-cli/src/rpc.rs

github-actions · 2026-06-25T03:40:38Z

🔍 Container Scan Summary

Service	Total	Critical	High	Medium	Low	Other
boot-artifacts-aarch64	3	0	0	3	0	0
boot-artifacts-x86_64	3	0	0	3	0	0
forge-admin-cli-x86_64	265	6	24	98	7	130
machine-validation-runner	717	32	188	267	36	194
machine_validation	717	32	188	267	36	194
machine_validation-aarch64	717	32	188	267	36	194
nvmetal-carbide	717	32	188	267	36	194
TOTAL	3139	134	776	1172	151	906

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

chet requested a review from a team as a code owner June 25, 2026 02:21

coderabbitai Bot reviewed Jun 25, 2026

coderabbitai Bot reviewed Jun 25, 2026

rwthompsonii approved these changes Jun 25, 2026

chet merged commit 77d0151 into NVIDIA:main Jun 25, 2026
58 checks passed

chet deleted the gh-issue-2872 branch June 25, 2026 05:23
