fix(admin-cli): cap paged chunk size to the server's max_find_by_ids#2874

Merged
chet merged 1 commit into
NVIDIA:mainfrom
chet:gh-issue-2872
Jun 25, 2026

Conversation

@chet chet commented Jun 25, 2026

The admin-cli's paged get_all_* wrappers chunk ids by the CLI's page_size, but the server's *ByIds RPCs reject any request whose id count exceeds runtime_config.max_find_by_ids. Since the two limits are configured independently, a deployment with page_size above the server cap would fail every paged list command with InvalidArgument. This routes all the wrappers through one cap, so a page never exceeds what the server accepts.

  • A new ApiClient::effective_chunk_size reads max_find_by_ids from the RuntimeConfig the version RPC already exposes and returns min(page_size, cap); a zero/unset cap falls back to page_size, since chunking by zero would panic.
  • All 21 paged .chunks(page_size) sites (machines, instances, racks, switches, power-shelves, segments, VPCs, DPAs, partitions, keysets, NSGs, explored hosts/devices, ...) now chunk by the capped size.
  • The cap arithmetic is a pure cap_chunk_size helper with a unit test.
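
The cap arithmetic described in the list above can be sketched as follows. This is a minimal illustration, not the code from the PR: the `usize` parameter types and the `pages` helper are assumptions made for the example, and only the name `cap_chunk_size` and the `min(page_size, cap)` / zero-falls-back behavior come from the description.

```rust
/// Sketch of the pure cap helper (signature assumed, not taken from the PR).
/// A cap of 0 is treated as "unset", so `page_size` is returned unchanged.
fn cap_chunk_size(page_size: usize, max_find_by_ids: usize) -> usize {
    if max_find_by_ids == 0 {
        page_size
    } else {
        page_size.min(max_find_by_ids)
    }
}

/// Hypothetical helper: splits `ids` into pages no larger than the capped size.
/// Assumes `page_size` is nonzero, because `slice::chunks(0)` panics.
fn pages(ids: &[u32], page_size: usize, max_find_by_ids: usize) -> Vec<&[u32]> {
    ids.chunks(cap_chunk_size(page_size, max_find_by_ids)).collect()
}
```

With a `page_size` of 100 and a server cap of 40, 250 ids would be split into seven pages, none longer than 40.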

Surfaced by CodeRabbit on PR #2833.

Tests added!

This supports #2872

Signed-off-by: Chet Nichols III <chetn@nvidia.com>
@chet chet requested a review from a team as a code owner June 25, 2026 02:21

chet commented Jun 25, 2026

@coderabbitai full_review please, thanks!

coderabbitai Bot commented Jun 25, 2026

Summary by CodeRabbit

  • Bug Fixes
    • Improved list loading across multiple admin views so pagination respects server-side request limits.
    • Prevented failures when the server reports no applicable limit, keeping large result sets working reliably.
    • Made remediation listings more stable and consistent when retrieving results across multiple pages.

Walkthrough

ApiClient now reads the server max_find_by_ids runtime setting before chunking ID-based list requests. Several list retrieval paths use the capped chunk size, and remediation pagination now folds chunked fetches into one result.

Changes

Server-cap-aware pagination

| Layer | File(s) | Summary |
| --- | --- | --- |
| Cap helper and runtime lookup | crates/admin-cli/src/rpc.rs | Adds cap_chunk_size, derives an effective chunk size from version(true), and covers capped and uncapped cases in unit tests. |
| General list pagination | crates/admin-cli/src/rpc.rs | Updates the main get_all_* ID-chunking paths to use the server-cap-aware chunk size before fetching each page. |
| Specialized pagination | crates/admin-cli/src/rpc.rs | Applies the same capped chunk sizing to site exploration, explored host/device, extension service, and DPF state retrieval. |
| Remediation pagination | crates/admin-cli/src/rpc.rs | Refactors get_all_remediations to chunk with the effective size, fetch each chunk, and accumulate the results with mapped API errors. |

Sequence Diagram(s)

sequenceDiagram
  participant ApiClient
  participant API_server
  participant get_all_remediations
  ApiClient->>API_server: version(true)
  API_server-->>ApiClient: runtime_config.max_find_by_ids
  ApiClient->>get_all_remediations: chunk IDs with effective_chunk_size(page_size)
  get_all_remediations-->>ApiClient: accumulated results
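
The flow in the diagram (look up the cap, chunk the ids, fetch each chunk, accumulate) can be sketched synchronously as below. Everything here is an assumption for illustration: the real client is async and uses its own error type, while `CliError`, `fetch_all`, and the `fetch_chunk` closure are hypothetical stand-ins.

```rust
/// Hypothetical error type standing in for the CLI's API error variant.
#[derive(Debug, PartialEq)]
enum CliError {
    ApiInvocation(String),
}

/// Chunks `ids` by the capped size, fetches each chunk, and folds the pages
/// into one Vec. Stops at the first failed chunk and maps its error.
/// Assumes `page_size` is nonzero, because `slice::chunks(0)` panics.
fn fetch_all<T, F>(
    ids: &[u32],
    page_size: usize,
    server_cap: usize, // 0 means the server reported no cap
    mut fetch_chunk: F,
) -> Result<Vec<T>, CliError>
where
    F: FnMut(&[u32]) -> Result<Vec<T>, String>,
{
    let chunk_size = if server_cap == 0 {
        page_size
    } else {
        page_size.min(server_cap)
    };
    ids.chunks(chunk_size)
        .try_fold(Vec::new(), |mut acc, chunk| -> Result<Vec<T>, CliError> {
            let mut page = fetch_chunk(chunk).map_err(CliError::ApiInvocation)?;
            acc.append(&mut page);
            Ok(acc)
        })
}
```

This sketch fetches the chunks one at a time; it makes no attempt to model concurrent page fetches.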

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 74.07% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and accurately summarizes the main change: capping paged chunk sizes to the server limit. |
| Description check | ✅ Passed | The description is directly related to the change and explains the chunk-size capping fix and its motivation. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai Bot commented Jun 25, 2026

@chet Understood — I’ll redo the full review of #2874 with attention to correctness, edge cases, and maintainability across the chunk-size capping changes and accompanying tests.

🐇

✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/admin-cli/src/rpc.rs`:
- Around line 107-115: The central chunk-size path in effective_chunk_size
currently allows a zero page_size to flow through cap_chunk_size, which can
later panic when used with .chunks(...). Update effective_chunk_size in rpc.rs
to reject page_size == 0 up front (or normalize it to a minimum of 1) before
applying the cap from version(true) and runtime_config.max_find_by_ids, so all
callers of this helper are guaranteed a safe nonzero chunk size.
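
One way to apply the "normalize it to a minimum of 1" option from this comment is sketched below. It is an illustration under assumptions, not the fix that was merged: `safe_chunk_size` is a hypothetical name and the `usize` types are assumed.

```rust
/// Hypothetical variant of the cap helper that never returns 0, so the
/// result is always safe to pass to `slice::chunks`.
fn safe_chunk_size(page_size: usize, max_find_by_ids: usize) -> usize {
    // A cap of 0 is treated as "unset" and leaves page_size unchanged.
    let capped = if max_find_by_ids == 0 {
        page_size
    } else {
        page_size.min(max_find_by_ids)
    };
    // Normalize a zero page_size to 1 instead of letting `.chunks(0)` panic.
    capped.max(1)
}
```

Clamping silently turns a zero `page_size` into one-id pages; rejecting `page_size == 0` with an error, the other option the comment names, would surface the misconfiguration instead.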
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f905e79a-7a08-45bb-a8ca-4a087753cb77

📥 Commits

Reviewing files that changed from the base of the PR and between e7781e0 and a631c1c.

📒 Files selected for processing (1)
  • crates/admin-cli/src/rpc.rs

@coderabbitai coderabbitai Bot left a comment

🧹 Nitpick comments (2)
crates/admin-cli/src/rpc.rs (2)

2428-2443: 🚀 Performance & Scalability | 🔵 Trivial | ⚡ Quick win

Keep remediation page fetches buffered.

This chain uses .then(...), so remediation chunks are fetched serially, unlike the surrounding paged helpers that use .buffered(PAGED_LIST_FETCH_CONCURRENCY). Switching to map(...).buffered(...).try_fold(...) preserves the established concurrency for large result sets.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/admin-cli/src/rpc.rs` around lines 2428 - 2443, The remediation paging
logic in the `remediations` stream is fetching chunks serially because it uses
`then(...)` inside the `stream::iter` chain. Update the `remediations` pipeline
to match the other paged helpers by using `map(...)` followed by
`.buffered(PAGED_LIST_FETCH_CONCURRENCY)` before `try_fold(...)`, while keeping
the existing `find_remediations_by_ids` error mapping to
`CarbideCliError::ApiInvocationError`. This preserves buffered concurrent
fetches for large result sets.

2580-2590: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Make this helper test table-driven.

This pure input/output helper already has multiple cases; use a small case table so future cap edge cases can be added without duplicating assertion structure.

Suggested refactor
 #[test]
 fn cap_chunk_size_respects_server_limit() {
-    // A zero/unset cap means no server limit -- use page_size as-is.
-    assert_eq!(cap_chunk_size(100, 0), 100);
-    // A smaller cap wins, so a page never exceeds what the *ByIds RPCs accept.
-    assert_eq!(cap_chunk_size(100, 40), 40);
-    // A larger cap leaves page_size untouched.
-    assert_eq!(cap_chunk_size(100, 500), 100);
-    // Equal is a no-op.
-    assert_eq!(cap_chunk_size(100, 100), 100);
+    let cases = [
+        ("zero_or_unset_cap", 100, 0, 100),
+        ("smaller_cap_wins", 100, 40, 40),
+        ("larger_cap_leaves_page_size", 100, 500, 100),
+        ("equal_cap_is_noop", 100, 100, 100),
+    ];
+
+    for (name, page_size, cap, expected) in cases {
+        assert_eq!(cap_chunk_size(page_size, cap), expected, "{name}");
+    }
 }

As per coding guidelines, "Use table-driven test style when writing tests in Rust."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/admin-cli/src/rpc.rs` around lines 2580 - 2590, Refactor the
cap_chunk_size_respects_server_limit test into a table-driven style so each
input/output case is expressed as a row instead of repeated assertions. Keep the
same coverage for cap_chunk_size by moving the current zero, smaller, larger,
and equal cap scenarios into a small cases table and iterating over it with one
assertion block, making it easy to add more edge cases later.

Source: Coding guidelines

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3b51039b-64db-42b4-bba1-4e60882a42c4

📥 Commits

Reviewing files that changed from the base of the PR and between e7781e0 and a631c1c.

📒 Files selected for processing (1)
  • crates/admin-cli/src/rpc.rs

@github-actions

🔍 Container Scan Summary

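| Service | Total | Critical | High | Medium | Low | Other |
| --- | --- | --- | --- | --- | --- | --- |
| boot-artifacts-aarch64 | 3 | 0 | 0 | 3 | 0 | 0 |
| boot-artifacts-x86_64 | 3 | 0 | 0 | 3 | 0 | 0 |
| forge-admin-cli-x86_64 | 265 | 6 | 24 | 98 | 7 | 130 |
| machine-validation-runner | 717 | 32 | 188 | 267 | 36 | 194 |
| machine_validation | 717 | 32 | 188 | 267 | 36 | 194 |
| machine_validation-aarch64 | 717 | 32 | 188 | 267 | 36 | 194 |
| nvmetal-carbide | 717 | 32 | 188 | 267 | 36 | 194 |
| TOTAL | 3139 | 134 | 776 | 1172 | 151 | 906 |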

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@chet chet merged commit 77d0151 into NVIDIA:main Jun 25, 2026
58 checks passed
@chet chet deleted the gh-issue-2872 branch June 25, 2026 05:23