agent: Skip HBN configuration when config versions are unchanged #2848

Merged
DrewBloechl merged 2 commits into NVIDIA:main from DrewBloechl:drew/agent-loop
Jun 26, 2026

Conversation

@DrewBloechl

We've seen evidence that the nv config diff method that one of the HBN update functions relies on may sometimes result in no-op config changes. NVUE will complain about this with a "config apply executed with no config diff" message, but more annoyingly this also causes the code that checks for actual changes to get a false positive, which turns into a needless PostConfigCheckWait health alert.

In order to avoid this, we now track the managed_host_config_version and instance_network_config_version fields from the ManagedHostNetworkConfigResponse message, and avoid calling any reconfiguration code if these versions haven't changed since the last time this agent process saw them.
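The gating described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `NetworkVersions`, `VersionTracker`, and `run_iteration` are hypothetical names, and only the two version field names come from the description.

```rust
/// Hypothetical stand-in for the two version fields carried by
/// ManagedHostNetworkConfigResponse; the real message has other fields too.
#[derive(Debug, Clone, Default, PartialEq, Eq)]
struct NetworkVersions {
    managed_host_config_version: String,
    instance_network_config_version: String,
}

/// Remembers the versions this agent process last applied successfully.
#[derive(Debug, Default)]
struct VersionTracker {
    last_applied: Option<NetworkVersions>,
}

impl VersionTracker {
    /// True only if an apply already succeeded with exactly these versions.
    fn is_unchanged(&self, incoming: &NetworkVersions) -> bool {
        self.last_applied.as_ref() == Some(incoming)
    }

    fn record_applied(&mut self, incoming: &NetworkVersions) {
        self.last_applied = Some(incoming.clone());
    }
}

/// One loop iteration. Returns Ok(true) if reconfiguration ran and
/// Ok(false) if it was skipped because the versions were unchanged.
fn run_iteration(
    tracker: &mut VersionTracker,
    incoming: &NetworkVersions,
    apply: impl FnOnce() -> Result<(), String>,
) -> Result<bool, String> {
    if tracker.is_unchanged(incoming) {
        return Ok(false);
    }
    // On failure the tracker is left untouched, so the next iteration retries.
    apply()?;
    tracker.record_applied(incoming);
    Ok(true)
}
```

Because the tracker lives in process memory, a restarted agent starts with nothing recorded and applies once, which matches the "since the last time this agent process saw them" wording above.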

Related issues

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@DrewBloechl DrewBloechl requested a review from a team as a code owner June 24, 2026 18:36
@coderabbitai

coderabbitai Bot commented Jun 24, 2026

Summary by CodeRabbit

  • New Features
    • Improved update handling so unchanged network settings are skipped, helping reduce unnecessary work and speed up repeated runs.
  • Bug Fixes
    • Added clearer logging when applying network changes, making it easier to review what changed during an update.
    • Improved handling of empty string values to avoid treating blank inputs as meaningful data.

Walkthrough

The agent now tracks applied network configuration versions to skip redundant updates, adds a helper for treating empty strings as absent, and preserves nv config diff output when nv config apply emits stdout.

Changes

Network config version gating

| Layer / File(s) | Summary |
| --- | --- |
| Helper and version state: crates/agent/src/lib.rs, crates/agent/src/main_loop.rs | get_non_empty_str returns None for empty strings, and MainLoop adds CurrentNetworkVersion state with version comparison and update methods. |
| Skip repeated updates: crates/agent/src/main_loop.rs | run_single_iteration skips HBN/network update work when stored and incoming versions match, then records the applied versions after a successful update. |
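For illustration, a helper with the behavior described for get_non_empty_str might look like the sketch below. The signature is an assumption; the actual helper in crates/agent/src/lib.rs may take or return different types.

```rust
/// Treats an empty string as "absent".
/// Hypothetical signature, not copied from the PR.
fn get_non_empty_str(value: &str) -> Option<&str> {
    if value.is_empty() {
        None
    } else {
        Some(value)
    }
}
```

A helper like this is useful when version strings arrive as plain protobuf string fields, since proto3 string fields default to the empty string when unset. Whether the version fields here are plain strings is likewise an assumption.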

NVUE apply logging

| Layer / File(s) | Summary |
| --- | --- |
| Preserve diff output: crates/agent/src/nvue.rs | run_apply stores nv config diff stdout and logs it when nv config apply -y produces stdout, with an added workaround comment. |
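The logging behavior summarized above could be sketched like this. It is a rough illustration only: the `run` closure is a hypothetical stand-in for executing an `nv` subcommand and capturing its stdout, and the function returns the lines it would log instead of calling a logger, so the real run_apply in crates/agent/src/nvue.rs will look different.

```rust
/// Runs `nv config diff`, keeps its stdout, then runs `nv config apply -y`.
/// If the apply printed anything, both outputs are reported together.
/// `run` is a hypothetical command runner: subcommand args in, stdout out.
fn run_apply(
    mut run: impl FnMut(&[&str]) -> Result<String, String>,
) -> Result<Vec<String>, String> {
    // Capture the diff before applying; after the apply it would be empty.
    let config_diff_stdout = run(&["config", "diff"])?;
    let stdout = run(&["config", "apply", "-y"])?;

    let mut logged = Vec::new();
    if !stdout.trim().is_empty() {
        logged.push(format!("nv config apply: {stdout}"));
        // Kept only as a diagnostic for the
        // "config apply executed with no config diff" message.
        logged.push(format!("nv config diff: {config_diff_stdout}"));
    }
    Ok(logged)
}
```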

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the primary change: skipping HBN configuration when stored versions have not changed. |
| Description check | ✅ Passed | The description accurately explains the HBN version-tracking fix and its relation to the false-positive health alert. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
crates/agent/src/nvue.rs (1)

1045-1047: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use structured tracing fields for diff/apply diagnostics.

Line 1047 logs interpolated text; please emit apply_stdout and config_diff_stdout as tracing fields so logs remain queryable in logfmt.

Proposed change
-        tracing::info!("nv config apply: {stdout}");
+        tracing::info!(apply_stdout = %stdout, "nv config apply");
         // We're logging this just to see what was in there, in case it can help
         // explain the "config apply executed with no config diff" message.
-        tracing::info!("nv config diff: {config_diff_stdout}");
+        tracing::info!(config_diff_stdout = %config_diff_stdout, "nv config diff");

As per coding guidelines, "When writing log messages, prefer placing common fields as attributes passed to tracing functions instead of using string interpolation." As per path instructions, "Review Rust code against STYLE_GUIDE.md: ... structured tracing fields ..."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/agent/src/nvue.rs` around lines 1045 - 1047, The nv config diagnostics
are using interpolated text instead of structured tracing fields, which makes
the logs harder to query. Update the tracing in nvue.rs around the config
diff/apply logging to pass apply_stdout and config_diff_stdout as fields on the
tracing call rather than embedding them into the message string, keeping the
existing diagnostic context while making the output structured and
logfmt-friendly.

Sources: Coding guidelines, Path instructions

crates/agent/src/main_loop.rs (1)

661-664: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Make the skip log structured.

The {:?} payload buries version values inside the message, making logfmt search harder. Prefer explicit tracing fields for the two versions. As per coding guidelines, “All services should emit logs in 'logfmt' syntax” and “prefer placing common fields as attributes passed to tracing functions”; as per path instructions, use “structured tracing fields”.

Proposed fix
                         tracing::debug!(
-                            "No configuration change, skipping HBN updates: {:?}",
-                            &self.current_network_version
+                            managed_host_config_version = self
+                                .current_network_version
+                                .managed_host_config_version
+                                .as_deref()
+                                .unwrap_or_default(),
+                            instance_network_config_version = self
+                                .current_network_version
+                                .instance_network_config_version
+                                .as_deref()
+                                .unwrap_or_default(),
+                            "No configuration change, skipping HBN updates"
                         );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@crates/agent/src/main_loop.rs` around lines 661 - 664, The skip message in
the HBN update path is unstructured because `tracing::debug!` formats
`self.current_network_version` with `{:?}` inside the message, which makes
version values harder to search in logfmt. Update the logging in `main_loop.rs`
around the “No configuration change, skipping HBN updates” branch to use
structured tracing fields on `tracing::debug!` instead of embedding the version
in the message, and include both version values as explicit attributes so the
log stays logfmt-friendly and searchable.

Sources: Coding guidelines, Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@crates/agent/src/main_loop.rs`:
- Around line 427-445: CurrentNetworkVersion::matches_versions_from currently
treats the default tracker state as already matching empty gRPC versions, which
can wrongly skip the first HBN apply; add explicit state in
CurrentNetworkVersion to record whether an apply has happened and only permit
the version match shortcut after a successful apply. Update the logic in
CurrentNetworkVersion and its callers in main_loop.rs so the initial Default
state (None, None) never suppresses the first update when
managed_host_config_version and instance_network_config_version are absent.
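The pitfall in this inline comment, and one way to address it, can be sketched as below. The struct and method names follow the comment, but the field types, the method signatures, and the `applied` flag are assumptions made for illustration, not the code that was merged.

```rust
#[derive(Debug, Default)]
struct CurrentNetworkVersion {
    managed_host_config_version: Option<String>,
    instance_network_config_version: Option<String>,
    /// Assumed addition: set once an apply has succeeded in this process.
    applied: bool,
}

impl CurrentNetworkVersion {
    /// Naive comparison: the Default state (None, None) compares equal to
    /// absent incoming versions, so the very first apply would be skipped.
    fn matches_naive(&self, managed_host: Option<&str>, instance: Option<&str>) -> bool {
        self.managed_host_config_version.as_deref() == managed_host
            && self.instance_network_config_version.as_deref() == instance
    }

    /// Guarded comparison: a match only counts after a successful apply.
    fn matches_versions_from(&self, managed_host: Option<&str>, instance: Option<&str>) -> bool {
        self.applied && self.matches_naive(managed_host, instance)
    }

    /// Call only after the update succeeded.
    fn update(&mut self, managed_host: Option<&str>, instance: Option<&str>) {
        self.managed_host_config_version = managed_host.map(str::to_owned);
        self.instance_network_config_version = instance.map(str::to_owned);
        self.applied = true;
    }
}
```

An alternative with the same effect would be to hold the whole tracker in an `Option` that stays `None` until the first successful apply.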

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ad1e5f64-6273-4708-91c5-cf62dc6c7cd4

📥 Commits

Reviewing files that changed from the base of the PR and between ea94336 and b0b83c3.

📒 Files selected for processing (3)
  • crates/agent/src/lib.rs
  • crates/agent/src/main_loop.rs
  • crates/agent/src/nvue.rs

Comment thread crates/agent/src/main_loop.rs
@github-actions

🔍 Container Scan Summary

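| Service | Total | Critical | High | Medium | Low | Other |
| --- | --- | --- | --- | --- | --- | --- |
| boot-artifacts-aarch64 | 3 | 0 | 0 | 3 | 0 | 0 |
| boot-artifacts-x86_64 | 3 | 0 | 0 | 3 | 0 | 0 |
| forge-admin-cli-x86_64 | 265 | 6 | 24 | 98 | 7 | 130 |
| machine-validation-runner | 717 | 32 | 188 | 267 | 36 | 194 |
| machine_validation | 717 | 32 | 188 | 267 | 36 | 194 |
| machine_validation-aarch64 | 717 | 32 | 188 | 267 | 36 | 194 |
| nvmetal-carbide | 717 | 32 | 188 | 267 | 36 | 194 |
| TOTAL | 3139 | 134 | 776 | 1172 | 151 | 906 |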

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@DrewBloechl DrewBloechl merged commit 9e904ee into NVIDIA:main Jun 26, 2026
58 checks passed