
CI: Prevent PR's that result in uncommitted changes #2884

Open
kensimon wants to merge 5 commits into NVIDIA:main from kensimon:unclean-repo-check-and-sync

Conversation

@kensimon kensimon commented Jun 25, 2026


Problem statement: Parts of this repo that are tracked in git but generated by build steps (e.g. Cargo.lock, .pb.go files, some of the files under docs/) may be changed by a PR without the regenerated files being committed. As a result, building the main branch can produce uncommitted changes even though you didn't touch anything, leaving the next person to include those unrelated changes in their PR.

Fix this on a few fronts, both by checking for this scenario and by eagerly running some code generation to surface these changes earlier:

  • Add a check-repo-clean.sh script that checks for uncommitted changes
  • Add a build step that runs code-gen in rest-api if the protobufs under crates/rpc/proto have changed (to make sure any deltas are included in the PR that caused them), then runs check-repo-clean to fail early if there are any uncommitted changes as a result
  • Add a build step that runs check-repo-clean as a final check to catch anything else
  • Make cargo make pre-commit-verify-workspace run the protobuf codegen too, so that local development stands a decent chance of noticing the changes before the PR runs
  • Add tasks to Makefile.toml to trigger the rest-api codegen manually

Also, run the checks now and check in the results so the repo is clean.
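
The change-detection step in the second bullet can be sketched in shell. This is a minimal illustration, not the logic used by the workflows in this PR: `any_path_under` is a hypothetical helper, and feeding it from `git diff --name-only` against a base ref (the `BASE_REF` variable below is also hypothetical) is an assumption about how the changed files would be listed.

```shell
# Hypothetical helper (not from this PR): reads changed file paths from stdin,
# one per line, and succeeds if any of them lives under the given directory.
any_path_under() {
  prefix="${1%/}"   # tolerate a trailing slash on the directory argument
  while IFS= read -r path; do
    case "$path" in
      "$prefix"/*) return 0 ;;
    esac
  done
  return 1
}

# Example wiring (assumed; shown as comments so the sketch has no side effects):
#   if git diff --name-only "$BASE_REF" HEAD | any_path_under crates/rpc/proto; then
#     # run the rest-api codegen here, then the clean-tree check
#   fi
```

Matching on the directory prefix followed by a slash keeps sibling paths that merely share the prefix (for example a hypothetical `crates/rpc/protocol.rs`) from triggering the codegen.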

Related issues

#2883

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

The deltas in the synced protos in rest-api are quite large. It seems like people only partially commit the changes they need when doing a sync, to keep their PR from getting large. I didn't run a full test suite on rest-api, but it seems like the changes should all work (including deleting dpa_rpc_nico.proto, which AFAICT is unused), and this will be the last time we have a bunch of de-synced protobuf code between the two parts of the codebase.

@kensimon kensimon requested review from a team as code owners June 25, 2026 15:12
coderabbitai Bot commented Jun 25, 2026



No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: efea70aa-dad1-4c5c-a048-ac7e4722eb0d

📥 Commits

Reviewing files that changed from the base of the PR and between b45861c and aff5c5e.

📒 Files selected for processing (1)
  • docs/observability/core_metrics.md
✅ Files skipped from review due to trivial changes (1)
  • docs/observability/core_metrics.md

Summary by CodeRabbit

  • New Features
    • Expanded REST/Core API contracts with new Forge RPCs, machine-validation queries, network/VPC enhancements, DHCP/NTP additions, and site-explorer Mellanox/BlueField device models.
    • Added core observability metrics for validation run age/staleness and per-phase iteration latency.
  • Bug Fixes
    • Strengthened release gating to verify REST protobufs are regenerated in sync with Core protocol changes.
    • Added additional post-test repository clean checks in CI.
  • Chores
    • Pinned protobuf tooling versions and standardized protobuf regeneration via Makefile targets.

Walkthrough

CI now detects core RPC proto changes, regenerates REST protos when needed, and blocks release jobs until sync checks pass. Proto contracts expand across BlueField discovery, networking, validation, secrets, and inventory. Observability docs add new metric entries.

Changes

REST proto sync and API surface

| Layer | File(s) | Summary |
|---|---|---|
| Core proto change detection | .github/workflows/ci.yaml, .github/workflows/rest-ci.yml | The workflows now surface core RPC proto change signals and let REST CI run when those files change. |
| Generation tasks and clean helper | scripts/check-repo-clean.sh, rest-api/Makefile, rest-api/flow/Makefile, rest-api/flow/internal/nicoapi/buf.gen.yaml, Makefile.toml, .github/workflows/rest-lint-and-test.yml, .gitignore | Shared tasks and Make targets centralize REST proto regeneration, plugin versions, and repository-clean validation. |
| Sync job and release gates | .github/workflows/ci.yaml | A new sync job regenerates REST protos from core protos, release jobs wait on it, and clean-tree checks are added. |
| BlueField discovery API | rest-api/flow/internal/nicoapi/nicoproto/site_explorer.proto, rest-api/flow/internal/nicoapi/nicoproto/nico.proto | Forge gains BlueField discovery RPCs and the related site-explorer device list/search proto messages. |
| VPC and networking models | rest-api/flow/internal/nicoapi/nicoproto/nico.proto | VPC, network, DHCP, interface, and machine state messages gain config and status fields. |
| Validation and inventory protos | rest-api/flow/internal/nicoapi/nicoproto/nico.proto | Machine validation, secret rewrap, rack, and browse proto messages and enums extend the Forge surface. |

Observability metrics docs

| Layer | File(s) | Summary |
|---|---|---|
| Observability metrics docs | docs/observability/core_metrics.md | The metrics documentation adds machine-validation, site-explorer, and VPC prefix metric definitions. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

Possibly related PRs

  • NVIDIA/infra-controller#2865: Adds a Forge RPC under crates/rpc/proto/**, which matches the new core-proto change detection and sync path in this PR.

Suggested labels

low risk

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately summarizes the main change: preventing pull requests from introducing uncommitted changes. |
| Description check | ✅ Passed | The description aligns with the implemented CI, cleanup, and protobuf sync changes in the pull request. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@kensimon kensimon changed the title from "CI: Prevent uncommitted changes from being checked in" to "CI: Prevent PR's that result in uncommitted changes" on Jun 25, 2026
@github-actions


🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉


🕐 Last updated: 2026-06-25 15:14:54 UTC | Commit: a29b7c7

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
rest-api/flow/internal/nicoapi/nicoproto/nico.proto (1)

1501-1567: 🗄️ Data Integrity & Integration | 🟠 Major

Dual-write VPC config/status and the legacy fields until the workflow layer is migrated. rest-api/workflow/pkg/activity/vpc/vpc.go still reads RoutingProfileType, NetworkSecurityGroupId, and Status.Vni from the top-level Vpc; if this message only populates config/status, those attributes will arrive empty and VPC sync will lose data.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rest-api/flow/internal/nicoapi/nicoproto/nico.proto` around lines 1501 -
1567, The Vpc proto now moves several values into config/status, but the
workflow layer still reads the legacy top-level Vpc fields, so those values will
be missing during sync. Update the Vpc write path to dual-write the affected
values into both the new nested messages and the existing deprecated fields,
especially routing_profile_type, network_security_group_id, and status.vni. Keep
the legacy fields populated in Vpc alongside VpcConfig and VpcStatus until
rest-api/workflow/pkg/activity/vpc/vpc.go is migrated.

Source: Path instructions

Makefile.toml (1)

579-617: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Use the sync-check task in pre-commit-verify-workspace.

Right now this aggregate only runs generate-rest-core-proto, so a direct cargo make pre-commit-verify-workspace can leave tracked proto outputs dirty and still exit 0. Swapping this dependency to check-rest-core-proto-sync keeps generation and cleanliness verification coupled.

Suggested fix
 dependencies = [
-  "generate-rest-core-proto",
+  "check-rest-core-proto-sync",
   "clippy-flow",
   "carbide-lints",
   "check-format-nightly",

As per path instructions, Makefile* changes should have clear failure behavior and remain consistent with documented verification commands.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile.toml` around lines 579 - 617, `pre-commit-verify-workspace`
currently depends on `generate-rest-core-proto`, which can succeed even when
generated REST proto outputs remain dirty. Update the aggregate task to depend
on `check-rest-core-proto-sync` instead, so `generate-rest-core-proto` and
`check-repo-clean` stay coupled and the verification fails when tracked proto
artifacts are not committed.

Source: Path instructions

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/ci.yaml:
- Around line 1057-1065: The check-rest-core-proto-sync job is inheriting
broader default GITHUB_TOKEN permissions and leaving checkout credentials
available to later steps. Harden the workflow by explicitly setting minimal
read-only permissions for this job and by adjusting the actions/checkout@v4 step
so it does not persist credentials for the subsequent Makefile/script execution.
Use the check-rest-core-proto-sync job and its checkout step as the place to
apply the fix.

In `@docs/observability/core_metrics.md`:
- Line 123: The observability docs entry uses a stale metric name suffix and
should match the emitted series from site-explorer. Update the table row in the
metrics documentation to use the exact name exported by
crates/site-explorer/src/metrics.rs, namely the
carbide_site_explorer_phase_latency symbol, so dashboards and alerts reference
the correct metric.

In `@rest-api/flow/Makefile`:
- Around line 20-23: The Makefile still allows any existing buf on PATH to be
used, so generation can run with the wrong version. Update the buf setup in the
Makefile to enforce the pinned v1.70.0 binary unconditionally by invoking the
installed binary directly or by checking buf --version and failing if it is not
the expected version. Keep the change focused around the existing buf
install/generate flow so the version used by buf generate is always the pinned
one.

In `@scripts/check-repo-clean.sh`:
- Around line 8-21: The clean-check in check-repo-clean.sh is only comparing the
worktree against the index, so staged regenerated files can still be reported as
clean. Update the repository-clean test in this script to also detect staged
changes by checking the index against HEAD (for example alongside the existing
git diff check), while keeping the untracked-files check intact. Make the
condition in the main guard around the current git diff / untracked logic fail
whenever either staged or unstaged changes exist.
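
The staged-versus-unstaged distinction raised for scripts/check-repo-clean.sh can be illustrated with the two-column output of `git status --porcelain`, where the first column reflects the index (staged) state and the second the worktree (unstaged) state. The sketch below is only an illustration under that assumption; `summarize_porcelain` is a hypothetical helper and is not taken from the script in this PR.

```shell
# Hypothetical helper (not from this PR): summarises `git status --porcelain`
# (v1 format) read from stdin. Column 1 is the index (staged) state, column 2
# is the worktree (unstaged) state, and "??" marks an untracked file.
summarize_porcelain() {
  staged=0
  unstaged=0
  untracked=0
  while IFS= read -r line; do
    case "$line" in
      '') ;;                                  # skip blank lines
      '??'*) untracked=$((untracked + 1)) ;;  # untracked file
      *)
        # A non-space first column means the change is staged in the index.
        case "$line" in ' '*) ;; *) staged=$((staged + 1)) ;; esac
        # A non-space second column means the worktree differs from the index.
        case "$line" in ?' '*) ;; *) unstaged=$((unstaged + 1)) ;; esac
        ;;
    esac
  done
  echo "staged=$staged unstaged=$unstaged untracked=$untracked"
}

# Example wiring (assumed; shown as a comment so the sketch has no side effects):
#   git status --porcelain | summarize_porcelain
```

A plain `git diff --quiet` compares only the worktree against the index, so a file counted as staged-only here would pass it; `git diff --cached --quiet` compares the index against HEAD and would catch that case.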


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 0f9db42d-1738-48ad-983c-cb7d2c81e9dd

📥 Commits

Reviewing files that changed from the base of the PR and between 790a82d and a29b7c7.

⛔ Files ignored due to path filters (12)
  • rest-api/flow/internal/nicoapi/gen/fmds.pb.go is excluded by !**/*.pb.go, !**/gen/**, !rest-api/**/*.pb.go
  • rest-api/flow/internal/nicoapi/gen/fmds_grpc.pb.go is excluded by !**/*.pb.go, !**/gen/**, !rest-api/**/*.pb.go, !rest-api/**/*_grpc.pb.go
  • rest-api/flow/internal/nicoapi/gen/nico.pb.go is excluded by !**/*.pb.go, !**/gen/**, !rest-api/**/*.pb.go
  • rest-api/flow/internal/nicoapi/gen/nico_grpc.pb.go is excluded by !**/*.pb.go, !**/gen/**, !rest-api/**/*.pb.go, !rest-api/**/*_grpc.pb.go
  • rest-api/flow/internal/nicoapi/gen/site_explorer.pb.go is excluded by !**/*.pb.go, !**/gen/**, !rest-api/**/*.pb.go
  • rest-api/workflow-schema/schema/site-agent/workflows/v1/dpa_rpc_nico.pb.go is excluded by !**/*.pb.go, !rest-api/**/*.pb.go
  • rest-api/workflow-schema/schema/site-agent/workflows/v1/nico_nico.pb.go is excluded by !**/*.pb.go, !rest-api/**/*.pb.go
  • rest-api/workflow-schema/schema/site-agent/workflows/v1/nico_nico_grpc.pb.go is excluded by !**/*.pb.go, !rest-api/**/*.pb.go, !rest-api/**/*_grpc.pb.go
  • rest-api/workflow-schema/schema/site-agent/workflows/v1/site_explorer_nico.pb.go is excluded by !**/*.pb.go, !rest-api/**/*.pb.go
  • rest-api/workflow-schema/site-agent/workflows/v1/dpa_rpc_nico.proto is excluded by !rest-api/workflow-schema/site-agent/workflows/v1/*_nico.proto
  • rest-api/workflow-schema/site-agent/workflows/v1/nico_nico.proto is excluded by !rest-api/workflow-schema/site-agent/workflows/v1/*_nico.proto
  • rest-api/workflow-schema/site-agent/workflows/v1/site_explorer_nico.proto is excluded by !rest-api/workflow-schema/site-agent/workflows/v1/*_nico.proto
📒 Files selected for processing (12)
  • .github/workflows/ci.yaml
  • .github/workflows/rest-ci.yml
  • .github/workflows/rest-lint-and-test.yml
  • Makefile.toml
  • docs/observability/core_metrics.md
  • rest-api/Makefile
  • rest-api/flow/Makefile
  • rest-api/flow/internal/nicoapi/buf.gen.yaml
  • rest-api/flow/internal/nicoapi/nicoproto/fmds.proto
  • rest-api/flow/internal/nicoapi/nicoproto/nico.proto
  • rest-api/flow/internal/nicoapi/nicoproto/site_explorer.proto
  • scripts/check-repo-clean.sh

Comment thread docs/observability/core_metrics.md
Comment thread rest-api/flow/Makefile
Comment thread scripts/check-repo-clean.sh
github-actions Bot commented Jun 25, 2026


🔍 Container Scan Summary

| Service | Total | Critical | High | Medium | Low | Other |
|---|---|---|---|---|---|---|
| boot-artifacts-aarch64 | 3 | 0 | 0 | 3 | 0 | 0 |
| boot-artifacts-x86_64 | 3 | 0 | 0 | 3 | 0 | 0 |
| forge-admin-cli-x86_64 | 283 | 6 | 24 | 98 | 7 | 148 |
| machine-validation-runner | 744 | 32 | 188 | 267 | 36 | 221 |
| machine_validation | 744 | 32 | 188 | 267 | 36 | 221 |
| machine_validation-aarch64 | 744 | 32 | 188 | 267 | 36 | 221 |
| nvmetal-carbide | 744 | 32 | 188 | 267 | 36 | 221 |
| TOTAL | 3265 | 134 | 776 | 1172 | 151 | 1032 |

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

kensimon added 2 commits June 25, 2026 13:02
Also add some stuff to .gitignore that the github runner outputs

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/ci.yaml:
- Line 1069: The workflow step using actions/checkout@v4 should be updated to an
immutable commit SHA to satisfy the repository’s pinned-action policy. Replace
the mutable tag in the checkout step with the exact SHA for the intended
version, and keep the change within the existing checkout usage in the CI
workflow so the action reference stays fixed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 7ac3c2b7-75ce-4476-9de2-8482d815569d

📥 Commits

Reviewing files that changed from the base of the PR and between a29b7c7 and ffc5446.

📒 Files selected for processing (2)
  • .github/workflows/ci.yaml
  • .gitignore
✅ Files skipped from review due to trivial changes (1)
  • .gitignore

Comment thread .github/workflows/ci.yaml
@kensimon


> Use the sync-check task in pre-commit-verify-workspace.

I'm intentionally running the codegen as part of verify-workspace, instead of just checking it, because it's a step that Rust developers are likely to actually run as part of their workflow. The goal here is that when you change forge.proto, you actually run the build step that syncs the changes to rest-api. I didn't want to put it in e.g. crates/rpc/build.rs, because writing files outside of the crate is surprising behavior, and there's no other build step that people typically run locally that's a better candidate.

> Dual-write VPC config/status and the legacy fields until the workflow layer is migrated

This is not part of this PR; I don't know why it was mentioned. I'm syncing the protobuf over, so any changes are already part of the main branch by definition; this just ensures the golang "view" of this protobuf is properly reflected. Any dual-writing behavior (or lack thereof) is already merged into main.

@kensimon kensimon enabled auto-merge (squash) June 25, 2026 17:16

@thossain-nv thossain-nv left a comment


Thanks for the wiring @kensimon, this should be very helpful in ensuring the protobufs stay aligned.


// SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
// SPDX-License-Identifier: Apache-2.0
/*


This seems like a duplicate?


4 participants