
bug: Stop Nico applying pre-ingestion remediation steps on nodes that are assigned to a tenant #2842

Description

@desrod-nvidia

Version

v0.10.3-0-g4d11815e6

Describe the bug.

Nico automatically applied corrective actions to a subset of nodes at one site after detecting a 5‑hour time skew between each host and its BMC. It tried to set the BMC timezone to UTC to match the host.

The actions Nico performed were:

  • Power off the host
  • Correct the time zone on the host BMC
  • Restart the host BMC

After these steps, the affected nodes remained powered off, with their host BMC timezone set to UTC.

These nodes were assigned to a tenant at the time, and manual intervention was required to power them back on.

Further analysis showed that only a subset of nodes was impacted, specifically those with BMC lockdown disabled. On other nodes where Nico also detected a time skew, it could not make any changes because BMC lockdown was enabled. Nico keeps retrying the timezone change on those locked‑down BMCs, and each failed attempt is logged, which floods the logs.
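To illustrate one way the repeated attempts against locked‑down BMCs could be damped, here is a minimal sketch of a per‑BMC exponential backoff. It is not taken from Nico's code base: `RetryThrottle`, its method names, and the 60‑second and 1‑hour defaults are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RetryThrottle:
    """Hypothetical helper: per-BMC exponential backoff for failed remediation attempts."""

    base_seconds: float = 60.0    # assumed delay after the first failure
    max_seconds: float = 3600.0   # assumed upper bound on the delay
    _failures: dict = field(default_factory=dict)      # bmc_id -> consecutive failures
    _next_allowed: dict = field(default_factory=dict)  # bmc_id -> earliest next attempt

    def should_attempt(self, bmc_id: str, now: float) -> bool:
        """Return True if enough time has passed to try this BMC again."""
        return now >= self._next_allowed.get(bmc_id, 0.0)

    def record_failure(self, bmc_id: str, now: float) -> float:
        """Register a failed attempt and return the delay before the next one."""
        count = self._failures.get(bmc_id, 0) + 1
        self._failures[bmc_id] = count
        # Double the delay on every consecutive failure, capped at max_seconds.
        delay = min(self.base_seconds * 2 ** (count - 1), self.max_seconds)
        self._next_allowed[bmc_id] = now + delay
        return delay

    def record_success(self, bmc_id: str) -> None:
        """Clear the backoff state once an attempt succeeds."""
        self._failures.pop(bmc_id, None)
        self._next_allowed.pop(bmc_id, None)
```

With a throttle like this, a BMC that stays in lockdown would produce one failed attempt (and one log entry) per backoff window rather than one per reconciliation pass.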

This behaviour only started yesterday, so it appears to be associated with the most recent release.

Given that the nodes with mismatched BMC timezones were tenant‑assigned, it would be preferable for Nico not to enforce timezone changes when a node is assigned to a tenant. If Nico does apply changes, it should also restore the node’s power state to whatever it was before the changes. Finally, it would be desirable to stop the logs from being flooded with failed timezone change requests on BMCs in lockdown mode.
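The requested behaviour can be sketched roughly as follows. This is only an illustration of the proposal, not Nico's actual implementation: the `Node` fields, the step names, and `plan_timezone_remediation` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PowerState(Enum):
    ON = "on"
    OFF = "off"


@dataclass
class Node:
    # Hypothetical view of a node; field names are assumptions.
    name: str
    tenant_assigned: bool
    bmc_lockdown: bool
    power: PowerState


@dataclass
class Plan:
    steps: list
    skipped_reason: Optional[str] = None


def plan_timezone_remediation(node: Node) -> Plan:
    """Decide which steps, if any, to run for a BMC timezone mismatch."""
    if node.tenant_assigned:
        # Never disrupt a node that a tenant is using.
        return Plan(steps=[], skipped_reason="node is assigned to a tenant")
    if node.bmc_lockdown:
        # The change cannot succeed, so do not attempt (or log) it.
        return Plan(steps=[], skipped_reason="BMC lockdown is enabled")

    steps = ["power_off_host", "set_bmc_timezone_utc", "restart_bmc"]
    if node.power is PowerState.ON:
        # Restore the power state the node had before remediation.
        steps.append("power_on_host")
    return Plan(steps=steps)
```

For example, `plan_timezone_remediation(Node("n1", True, False, PowerState.ON))` returns an empty plan with a skip reason, while an unassigned, unlocked node that was powered on gets the three remediation steps followed by a power‑on.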

Code of Conduct

  • I agree to follow NVIDIA Infra Controller's Code of Conduct
  • I have searched the open bugs and have found no duplicates for this bug report

Metadata

Labels

bug: A defect in existing software (deprecated - use issue type, but it's needed for reporting now)
interest/dsx

Type

No fields configured for Bug.

Projects

Status
Backlog

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
