Migrate CHGM investigation to use executor actions
This commit migrates the Cluster Has Gone Missing (CHGM) investigation to
use the new executor action pattern, while still returning errors for
infrastructure failures so that they can be retried.
**Key changes:**
1. **Action-based remediation:**
   - Replaced direct OCM/PagerDuty calls with declarative actions
   - Service logs, limited support, notes, and incident management now use
     executor action builders
   - Investigation returns actions to be executed by the executor
2. **Smart error handling:**
   - Added isInfrastructureError() helper to distinguish between:
     * Infrastructure failures (AWS/OCM API errors) → return error for retry
     * Investigation findings (data too old, inconclusive) → return actions
   - Resource building errors continue to return errors (no change)
3. **Removed dependencies:**
   - Deleted postStoppedInfraLimitedSupport() helper function
   - Removed utils.WithRetries() usage
   - Removed unused pagerduty package import
4. **Investigation outcomes now use actions:**
   - Stopped instances → Limited support + silence
   - Egress blocked (deadman's snitch) → Limited support + silence
   - Egress blocked (other URLs) → Service log + escalate
   - No issues found → Note + escalate
   - Investigation incomplete → Note + escalate
**Benefits:**
- Separation of investigation logic from external system updates
- Automatic retry for transient failures via executor
- Type-safe action builders (no interface{} types)
- Declarative results easier to test and understand
- Consistent error handling across investigations
**Breaking change:**
Tests need to be updated to verify actions instead of mocking direct
OCM/PagerDuty calls. This will be addressed in a follow-up commit.
Related: Phase 1 of executor module implementation
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
```go
notes.AppendWarning("NetworkVerifier found unreachable targets and sent the SL, but deadmanssnitch is not blocked! \n⚠️ Please investigate this cluster.\nUnreachable: \n%s", failureReason)
```