DatadogMonitor Finalizers get removed regardless of failure #1288

Open
paulbrassard-figure opened this issue Jul 11, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@paulbrassard-figure

Describe what happened:
Deleting a DatadogMonitor that is referenced by a DatadogSLO removes the monitor resource from the Kubernetes cluster but not from Datadog. The monitor then continues to exist in Datadog, outside of source control.

Describe what you expected:
Finalizers are not removed until the datadog-operator has successfully finalized the DatadogMonitor, so the Kubernetes DatadogMonitor object remains until the monitor has been deleted from Datadog.

Steps to reproduce the issue:

  • Create a DatadogMonitor
  • Create a DatadogSLO that references the DatadogMonitor
  • Delete the DatadogMonitor

Additional information:

  • Function finalizeDatadogMonitor only logs an error in case of a failure
    func (r *Reconciler) finalizeDatadogMonitor(logger logr.Logger, dm *datadoghqv1alpha1.DatadogMonitor) {
        if dm.Status.Primary {
            err := deleteMonitor(r.datadogAuth, r.datadogClient, dm.Status.ID)
            if err != nil {
                logger.Error(err, "failed to finalize monitor", "Monitor ID", fmt.Sprint(dm.Status.ID))
                return
            }
            logger.Info("Successfully finalized DatadogMonitor", "Monitor ID", fmt.Sprint(dm.Status.ID))
            event := buildEventInfo(dm.Name, dm.Namespace, datadog.DeletionEvent)
            r.recordEvent(dm, event)
        }
    }
  • Function handleFinalizer will remove the finalizers regardless of failure
    r.finalizeDatadogMonitor(logger, dm)
    dm.SetFinalizers(utils.RemoveString(dm.GetFinalizers(), datadogMonitorFinalizer))
    err := r.client.Update(context.TODO(), dm)
  • Example log with SLO related failure:
    datadog-operator-576d8ccd98-ftftr {"level":"ERROR","ts":"2024-07-11T19:54:35Z","logger":"controllers.DatadogMonitor","msg":"failed to finalize monitor","datadogmonitor":"default/sandbox-slo-generation-verification-datadog-monitor","Monitor ID":"149070491","error":"error deleting monitor: 400 Bad Request: {\"errors\":[\"defaultdict(<class 'list'>, {149070491: ['monitor [149070491,Sandbox SLO generation monitor] is referenced in slos: [7716c0b1451055bd87688d5733f4e6d2,sandbox-slo-generation-verification-datadog-monitor slo]']})\"]}"}
    

Environment details (Operating System, Cloud provider, etc):
We are on GKE and deployed the latest Datadog Operator with Helm chart 1.8.3, as detailed in the documentation.

levan-m added the bug label on Jul 24, 2024