Kubernetes operator for CronJob monitoring with SLA tracking, dead-man's switch detection, intelligent alerting (Slack, PagerDuty, webhook, email), and built-in web dashboard. Detects missed schedules, job failures, and performance regressions.

iLLeniumStudios/cronjob-guardian

CronJob Guardian

A Kubernetes operator for monitoring CronJobs with SLA tracking, intelligent alerting, and a built-in dashboard.

Documentation | Examples

Why CronJob Guardian?

CronJobs power critical operations—backups, ETL pipelines, reports—but Kubernetes provides no built-in monitoring for them. When jobs fail silently or stop running, you only find out when it's too late.

CronJob Guardian watches your CronJobs and alerts you when something goes wrong.

Screenshots

[Screenshots: Dashboard, CronJob Details, SLA Compliance]

Features

  • Dead-Man's Switch — Alert when CronJobs don't run within expected windows
  • SLA Tracking — Monitor success rates, duration percentiles (P50/P95/P99), detect regressions
  • Intelligent Alerts — Rich context with pod logs, events, and suggested fixes
  • Multiple Channels — Slack, PagerDuty, webhooks, email
  • Built-in Dashboard — Feature-rich web UI with charts, heatmaps, and exports
  • Prometheus Metrics — Export metrics for existing monitoring infrastructure
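
To make the SLA Tracking item concrete, here is a minimal Python sketch of how a success rate and the P50/P95/P99 durations could be computed from a job's run history. The `Run` record, the `sla_summary` helper, and the nearest-rank percentile method are illustrative assumptions, not CronJob Guardian's actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Run:
    """One finished CronJob execution (hypothetical record for illustration)."""
    succeeded: bool
    duration_seconds: float


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list; pct is in (0, 100]."""
    if not sorted_values:
        raise ValueError("no values")
    rank = math.ceil(pct / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def sla_summary(runs):
    """Return the success rate (percent) and P50/P95/P99 durations."""
    if not runs:
        raise ValueError("no runs")
    durations = sorted(r.duration_seconds for r in runs)
    successes = sum(1 for r in runs if r.succeeded)
    return {
        "success_rate": 100.0 * successes / len(runs),
        "p50": percentile(durations, 50),
        "p95": percentile(durations, 95),
        "p99": percentile(durations, 99),
    }


# Four runs: three successes, one slow failure.
history = [Run(True, 10), Run(True, 12), Run(False, 30), Run(True, 11)]
summary = sla_summary(history)
```

With only a handful of runs the upper percentiles collapse onto the slowest run, so percentile-based thresholds are most meaningful once a job has some history.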

Quick Start

Install

helm install cronjob-guardian oci://ghcr.io/illeniumstudios/charts/cronjob-guardian \
  --namespace cronjob-guardian \
  --create-namespace

Set Up Alerts (Optional)

Create a Slack webhook secret and AlertChannel:

kubectl create secret generic slack-webhook \
  --namespace cronjob-guardian \
  --from-literal=url=https://hooks.slack.com/services/YOUR/WEBHOOK/URL

# slack-channel.yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: AlertChannel
metadata:
  name: slack-alerts
  namespace: cronjob-guardian
spec:
  type: slack
  slack:
    webhookSecretRef:
      name: slack-webhook
      namespace: cronjob-guardian
      key: url

kubectl apply -f slack-channel.yaml
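
Before relying on the channel, it can help to confirm that the webhook URL itself works. The sketch below uses only the Python standard library and assumes a standard Slack incoming webhook, which accepts a JSON body with a `text` field. The helper names `build_slack_request` and `send_test_message` are hypothetical and not part of CronJob Guardian.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text):
    """Build a POST request carrying a minimal Slack incoming-webhook payload."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_test_message(webhook_url, text="CronJob Guardian webhook test"):
    """Send the message and return the HTTP status code."""
    request = build_slack_request(webhook_url, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Example call (replace the placeholder with your real webhook URL):
# send_test_message("https://hooks.slack.com/services/YOUR/WEBHOOK/URL")
```

If the test message does not arrive, the problem is with the webhook URL rather than with the operator or the AlertChannel.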

Monitor All CronJobs in a Namespace

# monitor-namespace.yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: all-cronjobs
  namespace: default  # namespace to monitor
spec:
  selector: {}  # empty selector = all CronJobs in this namespace
  deadManSwitch:
    enabled: true
    autoFromSchedule:
      enabled: true
  alerting:
    channelRefs:
      - name: slack-alerts

kubectl apply -f monitor-namespace.yaml
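
The idea behind a dead-man's switch can be sketched in a few lines of Python. This README does not describe how `autoFromSchedule` derives its window, so the example assumes the expected interval between runs is already known and adds a hypothetical grace `buffer`. The function name `is_overdue` is illustrative only.

```python
from datetime import datetime, timedelta, timezone


def is_overdue(last_success, expected_interval, now, buffer=timedelta(minutes=5)):
    """True when no successful run was seen within expected_interval + buffer.

    A job that has never succeeded (last_success is None) is treated as
    overdue in this sketch; a real operator may handle that case differently.
    """
    if last_success is None:
        return True
    return now - last_success > expected_interval + buffer


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
hourly = timedelta(hours=1)

# Last success 50 minutes ago: still inside the one-hour window.
on_time = is_overdue(now - timedelta(minutes=50), hourly, now)

# Last success two hours ago: the hourly job missed at least one run.
missed = is_overdue(now - timedelta(hours=2), hourly, now)
```

The buffer absorbs normal scheduling jitter and job run time, so a job that finishes a few minutes late does not trigger an alert.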

Monitor All CronJobs Cluster-Wide

# monitor-cluster.yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: all-cluster-cronjobs
  namespace: cronjob-guardian
spec:
  selector:
    allNamespaces: true
  deadManSwitch:
    enabled: true
    autoFromSchedule:
      enabled: true
  alerting:
    channelRefs:
      - name: slack-alerts

kubectl apply -f monitor-cluster.yaml

Access the Dashboard

kubectl port-forward -n cronjob-guardian svc/cronjob-guardian 8080:8080

Open http://localhost:8080

Configuration Options

The CronJobMonitor CRD supports many options for fine-tuning your monitoring:

Feature               Description
SLA Thresholds        Set minimum success rates and maximum durations
Duration Regression   Alert when jobs slow down over time
Maintenance Windows   Suppress alerts during planned downtime
Severity Routing      Route critical vs warning alerts to different channels
Custom Fix Patterns   Define application-specific troubleshooting suggestions
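
As an illustration of the Duration Regression option, the following Python sketch compares the median duration of recent runs against a baseline and flags a slowdown beyond a threshold. The function names, the use of the median, and the 50% default threshold are assumptions made for this example. The operator's actual statistic and configuration fields are covered in the full documentation.

```python
from statistics import median


def regression_percent(baseline_durations, recent_durations):
    """Percent change of the recent median duration relative to the baseline median."""
    baseline = median(baseline_durations)
    if baseline <= 0:
        raise ValueError("baseline median must be positive")
    return 100.0 * (median(recent_durations) - baseline) / baseline


def has_regressed(baseline_durations, recent_durations, threshold_percent=50.0):
    """True when the recent median is slower than the baseline by the threshold or more."""
    return regression_percent(baseline_durations, recent_durations) >= threshold_percent


baseline = [100, 110, 90, 105, 95]  # seconds; median is 100
recent = [150, 170, 160]            # seconds; median is 160
change = regression_percent(baseline, recent)
regressed = has_regressed(baseline, recent)
```

Comparing medians rather than single runs keeps one unusually slow execution from being reported as a regression.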

See the full documentation for the complete configuration reference.

More Examples

The examples/ directory contains ready-to-use configurations:

  • monitors/ — CronJobMonitor patterns for various use cases
  • alertchannels/ — Slack, PagerDuty, webhook, email configs
  • cronjobs/ — Sample CronJobs with best practices

Development

# Build
make build

# Run locally
make install  # Install CRDs
make run      # Run operator

# Test
make test
make test-e2e

Uninstall

helm uninstall cronjob-guardian --namespace cronjob-guardian
kubectl delete namespace cronjob-guardian

Contributing

Contributions are welcome! Please feel free to submit issues and pull requests.

License

Apache License 2.0. See LICENSE for details.
