A Kubernetes operator for monitoring CronJobs with SLA tracking, intelligent alerting, and a built-in dashboard.
CronJobs power critical operations—backups, ETL pipelines, reports—but Kubernetes provides no built-in monitoring for them. When jobs fail silently or stop running, you only find out when it's too late.
CronJob Guardian watches your CronJobs and alerts you when something goes wrong.

Screenshots: Dashboard, CronJob Details, SLA Compliance.

- Dead-Man's Switch — Alert when CronJobs don't run within expected windows
- SLA Tracking — Monitor success rates, duration percentiles (P50/P95/P99), detect regressions
- Intelligent Alerts — Rich context with pod logs, events, and suggested fixes
- Multiple Channels — Slack, PagerDuty, webhooks, email
- Built-in Dashboard — Feature-rich web UI with charts, heatmaps, and exports
- Prometheus Metrics — Export metrics for existing monitoring infrastructure
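
To make the dead-man's switch idea concrete, here is a minimal Python sketch of the underlying check. It is an illustration only, not the operator's implementation: the function name `missed_run`, the `grace` parameter, and the fixed `expected_interval` (standing in for a window derived from the cron schedule) are all assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone


def missed_run(last_success, expected_interval, grace, now):
    """Return True when no successful run was seen within the expected window.

    Hypothetical helper: the window is modelled as the last success time plus
    the expected interval plus a grace period.
    """
    deadline = last_success + expected_interval + grace
    return now > deadline


# A nightly job last succeeded at 02:00 on Jan 1. With a one-hour grace
# period, the deadline for the next success is 03:00 on Jan 2.
last = datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)
on_time = datetime(2024, 1, 2, 2, 30, tzinfo=timezone.utc)
too_late = datetime(2024, 1, 2, 3, 30, tzinfo=timezone.utc)

print(missed_run(last, timedelta(hours=24), timedelta(hours=1), on_time))   # False
print(missed_run(last, timedelta(hours=24), timedelta(hours=1), too_late))  # True
```

The point of the pattern is that the alert fires on the absence of a success, so it also catches jobs that were suspended, deleted, or never scheduled, which failure-based alerts would miss.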

Install the operator with Helm:

```bash
helm install cronjob-guardian oci://ghcr.io/illeniumstudios/charts/cronjob-guardian \
  --namespace cronjob-guardian \
  --create-namespace
```

Create a Slack webhook secret and AlertChannel:

```bash
kubectl create secret generic slack-webhook \
  --namespace cronjob-guardian \
  --from-literal=url=https://hooks.slack.com/services/YOUR/WEBHOOK/URL
```

```yaml
# slack-channel.yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: AlertChannel
metadata:
  name: slack-alerts
  namespace: cronjob-guardian
spec:
  type: slack
  slack:
    webhookSecretRef:
      name: slack-webhook
      namespace: cronjob-guardian
      key: url
```

```bash
kubectl apply -f slack-channel.yaml
```

Monitor every CronJob in a single namespace:

```yaml
# monitor-namespace.yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: all-cronjobs
  namespace: default  # namespace to monitor
spec:
  selector: {}  # empty selector = all CronJobs in this namespace
  deadManSwitch:
    enabled: true
    autoFromSchedule:
      enabled: true
  alerting:
    channelRefs:
      - name: slack-alerts
```

```bash
kubectl apply -f monitor-namespace.yaml
```

Or monitor CronJobs across all namespaces:

```yaml
# monitor-cluster.yaml
apiVersion: guardian.illenium.net/v1alpha1
kind: CronJobMonitor
metadata:
  name: all-cluster-cronjobs
  namespace: cronjob-guardian
spec:
  selector:
    allNamespaces: true
  deadManSwitch:
    enabled: true
    autoFromSchedule:
      enabled: true
  alerting:
    channelRefs:
      - name: slack-alerts
```

```bash
kubectl apply -f monitor-cluster.yaml
```

Port-forward the service to reach the dashboard locally:

```bash
kubectl port-forward -n cronjob-guardian svc/cronjob-guardian 8080:8080
```

The CronJobMonitor CRD supports many options for fine-tuning your monitoring:

| Feature | Description |
|---|---|
| SLA Thresholds | Set minimum success rates and maximum durations |
| Duration Regression | Alert when jobs slow down over time |
| Maintenance Windows | Suppress alerts during planned downtime |
| Severity Routing | Route critical vs warning alerts to different channels |
| Custom Fix Patterns | Define application-specific troubleshooting suggestions |
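
As a rough illustration of what an SLA threshold check involves, the Python sketch below computes a success rate and nearest-rank duration percentiles from a run history and compares them with limits. Everything here is an assumption for the example rather than the operator's behavior: the names `percentile` and `sla_report`, the choice to measure durations over successful runs only, and the choice to compare the maximum duration against P95.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, with p between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def sla_report(runs, min_success_rate, max_p95_seconds):
    """Summarise runs given as (succeeded, duration_seconds) tuples.

    Hypothetical helper: durations are taken from successful runs only,
    and the duration limit is checked against P95.
    """
    if not runs:
        raise ValueError("need at least one run")
    durations = [d for ok, d in runs if ok]
    success_rate = 100 * len(durations) / len(runs)
    report = {"success_rate": success_rate, "violations": []}
    for p in (50, 95, 99):
        report[f"p{p}"] = percentile(durations, p) if durations else None
    if success_rate < min_success_rate:
        report["violations"].append("success_rate")
    if report["p95"] is not None and report["p95"] > max_p95_seconds:
        report["violations"].append("duration")
    return report


# Ten runs: one failure and one unusually slow success (120 s).
runs = [(True, 30), (True, 32), (True, 31), (True, 29), (False, 5),
        (True, 35), (True, 33), (True, 120), (True, 30), (True, 31)]
report = sla_report(runs, min_success_rate=95, max_p95_seconds=60)
print(report)
```

With this history the success rate is 90%, below the 95% minimum, and the single slow run pushes P95 to 120 seconds, above the 60-second limit, so both checks are reported as violations.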
See the full documentation for the complete configuration reference.
The examples/ directory contains ready-to-use configurations:
- monitors/ — CronJobMonitor patterns for various use cases
- alertchannels/ — Slack, PagerDuty, webhook, email configs
- cronjobs/ — Sample CronJobs with best practices

Build, run, and test locally:

```bash
# Build
make build

# Run locally
make install  # Install CRDs
make run      # Run operator

# Test
make test
make test-e2e
```

To uninstall:

```bash
helm uninstall cronjob-guardian --namespace cronjob-guardian
kubectl delete namespace cronjob-guardian
```

Contributions are welcome! Please feel free to submit issues and pull requests.

Apache License 2.0. See LICENSE for details.


