Monitoring #100

Closed
39 of 50 tasks
Gerrit91 opened this issue Dec 15, 2022 · 1 comment

Gerrit91 commented Dec 15, 2022

Monitoring

Control Plane

Basic Stack Deployment

  • Create role in metal-roles (control-plane/roles/monitoring) and deploy to monitoring namespace
    • Prometheus, Grafana and Alertmanager
      • Alertmanager
        • Allow configuration of rules and backends
        • Bring some useful default rules
      • Grafana: Configurable Dashboards
        • metal-api Dashboard
        • Rethinkdb Dashboard
      • Deploy Service Monitors
        • metal-api
        • masterdata-api
        • masterdata-db (backup-restore-sidecar) / maybe add postgres-exporter as sidecar?
        • metal-db (+ backup-restore-sidecar)
        • ipam-db (backup-restore-sidecar) / maybe add postgres-exporter as sidecar?
        • gardener-metrics-exporter
    • Thanos
      • Configurable Alerts for the Ruler
  • Create role in metal-roles (control-plane/roles/logging) and deploy to monitoring namespace
    • Loki
    • Promtail
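For the "Deploy Service Monitors" items above, a minimal sketch of what a ServiceMonitor for metal-api could look like, assuming the stack is deployed with the Prometheus Operator. The namespace of the metal-api Service, the `app: metal-api` label, and the port name `metrics` are assumptions for illustration and are not taken from metal-roles.

```yaml
# Hypothetical ServiceMonitor for metal-api.
# Namespaces, labels, and the port name are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: metal-api
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
      - metal-control-plane   # assumed namespace of the metal-api Service
  selector:
    matchLabels:
      app: metal-api          # assumed label on the Service
  endpoints:
    - port: metrics           # assumed name of the Service port exposing metrics
      path: /metrics
      interval: 30s
```

The same shape would apply to the other targets in the list (masterdata-api, the backup-restore-sidecars, gardener-metrics-exporter), with only the selector and port changing.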

Control Plane Metrics Exporters

Discussions

  • Thanos vs. Mimir
  • Should we install a service mesh? (Linkerd)

Partition

  • Create role in metal-roles (partition/roles/monitoring) and deploy to management server as systemd service (wrapping a Docker container)
    • Prometheus
      • Scrape exporters
      • Scrape metal-image-cache-sync
    • Exporters
      • node_exporter
      • ipmi_exporter
      • blackbox_exporter
  • For the leaf switches
    • Deploy exporters in the leaf role
      • frr_exporter
      • cumulus_exporter
      • node_exporter
      • metal-core
    • Deploy promtail
  • For routers
    • Deploy frr_exporter and node_exporter
    • Deploy promtail
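The partition Prometheus on the management server could scrape the exporters listed above with a static configuration along these lines. This is only a sketch: the ports are the exporters' usual defaults, while the probe target, the `icmp` blackbox module (which has to exist in the blackbox_exporter configuration), and the metal-image-cache-sync address are placeholders.

```yaml
# Sketch of a prometheus.yml for a partition management server.
# Targets and the image-cache address are placeholders, not real values.
global:
  scrape_interval: 30s

scrape_configs:
  - job_name: node
    static_configs:
      - targets: ["localhost:9100"]      # node_exporter default port

  - job_name: ipmi
    static_configs:
      - targets: ["localhost:9290"]      # ipmi_exporter default port

  - job_name: blackbox
    metrics_path: /probe
    params:
      module: [icmp]                     # assumed module name
    static_configs:
      - targets: ["192.0.2.10"]          # placeholder host to probe
    relabel_configs:
      # Pass the listed target as ?target=... and send the scrape
      # itself to the blackbox_exporter.
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115      # blackbox_exporter default port

  - job_name: metal-image-cache-sync
    static_configs:
      - targets: ["image-cache.example:9000"]  # hypothetical address and port
```

Leaf switches and routers would add further targets for frr_exporter, node_exporter, and metal-core in the same way.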

Gardener Seeds

  • Prometheus deployment
    • Service Monitors
      • lightbox_exporter
      • firewall node_exporter and nftables_exporter

Overview

[Image: monitoring]

@Gerrit91

I guess the first version is in place, so I am closing this issue. However, we want to move everything to a dedicated metal-monitoring repository and may also use different deployment strategies than before.
