Merged
2 changes: 1 addition & 1 deletion .github/workflows/lint-yaml.yml
@@ -45,7 +45,7 @@ jobs:
git config --global user.email 'github-actions[bot]@users.noreply.github.com'
git add .
git commit -m "style: auto-fix YAML linting issues ✨"
- git push origin ${{ github.head_ref }}
+ git push origin "HEAD:${{ github.head_ref }}"
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
- name: Final Check
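
The quoted `HEAD:<branch>` refspec in the change above pushes whichever commit is currently checked out to the named remote branch, so it also works from a detached HEAD, a state CI checkouts are often in. The sketch below reproduces this in a throwaway local repository. The temporary paths, the branch name `feature`, and the commit identity are made up for illustration and are not part of the workflow.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway "remote" plus a clone of it (hypothetical paths under a temp dir).
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"
git clone --quiet "$work/remote.git" "$work/clone" 2>/dev/null
cd "$work/clone"

commit() {
  # Identity passed inline so the sketch does not depend on global git config.
  git -c user.name=demo -c user.email=demo@example.com \
    commit --quiet --allow-empty -m "$1"
}

# Create the remote branch first, as a pull request head branch would already exist.
commit "initial"
git push --quiet origin HEAD:refs/heads/feature

# Detach HEAD, add a commit, and push it with the HEAD:<branch> refspec.
git checkout --quiet --detach
commit "style: auto-fix"
git push --quiet origin "HEAD:feature"

local_sha="$(git rev-parse HEAD)"
remote_sha="$(git --git-dir="$work/remote.git" rev-parse refs/heads/feature)"
echo "local=$local_sha remote=$remote_sha"
```

After the second push, both hashes should match even though no local branch named `feature` exists.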
6 changes: 3 additions & 3 deletions README.md
@@ -228,15 +228,15 @@ This tricks your local machine into thinking `localhost` is the remote server, w
1. **Update `/etc/hosts`**:
```bash
# Add this line
- 127.0.0.1 argocd.mip-tds.chuv.cscs.ch
+ 127.0.0.1 argocd.example.com
```
2. **Open Tunnel (Sudo required for port 443)**:
```bash
- sudo ssh -L 443:argocd.mip-tds.chuv.cscs.ch:443 <user>@<jump-host>
+ sudo ssh -L 443:argocd.example.com:443 <user>@<jump-host>
```
3. **Login**:
```bash
- argocd login argocd.mip-tds.chuv.cscs.ch:443 --insecure --grpc-web
+ argocd login argocd.example.com:443 --insecure --grpc-web
```
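
The tunnel steps above only work if the hostname resolves to `127.0.0.1` locally. As a minimal sketch, the helper below checks whether a hosts file already carries that mapping. The function name `has_loopback_entry` is hypothetical and not part of this repository.

```bash
# has_loopback_entry HOSTS_FILE HOSTNAME
# Succeeds (exit 0) when HOSTS_FILE maps HOSTNAME to 127.0.0.1, ignoring comments.
has_loopback_entry() {
  local hosts_file="$1" host="$2"
  awk -v host="$host" '
    { sub(/#.*/, "") }            # strip trailing comments before matching
    $1 == "127.0.0.1" {
      for (i = 2; i <= NF; i++) if ($i == host) found = 1
    }
    END { exit found ? 0 : 1 }
  ' "$hosts_file"
}
```

For example, `has_loopback_entry /etc/hosts argocd.example.com || echo "entry missing"` reports a missing mapping before you open the tunnel.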

### Initial secrets:
@@ -101,3 +101,24 @@ rules:
  # - apiGroups: ['']
  #   resources: [endpoints]
  #   verbs: [get, list, watch, create, update, delete, patch]
  # Rule 5: Elastic Stack resources
  - apiGroups:
      - elasticsearch.k8s.elastic.co
      - kibana.k8s.elastic.co
      - beat.k8s.elastic.co
      - apm.k8s.elastic.co
      - enterprisesearch.k8s.elastic.co
      - maps.k8s.elastic.co
      - agent.k8s.elastic.co
      - autoscaling.k8s.elastic.co
    resources:
      - elasticsearches
      - kibanas
      - beats
      - apmservers
      - enterprisesearches
      - agents
      - agentpolicies
      - elasticmapsservers
      - elasticsearchautoscalings
    verbs: [create, delete, patch, update, get, list, watch]
4 changes: 3 additions & 1 deletion base/argo-projects.yaml
@@ -10,7 +10,7 @@ metadata:
argocd.instance: mip-team
annotations:
argocd.argoproj.io/note: 'Manages static AppProjects: mip-federations, mip-shared-apps,
- mip-common, mip-security'
+ mip-common, mip-monitoring, mip-security, submariner'
spec:
generators:
- list:
@@ -21,6 +21,8 @@ spec:
fileName: mip-shared-apps
- projectName: mip-argo-project-common
fileName: mip-common
+ - projectName: mip-argo-project-monitoring
+   fileName: mip-monitoring
- projectName: mip-argo-project-security
fileName: mip-security
- projectName: mip-argo-project-submariner
58 changes: 58 additions & 0 deletions base/mip-infrastructure/rbac/eck-beats-rbac.yaml
@@ -0,0 +1,58 @@
---
# Manual RBAC for ECK Beats (filebeat/metricbeat).
# ECK chart templates intentionally do not include Beat RBAC resources.
# ServiceAccounts are created by the ECK Helm chart in namespace elastic-system.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: eck-filebeat
rules:
  - apiGroups: ['']
    resources: [pods, namespaces, nodes, endpoints, services]
    verbs: [get, list, watch]
  - apiGroups: [coordination.k8s.io]
    resources: [leases]
    verbs: [get, list, watch, create, update, delete]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: eck-metricbeat
rules:
  - apiGroups: ['']
    resources: [nodes, pods, namespaces, services, endpoints]
    verbs: [get, list, watch]
  - apiGroups: ['']
    resources: [nodes/stats]
    verbs: [get]
  - nonResourceURLs: [/metrics]
    verbs: [get]
  - apiGroups: [coordination.k8s.io]
    resources: [leases]
    verbs: [get, list, watch, create, update, delete]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: eck-filebeat
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: eck-filebeat
subjects:
  - kind: ServiceAccount
    name: eck-filebeat
    namespace: elastic-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: eck-metricbeat
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: eck-metricbeat
subjects:
  - kind: ServiceAccount
    name: eck-metricbeat
    namespace: elastic-system
11 changes: 11 additions & 0 deletions common/elastic-operator/Chart.yaml
@@ -0,0 +1,11 @@
---
apiVersion: v2
name: elastic-operator
description: Elastic Cloud on Kubernetes (ECK) Operator
type: application
version: 1.0.0
appVersion: 2.13.0
dependencies:
  - name: eck-operator
    version: 2.13.0
    repository: https://helm.elastic.co
5 changes: 5 additions & 0 deletions common/elastic-operator/values.yaml
@@ -0,0 +1,5 @@
---
eck-operator:
  createNamespace: false
  webhook:
    enabled: true
12 changes: 12 additions & 0 deletions common/monitoring/eck/Chart.yaml
@@ -0,0 +1,12 @@
---
apiVersion: v2
name: eck-stack-rke2
description: Helm chart to install the Elastic Cloud on Kubernetes (ECK) operator
  and sample Elastic Stack resources on an RKE2 cluster
kubeVersion: '>=1.23.0-0'
type: application
version: 0.2.0
appVersion: 2.13.0
icon: https://www.elastic.co/static/images/elastic-logo-200.png
keywords: [elasticsearch, eck, elastic-stack, operator]
dependencies: []
183 changes: 183 additions & 0 deletions common/monitoring/eck/README.md
@@ -0,0 +1,183 @@
# ECK Helm Chart for RKE2

This directory contains the ECK Helm chart. It targets managed RKE2 clusters where the Elastic Cloud on Kubernetes (ECK) operator already runs (Rancher installs it under `kube-system`). By default the chart provisions:

- A single-node Elasticsearch cluster plus Kibana.

Optional components (disabled by default):

- Filebeat and Metricbeat DaemonSets that forward cluster logs and metrics.
- The eck-notifier CronJob that pushes Kibana alert summaries to Microsoft Teams and Cisco Webex.

## Prerequisites

- Helm 3 and `kubectl` available locally.
- RKE2 cluster v1.23+ with access to the `elastic-system` namespace.
- ECK operator 2.13+ running cluster-wide.
> **Note regarding the ECK Operator**: This chart does **not** install the operator, because doing so requires cluster-admin privileges that should not be granted to this standard monitoring deployment. If your hosting provider (such as Rancher) already provides it, you are good to go. Otherwise, you must install the `common/elastic-operator` chart and its privileged namespace manually, or include it in your infrastructure overlays, before deploying this monitoring stack.
- Default StorageClass compatible with the sample workloads (defaults assume `ceph-corbo-cephfs`).
- Namespace prepared for Beats hostPath mounts (needed only if Beats are enabled and Pod Security Admission is enforced):

```bash
kubectl create namespace elastic-system
kubectl label namespace elastic-system \
pod-security.kubernetes.io/enforce=privileged \
pod-security.kubernetes.io/audit=privileged \
pod-security.kubernetes.io/warn=privileged --overwrite
```

- Secret `eck-eck-notifier-secrets` populated with Elasticsearch credentials plus Teams/Webex settings (required only if `alertNotifier.enabled=true`, see [Alert notifier configuration](#alert-notifier-configuration)).
- When Beats are enabled, apply the manual RBAC manifest (the chart does not template Beat RBAC resources):

```bash
kubectl apply -f base/mip-infrastructure/rbac/eck-beats-rbac.yaml
```

## Install / upgrade

```bash
helm upgrade --install eck . \
--namespace elastic-system \
--create-namespace \
--skip-crds \
--wait \
--timeout 15m
```

> Helm 4 uses server-side apply by default. Because the ECK operator also mutates the CRs, add `--server-side=false` (or configure the same in Argo CD) for conflict-free upgrades.

Supply overrides through `--set`/`-f my-values.yaml` as usual.

## Customising values

All knobs live in `values.yaml`. Common overrides:

- `elasticsearch.*` – adjust resources, replica count, or the StorageClass. Note: The default `storageClassName` is currently hardcoded to `ceph-corbo-cephfs` as it aligns with our current infrastructure, but you can override this for deployments in other environments.
- `kibana.ingress.*` – enable ingress, set hosts/TLS, or keep using port-forward.
- `observability.filebeat.*` / `observability.metricbeat.*` – enable and tune the DaemonSets. Filebeat defaults to 100m CPU with a 400Mi memory request and a 600Mi memory limit. Both use Generic Ephemeral Volumes for their `data` mounts by default (2Gi on the `ceph-corbo-cephfs` StorageClass).
- `alertNotifier.*` – enable notifier mode, then change the Cron schedule, PVC behaviour, secret names/keys, or Teams/Webex delivery. Note: Like Elasticsearch, the notifier PVC's default `storageClassName` is hardcoded to `ceph-corbo-cephfs`.

## Alert notifier configuration

The chart bundles the `alertNotifier` CronJob so Kibana alerts arrive in Microsoft Teams or Cisco Webex. Adjust the schedule, outputs, and credentials through values. A minimal override file could look like:

```yaml
# alert-notifier-values.yaml
alertNotifier:
  image:
    repository: registry.example.com/eck-notifier
    tag: latest
  schedule: "*/5 * * * *"
  es:
    index: ".internal.alerts-observability.logs.alerts-default-*"
    skipVerify: true
  teams:
    enabled: true
  webex:
    enabled: true
    roomId: "" # leave empty to pull from the secret
    personEmail: ""
    tokenKey: webexBotToken
    roomIdKey: webexRoomId
  secret:
    create: false
    name: eck-eck-notifier-secrets

kibana:
  ingress:
    enabled: true
    hosts:
      - host: localhost
        path: /
        pathType: Prefix
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  config:
    xpack.security.secureCookies: false
```

Deploy (or upgrade) the chart from this directory (`common/monitoring/eck`), since the command passes `.` as the chart path:

```bash
helm upgrade --install eck . -f alert-notifier-values.yaml \
--namespace elastic-system --create-namespace
```

### Secret

Populate the notifier secret so the CronJob can talk to Elasticsearch and your chat tools:

```bash
kubectl create secret generic eck-eck-notifier-secrets \
-n elastic-system \
--from-literal=es-url=https://elasticsearch-sample-es-http.elastic-system.svc:9200 \
--from-literal=es-user=elastic \
--from-literal=es-pass="<elastic-password>" \
--from-literal=teams-webhook="https://outlook.office.com/webhook/..." \
--from-literal=webexBotToken="<webex-bot-token>" \
--from-literal=webexRoomId="Y2lzY29zcGFyazovL3VzL1JPT00v..."
```

If you prefer direct Webex messages, leave `webexRoomId` empty and set `alertNotifier.webex.personEmail` instead. Whenever Elasticsearch rotates the `elastic` password, regenerate the secret:

```bash
ES_PASS=$(kubectl get secret elasticsearch-sample-es-elastic-user \
-n elastic-system \
-o go-template='{{printf "%s" (index .data "elastic")}}' | base64 -d)

kubectl create secret generic eck-eck-notifier-secrets \
-n elastic-system \
--from-literal=es-url=https://elasticsearch-sample-es-http.elastic-system.svc:9200 \
--from-literal=es-user=elastic \
--from-literal=es-pass="$ES_PASS" \
--from-literal=teams-webhook="https://outlook.office.com/webhook/..." \
--from-literal=webexBotToken="<webex-bot-token>" \
--from-literal=webexRoomId="Y2lzY29zcGFyazovL3VzL1JPT00v..." \
--dry-run=client -o yaml | kubectl apply -f -
```

### Persistent state

The CronJob persists alert hashes under `/var/lib/eck-notifier/state.json` (PVC) so it only posts deltas. Override `alertNotifier.state.persistence.*` if you already have a claim or disable persistence for ephemeral deployments.
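
To illustrate the idea of posting only deltas, here is a minimal sketch of hash-based deduplication. It is not the notifier's actual implementation: the function name `new_alerts` and the plain-text state file with one hash per line are assumptions made for the example, whereas the real CronJob keeps its state in `state.json`.

```bash
# new_alerts STATE_FILE
# Reads one alert per line on stdin, prints only alerts whose SHA-256 hash is
# not yet recorded in STATE_FILE, and records the hashes of the ones it prints.
new_alerts() {
  local state_file="$1" alert hash
  touch "$state_file"
  while IFS= read -r alert; do
    hash="$(printf '%s' "$alert" | sha256sum | cut -d' ' -f1)"
    if ! grep -qxF "$hash" "$state_file"; then
      printf '%s\n' "$hash" >> "$state_file"
      printf '%s\n' "$alert"
    fi
  done
}
```

Running it twice against the same state file prints an alert only the first time it is seen, which is the behaviour the persistent volume preserves between CronJob runs.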

## Verifying the deployment

```bash
kubectl get elasticsearch -n elastic-system
kubectl get kibana -n elastic-system
# Optional (when enabled)
kubectl get beats.beat.k8s.elastic.co -n elastic-system
kubectl get cronjob eck-eck-notifier -n elastic-system
```

Fetch the autogenerated `elastic` password:

```bash
kubectl get secret elasticsearch-sample-es-elastic-user \
-n elastic-system \
-o go-template='{{printf "%s" (index .data "elastic")}}' | base64 -d; echo
```
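
The `base64 -d` step is needed because Kubernetes stores the values under a Secret's `data` field base64-encoded, and the trailing `echo` only adds a newline after the decoded password. The round trip can be tried without a cluster, using a made-up value (`example-password` is purely illustrative):

```bash
# Encode a sample value the way it would appear under .data in a Secret,
# then decode it again as the command above does.
encoded="$(printf '%s' 'example-password' | base64)"
decoded="$(printf '%s' "$encoded" | base64 -d)"
echo "$decoded"
```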

## Accessing Kibana

Port-forward the service when you only need temporary access:

```bash
kubectl port-forward -n elastic-system svc/kibana-sample-kb-http 5601:5601
```

Then browse to `https://localhost:5601` (accept the self-signed cert warning) and log in with `elastic` plus the password above. To expose Kibana permanently, enable `kibana.ingress.enabled` and provide hosts/TLS values.

## Observability notes

Filebeat autodiscovers pods via hints and forwards container logs. Metricbeat scrapes nodes, pods, containers, volumes, the apiserver, and host metrics. They are disabled by default and can be enabled through `observability.*` in `values.yaml`.

## Uninstalling

```bash
helm uninstall eck -n elastic-system
```

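This removes the Elasticsearch, Kibana, Beats, and notifier workloads but leaves the upstream ECK CRDs installed (so existing CRs keep working). If you also want the CRDs gone after uninstalling, delete the resources defined in `crds/eck-crds.yaml` manually, for example with `kubectl delete -f crds/eck-crds.yaml`. Deleting a CRD also deletes every remaining custom resource of that kind.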