Commit 29ea290

epp servicemonitor (#1425)
* epp servicemonitor and clusterpodmonitor templates

Signed-off-by: sallyom <[email protected]>

* add monitoring chart doc

Signed-off-by: sallyom <[email protected]>

---------

Signed-off-by: sallyom <[email protected]>
1 parent 4361b59 commit 29ea290

6 files changed, +84 -7 lines changed

config/charts/inferencepool/README.md

Lines changed: 27 additions & 0 deletions
@@ -117,6 +117,30 @@ Then apply it with:
 helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
 ```

+### Install with Monitoring
+
+To enable metrics collection and monitoring for the EndpointPicker, you can configure Prometheus ServiceMonitor creation:
+
+```yaml
+inferenceExtension:
+  monitoring:
+    interval: "10s"
+    prometheus:
+      enabled: true
+    secret:
+      name: inference-gateway-sa-metrics-reader-secret
+```
+
+**Note:** Prometheus monitoring requires the Prometheus Operator and ServiceMonitor CRD to be installed in the cluster.
+
+For GKE environments, monitoring is automatically configured when `provider.name` is set to `gke`.
+
+Then apply it with:
+
+```txt
+helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f values.yaml
+```
+
 ## Uninstall

 Run the following command to uninstall the chart:
@@ -147,6 +171,9 @@ The following table list the configurable parameters of the chart.
 | `inferenceExtension.affinity` | Affinity for the endpoint picker. Defaults to `{}`. |
 | `inferenceExtension.tolerations` | Tolerations for the endpoint picker. Defaults to `[]`. |
 | `inferenceExtension.flags.has-enable-leader-election` | Enable leader election for high availability. When enabled, only one EPP pod (the leader) will be ready to serve traffic. |
+| `inferenceExtension.monitoring.interval` | Metrics scraping interval for monitoring. Defaults to `10s`. |
+| `inferenceExtension.monitoring.secret.name` | Name of the service account token secret for metrics authentication. Defaults to `inference-gateway-sa-metrics-reader-secret`. |
+| `inferenceExtension.monitoring.prometheus.enabled` | Enable Prometheus ServiceMonitor creation for EPP metrics collection. Defaults to `false`. |
 | `inferenceExtension.pluginsCustomConfig` | Custom config that is passed to EPP as inline yaml. |
 | `provider.name` | Name of the Inference Gateway implementation being used. Possible values: `gke`. Defaults to `none`. |

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+{{- if or .Values.inferenceExtension.monitoring.prometheus.enabled .Values.inferenceExtension.monitoring.gke.enabled }}
+apiVersion: v1
+kind: Secret
+metadata:
+  name: {{ .Values.inferenceExtension.monitoring.secret.name }}
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
+  annotations:
+    kubernetes.io/service-account.name: {{ include "gateway-api-inference-extension.name" . }}
+type: kubernetes.io/service-account-token
+{{- end }}
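
Review note: the condition above reads `.Values.inferenceExtension.monitoring.gke.enabled`, but the `values.yaml` hunk in this commit defines no `monitoring.gke` key. Helm templates generally fail with a nil-pointer error when a field is read from a missing parent map. Assuming that applies here, a values override along these lines would sidestep it. This is a sketch, not part of the commit; only the `gke.enabled` key is taken from the template.

```yaml
# Hypothetical values override; not part of this commit.
# Defines the key the Secret template reads so the lookup never hits a missing map.
inferenceExtension:
  monitoring:
    gke:
      enabled: false   # assumed default; set to true only when a GKE-side Secret is wanted
    prometheus:
      enabled: false
```

An alternative would be for the template itself to guard the lookup, but that changes the chart rather than the values.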
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+{{- if .Values.inferenceExtension.monitoring.prometheus.enabled }}
+apiVersion: monitoring.coreos.com/v1
+kind: ServiceMonitor
+metadata:
+  name: {{ include "gateway-api-inference-extension.name" . }}-monitor
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
+spec:
+  endpoints:
+  - interval: {{ .Values.inferenceExtension.monitoring.interval }}
+    port: "http-metrics"
+    path: "/metrics"
+    authorization:
+      credentials:
+        key: token
+        name: {{ .Values.inferenceExtension.monitoring.secret.name }}
+  jobLabel: {{ include "gateway-api-inference-extension.name" . }}
+  namespaceSelector:
+    matchNames:
+    - {{ .Release.Namespace }}
+  selector:
+    matchLabels:
+      {{- include "gateway-api-inference-extension.labels" . | nindent 6 }}
+{{- end }}
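
Review note: for orientation, this is roughly what the template might render to with the README's example values. The name `example-epp`, the namespace `default`, and the single label are placeholders, because the output of the `gateway-api-inference-extension.name` and `gateway-api-inference-extension.labels` helpers is not shown in this diff.

```yaml
# Hypothetical rendered output; names, namespace, and labels are placeholders.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-epp-monitor
  namespace: default
  labels:
    app.kubernetes.io/name: example-epp   # stands in for the labels helper output
spec:
  endpoints:
  - interval: 10s                         # from inferenceExtension.monitoring.interval
    port: "http-metrics"                  # must match a named port on the EPP Service
    path: "/metrics"
    authorization:
      credentials:
        key: token
        name: inference-gateway-sa-metrics-reader-secret
  jobLabel: example-epp
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app.kubernetes.io/name: example-epp # stands in for the labels helper output
```

A ServiceMonitor selects Services by label and scrapes the Service port with the given name, so it is worth confirming that the EPP Service carries these labels and exposes a port named `http-metrics`. The GKE template in this commit scrapes a port named `metrics` instead.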

config/charts/inferencepool/templates/gke.yaml

Lines changed: 3 additions & 3 deletions
@@ -46,15 +46,15 @@ spec:
   endpoints:
   - port: metrics
     scheme: http
-    interval: 5s
+    interval: {{ .Values.inferenceExtension.monitoring.interval }}
     path: /metrics
     authorization:
       type: Bearer
       credentials:
         secret:
-          name: {{ .Values.gke.monitoringSecret.name }}
+          name: {{ .Values.inferenceExtension.monitoring.secret.name }}
           key: token
-          namespace: {{ .Values.gke.monitoringSecret.namespace }}
+          namespace: {{ .Release.Namespace }}
   selector:
     matchLabels:
       {{- include "gateway-api-inference-extension.selectorLabels" . | nindent 8 }}

config/charts/inferencepool/templates/rbac.yaml

Lines changed: 6 additions & 0 deletions
@@ -17,6 +17,12 @@ rules:
   - subjectaccessreviews
   verbs:
   - create
+{{- if .Values.inferenceExtension.monitoring.prometheus.enabled }}
+- nonResourceURLs:
+  - "/metrics"
+  verbs:
+  - get
+{{- end }}
 ---
 kind: ClusterRoleBinding
 apiVersion: rbac.authorization.k8s.io/v1
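
Review note: with `inferenceExtension.monitoring.prometheus.enabled: true`, the tail of the `rules` list would presumably render as sketched below. The `apiGroups` entry lies outside the hunk and is an assumption based on `subjectaccessreviews` belonging to the `authorization.k8s.io` API group.

```yaml
# Hypothetical rendered tail of the ClusterRole rules; the first rule's
# apiGroups value is assumed, since it is not visible in the diff.
rules:
- apiGroups:
  - authorization.k8s.io
  resources:
  - subjectaccessreviews
  verbs:
  - create
# Added by this commit when Prometheus monitoring is enabled:
- nonResourceURLs:
  - "/metrics"
  verbs:
  - get
```

`nonResourceURLs` rules are only valid in a ClusterRole, which fits here because the binding that follows is a ClusterRoleBinding.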

config/charts/inferencepool/values.yaml

Lines changed: 11 additions & 4 deletions
@@ -40,6 +40,17 @@ inferenceExtension:

   tolerations: []

+  # Monitoring configuration for EPP
+  monitoring:
+    interval: "10s"
+    # Service account token secret for authentication
+    secret:
+      name: inference-gateway-sa-metrics-reader-secret
+
+    # Prometheus ServiceMonitor will be created when enabled for EPP metrics collection
+    prometheus:
+      enabled: false
+
 inferencePool:
   targetPorts:
     - number: 8000
@@ -56,7 +67,3 @@ inferencePool:
 provider:
   name: none

-gke:
-  monitoringSecret:
-    name: inference-gateway-sa-metrics-reader-secret
-    namespace: default
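
Review note: this hunk removes the top-level `gke.monitoringSecret` keys, so existing overrides that set them would need to move. A minimal migration sketch follows, assuming a user had previously customized the secret name. The name `my-metrics-secret` is hypothetical.

```yaml
# Before this commit (keys now removed from values.yaml):
# gke:
#   monitoringSecret:
#     name: my-metrics-secret      # hypothetical custom name
#     namespace: default

# After this commit: the secret name moves under inferenceExtension.monitoring,
# and the namespace is no longer configurable (templates use .Release.Namespace).
inferenceExtension:
  monitoring:
    interval: "10s"
    secret:
      name: my-metrics-secret      # hypothetical custom name
provider:
  name: gke
```

Note that the GKE scrape interval default also changes from the hard-coded `5s` to the shared `10s` unless `inferenceExtension.monitoring.interval` is overridden.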
