feat: add KEDA autoscaling example manifests #831
Base: `main` (changes from 1 commit)
`@@ -0,0 +1,26 @@` (new file)

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: modelserving-scaler
spec:
  scaleTargetRef:
    apiVersion: workload.serving.volcano.sh/v1alpha1
    kind: ModelServing
    name: my-modelserving
  minReplicaCount: 1
  maxReplicaCount: 10
  cooldownPeriod: 120
  pollingInterval: 15
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        query: avg(vllm:num_requests_waiting)
        threshold: "5"
        metricName: vllm_requests_waiting_avg
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        query: sum(kthena_router_active_downstream_requests)
```
Suggested change:

```diff
-        query: avg(vllm:num_requests_waiting)
+        query: avg(vllm:num_requests_waiting{namespace="default",service="my-modelserving"})
         threshold: "5"
         metricName: vllm_requests_waiting_avg
     - type: prometheus
       metadata:
         serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
-        query: sum(kthena_router_active_downstream_requests)
+        query: sum(kthena_router_active_downstream_requests{namespace="default",service="my-modelserving"})
```
Thread marked Outdated.

Review comment: I'm curious how this metric can be appropriate, since it is not per model. The kthena router supports routing to multiple models.
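The following worked sketch shows how the `threshold` values above turn into a replica count. It assumes KEDA's default `metricType` of `AverageValue` for these Prometheus triggers. It also ignores the HPA's tolerance band and stabilization windows. Let $v_i$ be the value returned by trigger $i$'s query and $t_i$ its threshold. The HPA that KEDA creates takes the largest per-trigger proposal and clamps it to `minReplicaCount: 1` and `maxReplicaCount: 10`:

$$
\text{replicas} = \min\left(10,\ \max\left(1,\ \max_i \left\lceil \frac{v_i}{t_i} \right\rceil\right)\right)
$$

Take an illustrative value of $v_1 = 12$ for the first trigger, where $t_1 = 5$. That trigger then proposes $\lceil 12/5 \rceil = \lceil 2.4 \rceil = 3$ replicas. Because the query uses `avg()`, $v_1$ is already a per-pod average, so this proposal does not depend on how many replicas are currently running.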
`@@ -0,0 +1,15 @@` (new file)

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: inference-pods
  labels:
    app.kubernetes.io/component: inference
spec:
  selector:
    matchLabels:
      modelserving.volcano.sh/entry: "true"
  podMetricsEndpoints:
    - port: http
      targetPort: 8000
      path: /metrics
      interval: 15s
```

Comment on lines +12 to +13:

The `targetPort` field is not a valid field within a `podMetricsEndpoints` item for a `PodMonitor` resource. Its inclusion will likely make this manifest invalid and prevent metrics from being scraped. The `port` field is sufficient. Please remove the `targetPort` line.

Suggested change:

```diff
     - port: http
-      targetPort: 8000
```
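A PodMonitor's `port` names a container port declared in the pod spec; it is not a port number. So `port: http` only resolves if the inference container declares a port named `http`. Below is a minimal sketch. It assumes the server listens on 8000, the value the removed `targetPort` carried, and that the `ModelServing` pod template accepts a standard container spec. The container name is hypothetical, and where this template sits inside the `ModelServing` resource is not shown in this PR.

```yaml
# Hypothetical container fragment inside the ModelServing pod template.
# Only the ports entry matters here: the PodMonitor's `port: http`
# is matched against this port *name*.
containers:
  - name: inference          # hypothetical container name
    # image, args, resources, etc. omitted
    ports:
      - name: http           # must match podMetricsEndpoints[].port
        containerPort: 8000  # assumed listen port (from the original targetPort)
        protocol: TCP
```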
`@@ -0,0 +1,14 @@` (new file; only the first three lines appear in this view)

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
```

Suggested change:

```diff
 metadata:
+  namespace: kthena-system
```
Copilot (AI), Mar 20, 2026:

`ServiceMonitor.spec.selector.matchLabels` matches any Service with `app.kubernetes.io/component: kthena-router`. The Helm chart also labels the `kthena-router-webhook` Service with the same component label, but that Service exposes only the webhook port (no `http` metrics port). This can lead to failed scrapes and confusing Prometheus targets. Prefer selecting a label that uniquely identifies the metrics Service, or add a dedicated label on the router metrics Service and match on that.

Suggested change:

```diff
-      app.kubernetes.io/component: kthena-router
+      app.kubernetes.io/name: kthena-router
```
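The sketch below shows what the ServiceMonitor might look like with the `namespace` and selector suggestions applied. Several parts are assumptions. The resource name is hypothetical. The manifest assumes that only the router metrics Service, and not `kthena-router-webhook`, carries `app.kubernetes.io/name: kthena-router`; check this with `kubectl get svc -n kthena-system --show-labels`. It also assumes the metrics port is named `http` and serves `/metrics`. A ServiceMonitor endpoint's `port` refers to a Service port by name.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kthena-router            # hypothetical name
  namespace: kthena-system
spec:
  selector:
    matchLabels:
      # Assumed to be unique to the router metrics Service.
      app.kubernetes.io/name: kthena-router
  endpoints:
    - port: http                 # assumed name of the metrics Service port
      path: /metrics             # assumed metrics path
      interval: 15s
```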
Review comment:

The `vllm:num_requests_waiting` metric is scraped from all inference pods matched by the `PodMonitor`. In an environment with multiple `ModelServing` instances, this query will average across all of them, leading to incorrect scaling decisions. The query should be filtered by a label that uniquely identifies the pods belonging to this `ModelServing` instance (`my-modelserving`).

For example, if pods have a label like `modelserving.volcano.sh/name: my-modelserving`, the query should be updated to use it. Note that Prometheus relabeling will convert label characters like `/` and `.` to `_`. You'll need to verify the exact label on the pods.
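One way to follow this suggestion is to copy a pod label onto the scraped series at scrape time, then filter on it in the KEDA query. The sketch below assumes the inference pods carry the hypothetical label `modelserving.volcano.sh/name: my-modelserving`; confirm the real label with `kubectl get pods --show-labels`. Prometheus exposes that label during Kubernetes service discovery as `__meta_kubernetes_pod_label_modelserving_volcano_sh_name`, because unsupported characters become underscores. The target label name `modelserving` is an arbitrary choice. Both documents are partial fragments of the manifests above, not complete resources.

```yaml
# Fragment of the PodMonitor spec: copy the (assumed) pod label onto
# every series scraped from that pod as `modelserving`.
podMetricsEndpoints:
  - port: http
    path: /metrics
    interval: 15s
    relabelings:
      - sourceLabels: [__meta_kubernetes_pod_label_modelserving_volcano_sh_name]
        targetLabel: modelserving
---
# Fragment of the ScaledObject spec: average only over this instance's pods.
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
      query: avg(vllm:num_requests_waiting{modelserving="my-modelserving"})
      threshold: "5"
```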