26 changes: 26 additions & 0 deletions examples/keda-autoscaling/keda-scaledobject.yaml
@@ -0,0 +1,26 @@
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: modelserving-scaler
spec:
  scaleTargetRef:
    apiVersion: workload.serving.volcano.sh/v1alpha1
    kind: ModelServing
    name: my-modelserving
  minReplicaCount: 1
  maxReplicaCount: 10
  cooldownPeriod: 120
  pollingInterval: 15
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        query: avg(vllm:num_requests_waiting)
Severity: high

The vllm:num_requests_waiting metric is scraped from all inference pods matched by the PodMonitor. In an environment with multiple ModelServing instances, this query will calculate the average across all of them, leading to incorrect scaling decisions. The query should be filtered by a label that uniquely identifies the pods belonging to this ModelServing instance (my-modelserving).

For example, if pods have a label like modelserving.volcano.sh/name: my-modelserving, the query should be updated to use it. Note that Prometheus relabeling will convert label characters like / and . to _. You'll need to verify the exact label on the pods.

        query: avg(vllm:num_requests_waiting{modelserving_volcano_sh_name="my-modelserving"})

        threshold: "5"
        metricName: vllm_requests_waiting_avg
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
        query: sum(kthena_router_active_downstream_requests)
Severity: high

The kthena_router_active_downstream_requests metric is labeled by model. This query sums up active requests across all models, which could lead to incorrect scaling behavior if multiple models are served by the same router. To ensure this ScaledObject only considers metrics for my-modelserving, you should filter the query by the model name.

        query: sum(kthena_router_active_downstream_requests{model="my-modelserving"})

Copilot AI Mar 20, 2026
These Prometheus queries are not scoped to the specific ModelServing/router instance (no namespace/service/pod/model label filters). In a cluster with multiple ModelServings or routers, this will aggregate unrelated traffic/queue metrics and drive incorrect scaling. Update the queries to filter to the intended target (e.g., by namespace, service, and/or ModelServing-related pod labels) so the ScaledObject only reacts to metrics from my-modelserving.

Suggested change
-        query: avg(vllm:num_requests_waiting)
+        query: avg(vllm:num_requests_waiting{namespace="default",service="my-modelserving"})
         threshold: "5"
         metricName: vllm_requests_waiting_avg
     - type: prometheus
       metadata:
         serverAddress: http://prometheus.monitoring.svc.cluster.local:9090
-        query: sum(kthena_router_active_downstream_requests)
+        query: sum(kthena_router_active_downstream_requests{namespace="default",service="my-modelserving"})

Member


I'm curious: this metric is not per-model, so how can it be appropriate? The kthena router supports routing to multiple models.

        threshold: "20"
        metricName: router_active_downstream_requests
15 changes: 15 additions & 0 deletions examples/keda-autoscaling/podmonitor-inference.yaml
@@ -0,0 +1,15 @@
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: inference-pods
  labels:
    app.kubernetes.io/component: inference
spec:
  selector:
    matchLabels:
      modelserving.volcano.sh/entry: "true"
  podMetricsEndpoints:
    - port: http
      targetPort: 8000
Comment on lines +12 to +13
Copilot AI Mar 20, 2026
PodMonitor endpoint sets port: http, but ModelServing-generated inference pods/templates don’t name the container port http (they typically only set containerPort: 8000 without a name). With a named port mismatch Prometheus Operator won’t be able to resolve the scrape target. Either name the metrics port http in the pod spec, or remove port: and rely on targetPort, or set port to the actual named port used by the inference pods.

Suggested change
-    - port: http
-      targetPort: 8000
+    - targetPort: 8000
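Alternatively, the inference pod template could name the metrics port explicitly so that `port: http` resolves. A minimal sketch, assuming the ModelServing spec exposes a standard pod template; the container name and surrounding field path shown here are placeholders, not the actual ModelServing schema:

```yaml
# Hypothetical excerpt of the inference pod template:
containers:
  - name: vllm                # container name is an assumption
    ports:
      - name: http            # naming the port lets the PodMonitor's `port: http` resolve
        containerPort: 8000
```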

      path: /metrics
      interval: 15s
Comment on lines +12 to +15
Severity: critical

The targetPort field is not a valid field within a podMetricsEndpoints item for a PodMonitor resource. Its inclusion will likely make this manifest invalid and prevent metrics from being scraped. The port field is sufficient. Please remove the targetPort line.

    - port: http
      path: /metrics
      interval: 15s

14 changes: 14 additions & 0 deletions examples/keda-autoscaling/servicemonitor-router.yaml
@@ -0,0 +1,14 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
Copilot AI Mar 20, 2026
All three example manifests omit metadata.namespace. In this repo, most example YAMLs set an explicit namespace (e.g. kthena-system/default), and for ServiceMonitor/PodMonitor it also affects discovery because they only select targets within their own namespace unless spec.namespaceSelector is configured. Consider adding an explicit namespace (and namespaceSelector if you expect scraping across namespaces) to make the example apply correctly out-of-the-box.

Suggested change
 metadata:
+  namespace: kthena-system
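If the router metrics Service lives in a different namespace than the ServiceMonitor, `spec.namespaceSelector` would also be needed for discovery. A sketch, assuming the Service runs in `kthena-system`:

```yaml
spec:
  namespaceSelector:
    matchNames:
      - kthena-system   # namespace where the router metrics Service is assumed to live
```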

  name: kthena-router
  labels:
    app.kubernetes.io/component: kthena-router
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: kthena-router
Copilot AI Mar 20, 2026
ServiceMonitor.spec.selector.matchLabels matches any Service with app.kubernetes.io/component: kthena-router. The Helm chart also labels the kthena-router-webhook Service with the same component label, but that Service exposes only the webhook port (no http metrics port). This can lead to failed scrapes / confusing Prometheus targets. Prefer selecting a label that uniquely identifies the metrics Service, or add a dedicated label on the router metrics Service and match on that.

Suggested change
-      app.kubernetes.io/component: kthena-router
+      app.kubernetes.io/name: kthena-router
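If no existing label uniquely identifies the metrics Service, a dedicated label could be added to it and matched here instead. A sketch using a hypothetical label key:

```yaml
# On the router metrics Service (label key is hypothetical):
metadata:
  labels:
    kthena.io/metrics: "true"
# ...and the corresponding ServiceMonitor selector:
spec:
  selector:
    matchLabels:
      kthena.io/metrics: "true"
```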

  endpoints:
    - port: http
      path: /metrics
      interval: 15s