Add Prometheus monitoring and KEDA autoscaling support #836

WHOIM1205 wants to merge 1 commit into volcano-sh:main
Conversation
Add standalone manifests and Helm chart integration for:
- ServiceMonitor for kthena-router
- PodMonitor for inference pods (vLLM/SGLang)
- KEDA ScaledObject for request-based autoscaling

All resources are disabled by default and fully configurable via values.yaml.

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED.
Summary of Changes (Gemini Code Assist)

This pull request introduces a prototype for integrating Prometheus-based monitoring and KEDA-driven autoscaling capabilities into Kthena. It establishes the necessary Kubernetes resources and Helm chart configurations to enable scraping of router and inference metrics, and demonstrates dynamic scaling of workloads in response to real-time request load. This foundational work aims to provide a robust mechanism for observability and elasticity within the ModelServing ecosystem.
Code Review
This pull request introduces Prometheus monitoring (ServiceMonitor, PodMonitor) and KEDA autoscaling (ScaledObject) for the kthena-router and kthena-inference components, along with their respective Helm chart configurations and a standalone monitoring manifest. Feedback indicates that the scaleTargetName for KEDA autoscaling should be explicitly required or validated to prevent invalid configurations. Additionally, resource naming in Helm templates should leverage the kthena.fullname helper for consistency, and podMetricsEndpoints should be configurable via values.yaml for increased flexibility. The standalone monitoring.yaml manifest should include explicit namespace definitions for all resources and use the full FQDN for the Prometheus server address for robustness.
```
@@ -0,0 +1,24 @@
{{- if .Values.kthenaRouter.autoscaling.enabled }}
```
If autoscaling.enabled is true, scaleTargetName must be provided. An empty scaleTargetName will result in an invalid ScaledObject. It's good practice to enforce this requirement using Helm's required function to fail fast during template rendering.
```
{{- if .Values.kthenaRouter.autoscaling.enabled }}
{{- required "A value for kthenaRouter.autoscaling.scaleTargetName is required when autoscaling is enabled" .Values.kthenaRouter.autoscaling.scaleTargetName }}
```

```
# scaleTargetName is the name of the Deployment to scale
scaleTargetName: ""
```
The scaleTargetName is a critical value when autoscaling is enabled, but it defaults to an empty string. This will cause the ScaledObject to be invalid. It's important to make it clear to the user that this value must be set. Adding a note to the comment and using a required check in the template would improve usability.
```
# scaleTargetName is the name of the Deployment to scale (required if autoscaling.enabled is true)
scaleTargetName: ""
```

```
# -- Name of the Deployment to scale.
scaleTargetName: ""
```
The scaleTargetName is a critical value when autoscaling is enabled, but it defaults to an empty string. This will cause the ScaledObject to be invalid. It's important to make it clear to the user that this value must be set.
```
# -- Name of the Deployment to scale (required if autoscaling.enabled is true).
scaleTargetName: ""
```

```
metadata:
  name: kthena-router
  labels:
    app.kubernetes.io/component: kthena-router
```
The Kubernetes resources in this manifest are not namespaced. It's a best practice to always define a namespace in the metadata for all resources to avoid deploying them into the default namespace accidentally and to ensure proper isolation. Please add a namespace to this ServiceMonitor and other resources in this file.
```
metadata:
  name: kthena-router
  namespace: default # Or another appropriate namespace
  labels:
    app.kubernetes.io/component: kthena-router
```

```
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
```
The Prometheus server address is missing the .svc suffix. For cross-namespace service resolution, the FQDN of the service should be used, which is typically in the format <service-name>.<namespace>.svc.cluster.local. While this might work in some cluster configurations depending on DNS search paths, it's more robust to use the full or partial FQDN.
```
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
```

```
podMetricsEndpoints:
- targetPort: 8000
  path: /metrics
  interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
- targetPort: 30000
  path: /metrics
  interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
```
The podMetricsEndpoints are hardcoded, which limits flexibility for different inference server backends or custom configurations. It would be better to make these endpoints configurable in values.yaml. This would allow users to easily add or change metric endpoints without modifying the chart template. This change would require adding an endpoints list to values.yaml under kthenaRouter.metrics.podMonitor.
```
podMetricsEndpoints:
{{- range .Values.kthenaRouter.metrics.podMonitor.endpoints }}
- targetPort: {{ .targetPort }}
  path: {{ .path | default "/metrics" }}
  interval: {{ $.Values.kthenaRouter.metrics.podMonitor.interval }}
{{- end }}
```

```
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kthena-inference-scaler
```
```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kthena-router
```
```
podMonitor:
  # enabled creates a PodMonitor for inference pods (requires Prometheus Operator)
  enabled: false
  # interval is the scrape interval for the PodMonitor
  interval: 15s
```
The PodMonitor template has hardcoded ports for metrics scraping. To make it more flexible for different backends, consider making the metric endpoints configurable here. This change would need to be paired with an update to the podmonitor.yaml template to iterate over this list.
```
podMonitor:
  # enabled creates a PodMonitor for inference pods (requires Prometheus Operator)
  enabled: false
  # interval is the scrape interval for the PodMonitor
  interval: 15s
  # endpoints for the PodMonitor to scrape
  endpoints:
  - { targetPort: 8000, path: /metrics }
  - { targetPort: 30000, path: /metrics }
```

```
podMonitor:
  # -- Enable PodMonitor for inference pods (requires Prometheus Operator).
  enabled: false
  # -- Scrape interval for the PodMonitor.
  interval: 15s
```
The PodMonitor template has hardcoded ports for metrics scraping. To make it more flexible for different backends, consider making the metric endpoints configurable here. This would involve adding an endpoints list that can be passed to the subchart and used in the podmonitor.yaml template.
```
podMonitor:
  # -- Enable PodMonitor for inference pods (requires Prometheus Operator).
  enabled: false
  # -- Scrape interval for the PodMonitor.
  interval: 15s
  # -- Endpoints for the PodMonitor to scrape.
  endpoints:
  - { targetPort: 8000, path: /metrics }
  - { targetPort: 30000, path: /metrics }
```
Pull request overview
Adds a Prometheus-Operator-based monitoring + KEDA autoscaling prototype for Kthena, exposing Helm-configurable ServiceMonitor/PodMonitor resources and a Prometheus-triggered ScaledObject to scale inference workloads based on router load.
Changes:
- Introduces Helm templates for ServiceMonitor (router) and PodMonitor (inference pods), gated by new values flags.
- Adds a Helm template for a KEDA ScaledObject using a Prometheus trigger (query + threshold), gated by new values flags.
- Adds a repo-level monitoring.yaml with example ServiceMonitor/PodMonitor/ScaledObject manifests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| monitoring.yaml | Adds a non-Helm, repo-level monitoring + autoscaling manifest example. |
| charts/kthena/values.yaml | Adds top-level (parent chart) values for metrics + autoscaling configuration. |
| charts/kthena/charts/networking/values.yaml | Adds networking subchart values for metrics + autoscaling configuration. |
| charts/kthena/charts/networking/templates/kthena-router/component/servicemonitor.yaml | Helm template for router ServiceMonitor. |
| charts/kthena/charts/networking/templates/kthena-router/component/podmonitor.yaml | Helm template for inference PodMonitor. |
| charts/kthena/charts/networking/templates/kthena-router/component/scaledobject.yaml | Helm template for KEDA ScaledObject scaling a target Deployment via Prometheus. |
```
namespace: {{ .Release.Namespace }}
labels:
  app.kubernetes.io/component: kthena-router
  release: prometheus
```
The release: prometheus label is hard-coded. Prometheus Operator setups often use a different label key/value (or none) for selecting ServiceMonitors, so hard-coding this can make scraping silently not work. Make this label configurable (or omit it and let users add it via additionalLabels).
```diff
-    release: prometheus
+    {{- $prometheusLabels := default (dict "release" "prometheus") .Values.kthenaRouter.metrics.serviceMonitor.prometheusLabels }}
+    {{- range $key, $value := $prometheusLabels }}
+    {{ $key }}: {{ $value }}
+    {{- end }}
```
```
release: prometheus
{{- include "kthena.labels" . | nindent 4 }}
```
The release: prometheus label is hard-coded. Prometheus Operator setups often use a different label key/value (or none) for selecting PodMonitors, so this can prevent scraping unless users happen to match this exact label. Make this label configurable (or omit it and expose additionalLabels).
```diff
-    release: prometheus
     {{- include "kthena.labels" . | nindent 4 }}
+    {{- with .Values.kthenaRouter.metrics.podMonitor.additionalLabels }}
+    {{- toYaml . | nindent 4 }}
+    {{- end }}
```
```
podMetricsEndpoints:
- targetPort: 8000
  path: /metrics
  interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
- targetPort: 30000
  path: /metrics
  interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
```
This PodMonitor config will scrape both targetPort: 8000 and targetPort: 30000 on every matched pod. For backends that only expose one of these ports, Prometheus will continuously record scrape failures/target down, which can be noisy and impact alerting. Consider splitting into two PodMonitors selected by a backend label, or make the endpoints list configurable so users only enable the port(s) they actually expose.
```diff
 podMetricsEndpoints:
+{{- with .Values.kthenaRouter.metrics.podMonitor.podMetricsEndpoints }}
+{{- toYaml . | nindent 4 }}
+{{- else }}
 - targetPort: 8000
   path: /metrics
   interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
 - targetPort: 30000
   path: /metrics
   interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
+{{- end }}
```
```
prometheusAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
# -- PromQL query for scaling decisions.
query: sum(kthena_router_active_downstream_requests)
# -- Per-pod metric threshold that triggers scaling.
```
The comment says the threshold is a per-pod metric threshold, but the default query is sum(kthena_router_active_downstream_requests), which produces a cluster-wide total across all series. Either adjust the wording to reflect that the threshold applies to the query result, or change the default query to return a per-pod/per-target value if that’s the intended semantics.
```diff
-# -- Per-pod metric threshold that triggers scaling.
+# -- Metric threshold on the result of the PromQL query that triggers scaling.
```
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kthena-router
  labels:
    app.kubernetes.io/component: kthena-router
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: kthena-router
  endpoints:
  - port: http
    path: /metrics
    interval: 15s
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kthena-inference
  labels:
    app.kubernetes.io/part-of: kthena
spec:
  selector:
    matchExpressions:
    - key: modelserving.volcano.sh/name
      operator: Exists
    - key: modelserving.volcano.sh/group-name
      operator: Exists
    - key: modelserving.volcano.sh/role
      operator: Exists
    - key: modelserving.volcano.sh/entry
      operator: In
      values:
      - "true"
  podMetricsEndpoints:
  - port: ""
    targetPort: 8000
    path: /metrics
    interval: 15s
  - port: ""
    targetPort: 30000
    path: /metrics
    interval: 15s
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kthena-inference-scaler
  labels:
    app.kubernetes.io/part-of: kthena
spec:
  minReplicaCount: 1
  maxReplicaCount: 10
  scaleTargetRef:
    name: kthena-inference
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
      query: sum(kthena_router_active_downstream_requests)
      threshold: "5"
```
This top-level monitoring.yaml duplicates the Helm-rendered ServiceMonitor/PodMonitor/ScaledObject resources added under charts/. That duplication is likely to drift over time (e.g., different Prometheus address, different PodMonitor endpoints), so either move this to an examples/ or docs/ location and clearly mark it as an example, or remove it in favor of the chart templates.
Suggested change (replaces the manifest above with a commented example):

```yaml
# Example monitoring configuration for kthena components.
#
# NOTE:
# - The authoritative ServiceMonitor, PodMonitor, and ScaledObject resources
#   are rendered from the Helm charts under the `charts/` directory.
# - This file is kept only as documentation / an example manifest and should
#   not be applied directly in environments that use the Helm chart.
#
# If you need to customize monitoring, prefer editing the Helm values/templates
# rather than modifying this example file.
# apiVersion: monitoring.coreos.com/v1
# kind: ServiceMonitor
# metadata:
#   name: kthena-router
#   labels:
#     app.kubernetes.io/component: kthena-router
# spec:
#   selector:
#     matchLabels:
#       app.kubernetes.io/component: kthena-router
#   endpoints:
#   - port: http
#     path: /metrics
#     interval: 15s
# ---
# apiVersion: monitoring.coreos.com/v1
# kind: PodMonitor
# metadata:
#   name: kthena-inference
#   labels:
#     app.kubernetes.io/part-of: kthena
# spec:
#   selector:
#     matchExpressions:
#     - key: modelserving.volcano.sh/name
#       operator: Exists
#     - key: modelserving.volcano.sh/group-name
#       operator: Exists
#     - key: modelserving.volcano.sh/role
#       operator: Exists
#     - key: modelserving.volcano.sh/entry
#       operator: In
#       values:
#       - "true"
#   podMetricsEndpoints:
#   - port: ""
#     targetPort: 8000
#     path: /metrics
#     interval: 15s
#   - port: ""
#     targetPort: 30000
#     path: /metrics
#     interval: 15s
# ---
# apiVersion: keda.sh/v1alpha1
# kind: ScaledObject
# metadata:
#   name: kthena-inference-scaler
#   labels:
#     app.kubernetes.io/part-of: kthena
# spec:
#   minReplicaCount: 1
#   maxReplicaCount: 10
#   scaleTargetRef:
#     name: kthena-inference
#   triggers:
#   - type: prometheus
#     metadata:
#       serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
#       query: sum(kthena_router_active_downstream_requests)
#       threshold: "5"
```
```
scaleTargetRef:
  apiVersion: apps/v1
  kind: Deployment
  name: {{ .Values.kthenaRouter.autoscaling.scaleTargetName }}
```
When autoscaling is enabled, scaleTargetName defaults to an empty string, which will render an invalid scaleTargetRef.name. Consider using Helm's required (or a conditional fail) to enforce a non-empty target name when .Values.kthenaRouter.autoscaling.enabled is true.
```diff
-    name: {{ .Values.kthenaRouter.autoscaling.scaleTargetName }}
+    name: {{ required "kthenaRouter.autoscaling.scaleTargetName must be set when autoscaling is enabled" .Values.kthenaRouter.autoscaling.scaleTargetName }}
```
```
serverAddress: {{ .Values.kthenaRouter.autoscaling.prometheusAddress }}
query: {{ .Values.kthenaRouter.autoscaling.query }}
```
serverAddress and query are rendered unquoted. PromQL expressions commonly include characters ({}, [], ,, :) that can break YAML parsing or be re-interpreted; render these values with | quote to ensure the ScaledObject is always valid YAML regardless of the configured query/address.
```diff
-serverAddress: {{ .Values.kthenaRouter.autoscaling.prometheusAddress }}
-query: {{ .Values.kthenaRouter.autoscaling.query }}
+serverAddress: {{ .Values.kthenaRouter.autoscaling.prometheusAddress | quote }}
+query: {{ .Values.kthenaRouter.autoscaling.query | quote }}
```
```
prometheusAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
# query is the PromQL query used for scaling decisions
query: sum(kthena_router_active_downstream_requests)
# threshold is the per-pod metric value that triggers scaling
```
The comment describes threshold as a per-pod metric value, but the default query is sum(kthena_router_active_downstream_requests) (a total across all matched series). Update the comment to match the query semantics or adjust the default query so the threshold is actually per pod/target.
```diff
-# threshold is the per-pod metric value that triggers scaling
+# threshold is the total metric value across all pods that triggers scaling
```
```
- port: ""
  targetPort: 8000
  path: /metrics
  interval: 15s
- port: ""
  targetPort: 30000
```
In this PodMonitor example, podMetricsEndpoints[].port is set to an empty string. Prometheus Operator will treat this as a named port ("") and it can fail validation/scraping; if you intend to scrape by number, omit port entirely and keep only targetPort (or set a real named port).
```diff
-- port: ""
-  targetPort: 8000
+- targetPort: 8000
   path: /metrics
   interval: 15s
-- port: ""
-  targetPort: 30000
+- targetPort: 30000
```
```
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
```
serverAddress is set to http://prometheus-kube-prometheus-prometheus.monitoring:9090. In Kubernetes DNS, service.namespace without .svc is not reliably resolvable across namespaces due to search path behavior; use the full service DNS name (e.g., ...monitoring.svc:9090 or ...monitoring.svc.cluster.local:9090) to avoid runtime resolution failures for the KEDA scaler.
```diff
-serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
+serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
```
hey @LiZhenCheng9527 @hzxuzhonghu |
LiZhenCheng9527 left a comment:
If I don’t want to install Prometheus, what Helm command should I use to install Kthena?
Prometheus isn't required to install Kthena. All the monitoring and autoscaling pieces (ServiceMonitor, PodMonitor, KEDA) are disabled by default, so a normal install works fine:

```
helm install kthena ./charts/kthena
```

Those resources are only created if someone explicitly enables them, so it won't cause any issues even if Prometheus isn't present. You can also explicitly turn them off with flags, but that's already the default.
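For teams that prefer to pin the defaults explicitly, a values override can spell them out. This is only a sketch: the value paths follow the `metrics`/`autoscaling` sections added in this PR and may not match the final chart layout.

```yaml
# disable-monitoring.values.yaml -- mirrors the chart defaults (everything off)
kthenaRouter:
  metrics:
    serviceMonitor:
      enabled: false
    podMonitor:
      enabled: false
  autoscaling:
    enabled: false
```

Used as `helm install kthena ./charts/kthena -f disable-monitoring.values.yaml`.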
hzxuzhonghu left a comment:
I would suggest we add scaling as a separate addon, not to couple with our own components.
That's a good point, thanks for the suggestion. Right now it's added as an optional feature and disabled by default, but I agree it would be cleaner to keep scaling as a separate addon, and I can refactor it accordingly. Do you have a preferred way to structure it (e.g., separate subchart or addons directory)?
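One possible shape for the addon route, sketched with Helm's standard condition-gated dependency mechanism (the `monitoring-addon` chart name and condition key here are hypothetical):

```yaml
# charts/kthena/Chart.yaml (excerpt)
dependencies:
  - name: monitoring-addon            # hypothetical subchart holding ServiceMonitor/PodMonitor/ScaledObject
    version: 0.1.0
    repository: file://charts/monitoring-addon
    condition: monitoring-addon.enabled   # subchart renders nothing unless explicitly enabled
```

With a `condition` field, the addon stays fully decoupled from the core components and is skipped at render time unless the user opts in.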
Add Prometheus monitoring + KEDA autoscaling prototype for Kthena

Summary

This PR introduces a prototype for Prometheus-based monitoring and KEDA-driven autoscaling in Kthena. It enables scraping router and inference metrics and demonstrates scaling of workloads based on real-time request load. This is an initial step toward integrating Prometheus + KEDA with ModelServing.

Changes

- ServiceMonitor (kthena-router): scrapes /metrics from the router service (port: http)
- PodMonitor (inference pods): selects pods labeled modelserving.volcano.sh/entry: "true"
- KEDA ScaledObject: scales the target Deployment via a Prometheus trigger (query + threshold)
- values.yaml updates: adds metrics and autoscaling sections

Testing

Tested locally on a KIND cluster. This validates the end-to-end flow:

Router → Prometheus → KEDA → HPA → workload scaling
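For reference, enabling this flow end to end might use values like the following (a sketch: it assumes the value paths added in this PR and a kube-prometheus-stack install in the monitoring namespace):

```yaml
kthenaRouter:
  metrics:
    serviceMonitor:
      enabled: true
    podMonitor:
      enabled: true
      interval: 15s
  autoscaling:
    enabled: true
    # must be set whenever autoscaling is enabled
    scaleTargetName: kthena-inference
    prometheusAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
    query: sum(kthena_router_active_downstream_requests)
    threshold: "5"
```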
Notes / Limitations
Next Steps
ref #799