Add Prometheus monitoring and KEDA autoscaling support #836

WHOIM1205 wants to merge 1 commit into volcano-sh:main
Conversation
Add standalone manifests and Helm chart integration for:
- ServiceMonitor for kthena-router
- PodMonitor for inference pods (vLLM/SGLang)
- KEDA ScaledObject for request-based autoscaling

All resources are disabled by default and fully configurable via values.yaml.

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED.
Summary of Changes (Gemini Code Assist)

This pull request introduces a prototype for integrating Prometheus-based monitoring and KEDA-driven autoscaling capabilities into Kthena. It establishes the necessary Kubernetes resources and Helm chart configurations to enable scraping of router and inference metrics, and demonstrates dynamic scaling of workloads in response to real-time request load. This foundational work aims to provide a robust mechanism for observability and elasticity within the ModelServing ecosystem.
Code Review
This pull request introduces Prometheus monitoring (ServiceMonitor, PodMonitor) and KEDA autoscaling (ScaledObject) for the kthena-router and kthena-inference components, along with their respective Helm chart configurations and a standalone monitoring manifest. Feedback indicates that the scaleTargetName for KEDA autoscaling should be explicitly required or validated to prevent invalid configurations. Additionally, resource naming in Helm templates should leverage the kthena.fullname helper for consistency, and podMetricsEndpoints should be configurable via values.yaml for increased flexibility. The standalone monitoring.yaml manifest should include explicit namespace definitions for all resources and use the full FQDN for the Prometheus server address for robustness.
```
@@ -0,0 +1,24 @@
{{- if .Values.kthenaRouter.autoscaling.enabled }}
```
If autoscaling.enabled is true, scaleTargetName must be provided. An empty scaleTargetName will result in an invalid ScaledObject. It's good practice to enforce this requirement using Helm's required function to fail fast during template rendering.
```
{{- if .Values.kthenaRouter.autoscaling.enabled }}
{{- required "A value for kthenaRouter.autoscaling.scaleTargetName is required when autoscaling is enabled" .Values.kthenaRouter.autoscaling.scaleTargetName }}
```

```
# scaleTargetName is the name of the Deployment to scale
scaleTargetName: ""
```
The scaleTargetName is a critical value when autoscaling is enabled, but it defaults to an empty string. This will cause the ScaledObject to be invalid. It's important to make it clear to the user that this value must be set. Adding a note to the comment and using a required check in the template would improve usability.
```
# scaleTargetName is the name of the Deployment to scale (required if autoscaling.enabled is true)
scaleTargetName: ""
```

```
# -- Name of the Deployment to scale.
scaleTargetName: ""
```
The scaleTargetName is a critical value when autoscaling is enabled, but it defaults to an empty string. This will cause the ScaledObject to be invalid. It's important to make it clear to the user that this value must be set.
```
# -- Name of the Deployment to scale (required if autoscaling.enabled is true).
scaleTargetName: ""
```

```
metadata:
  name: kthena-router
  labels:
    app.kubernetes.io/component: kthena-router
```
The Kubernetes resources in this manifest are not namespaced. It's a best practice to always define a namespace in the metadata for all resources to avoid deploying them into the default namespace accidentally and to ensure proper isolation. Please add a namespace to this ServiceMonitor and other resources in this file.
```
metadata:
  name: kthena-router
  namespace: default # Or another appropriate namespace
  labels:
    app.kubernetes.io/component: kthena-router
```

```
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
```
The Prometheus server address is missing the .svc suffix. For cross-namespace service resolution, the FQDN of the service should be used, which is typically in the format <service-name>.<namespace>.svc.cluster.local. While this might work in some cluster configurations depending on DNS search paths, it's more robust to use the full or partial FQDN.
```
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
```

```
podMetricsEndpoints:
- targetPort: 8000
  path: /metrics
  interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
- targetPort: 30000
  path: /metrics
  interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
```
The podMetricsEndpoints are hardcoded, which limits flexibility for different inference server backends or custom configurations. It would be better to make these endpoints configurable in values.yaml. This would allow users to easily add or change metric endpoints without modifying the chart template. This change would require adding an endpoints list to values.yaml under kthenaRouter.metrics.podMonitor.
```
podMetricsEndpoints:
{{- range .Values.kthenaRouter.metrics.podMonitor.endpoints }}
- targetPort: {{ .targetPort }}
  path: {{ .path | default "/metrics" }}
  interval: {{ $.Values.kthenaRouter.metrics.podMonitor.interval }}
{{- end }}
```

```
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kthena-inference-scaler
```
```
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kthena-router
```
```
podMonitor:
  # enabled creates a PodMonitor for inference pods (requires Prometheus Operator)
  enabled: false
  # interval is the scrape interval for the PodMonitor
  interval: 15s
```
The PodMonitor template has hardcoded ports for metrics scraping. To make it more flexible for different backends, consider making the metric endpoints configurable here. This change would need to be paired with an update to the podmonitor.yaml template to iterate over this list.
```
podMonitor:
  # enabled creates a PodMonitor for inference pods (requires Prometheus Operator)
  enabled: false
  # interval is the scrape interval for the PodMonitor
  interval: 15s
  # endpoints for the PodMonitor to scrape
  endpoints:
  - { targetPort: 8000, path: /metrics }
  - { targetPort: 30000, path: /metrics }
```

```
podMonitor:
  # -- Enable PodMonitor for inference pods (requires Prometheus Operator).
  enabled: false
  # -- Scrape interval for the PodMonitor.
  interval: 15s
```
The PodMonitor template has hardcoded ports for metrics scraping. To make it more flexible for different backends, consider making the metric endpoints configurable here. This would involve adding an endpoints list that can be passed to the subchart and used in the podmonitor.yaml template.
```
podMonitor:
  # -- Enable PodMonitor for inference pods (requires Prometheus Operator).
  enabled: false
  # -- Scrape interval for the PodMonitor.
  interval: 15s
  # -- Endpoints for the PodMonitor to scrape.
  endpoints:
  - { targetPort: 8000, path: /metrics }
  - { targetPort: 30000, path: /metrics }
```
Pull request overview
Adds a Prometheus-Operator-based monitoring + KEDA autoscaling prototype for Kthena, exposing Helm-configurable ServiceMonitor/PodMonitor resources and a Prometheus-triggered ScaledObject to scale inference workloads based on router load.
Changes:
- Introduces Helm templates for ServiceMonitor (router) and PodMonitor (inference pods), gated by new values flags.
- Adds a Helm template for a KEDA ScaledObject using a Prometheus trigger (query + threshold), gated by new values flags.
- Adds a repo-level monitoring.yaml with example ServiceMonitor/PodMonitor/ScaledObject manifests.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| monitoring.yaml | Adds a non-Helm, repo-level monitoring + autoscaling manifest example. |
| charts/kthena/values.yaml | Adds top-level (parent chart) values for metrics + autoscaling configuration. |
| charts/kthena/charts/networking/values.yaml | Adds networking subchart values for metrics + autoscaling configuration. |
| charts/kthena/charts/networking/templates/kthena-router/component/servicemonitor.yaml | Helm template for router ServiceMonitor. |
| charts/kthena/charts/networking/templates/kthena-router/component/podmonitor.yaml | Helm template for inference PodMonitor. |
| charts/kthena/charts/networking/templates/kthena-router/component/scaledobject.yaml | Helm template for KEDA ScaledObject scaling a target Deployment via Prometheus. |
```
namespace: {{ .Release.Namespace }}
labels:
  app.kubernetes.io/component: kthena-router
  release: prometheus
```
The release: prometheus label is hard-coded. Prometheus Operator setups often use a different label key/value (or none) for selecting ServiceMonitors, so hard-coding this can make scraping silently not work. Make this label configurable (or omit it and let users add it via additionalLabels).
```diff
-    release: prometheus
+    {{- $prometheusLabels := default (dict "release" "prometheus") .Values.kthenaRouter.metrics.serviceMonitor.prometheusLabels }}
+    {{- range $key, $value := $prometheusLabels }}
+    {{ $key }}: {{ $value }}
+    {{- end }}
```
```
release: prometheus
{{- include "kthena.labels" . | nindent 4 }}
```
The release: prometheus label is hard-coded. Prometheus Operator setups often use a different label key/value (or none) for selecting PodMonitors, so this can prevent scraping unless users happen to match this exact label. Make this label configurable (or omit it and expose additionalLabels).
```diff
-    release: prometheus
     {{- include "kthena.labels" . | nindent 4 }}
+    {{- with .Values.kthenaRouter.metrics.podMonitor.additionalLabels }}
+    {{- toYaml . | nindent 4 }}
+    {{- end }}
```
```
podMetricsEndpoints:
- targetPort: 8000
  path: /metrics
  interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
- targetPort: 30000
  path: /metrics
  interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
```
This PodMonitor config will scrape both targetPort: 8000 and targetPort: 30000 on every matched pod. For backends that only expose one of these ports, Prometheus will continuously record scrape failures/target down, which can be noisy and impact alerting. Consider splitting into two PodMonitors selected by a backend label, or make the endpoints list configurable so users only enable the port(s) they actually expose.
```diff
 podMetricsEndpoints:
+{{- with .Values.kthenaRouter.metrics.podMonitor.podMetricsEndpoints }}
+{{- toYaml . | nindent 4 }}
+{{- else }}
 - targetPort: 8000
   path: /metrics
   interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
 - targetPort: 30000
   path: /metrics
   interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
+{{- end }}
```
```
prometheusAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
# -- PromQL query for scaling decisions.
query: sum(kthena_router_active_downstream_requests)
# -- Per-pod metric threshold that triggers scaling.
```
The comment says the threshold is a per-pod metric threshold, but the default query is sum(kthena_router_active_downstream_requests), which produces a cluster-wide total across all series. Either adjust the wording to reflect that the threshold applies to the query result, or change the default query to return a per-pod/per-target value if that’s the intended semantics.
```diff
-# -- Per-pod metric threshold that triggers scaling.
+# -- Metric threshold on the result of the PromQL query that triggers scaling.
```
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kthena-router
  labels:
    app.kubernetes.io/component: kthena-router
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: kthena-router
  endpoints:
  - port: http
    path: /metrics
    interval: 15s
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: kthena-inference
  labels:
    app.kubernetes.io/part-of: kthena
spec:
  selector:
    matchExpressions:
    - key: modelserving.volcano.sh/name
      operator: Exists
    - key: modelserving.volcano.sh/group-name
      operator: Exists
    - key: modelserving.volcano.sh/role
      operator: Exists
    - key: modelserving.volcano.sh/entry
      operator: In
      values:
      - "true"
  podMetricsEndpoints:
  - port: ""
    targetPort: 8000
    path: /metrics
    interval: 15s
  - port: ""
    targetPort: 30000
    path: /metrics
    interval: 15s
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: kthena-inference-scaler
  labels:
    app.kubernetes.io/part-of: kthena
spec:
  minReplicaCount: 1
  maxReplicaCount: 10
  scaleTargetRef:
    name: kthena-inference
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
      query: sum(kthena_router_active_downstream_requests)
      threshold: "5"
```
This top-level monitoring.yaml duplicates the Helm-rendered ServiceMonitor/PodMonitor/ScaledObject resources added under charts/. That duplication is likely to drift over time (e.g., different Prometheus address, different PodMonitor endpoints), so either move this to an examples/ or docs/ location and clearly mark it as an example, or remove it in favor of the chart templates.
Suggested change (replaces the manifest above with a commented example):

```yaml
# Example monitoring configuration for kthena components.
#
# NOTE:
# - The authoritative ServiceMonitor, PodMonitor, and ScaledObject resources
#   are rendered from the Helm charts under the `charts/` directory.
# - This file is kept only as documentation / an example manifest and should
#   not be applied directly in environments that use the Helm chart.
#
# If you need to customize monitoring, prefer editing the Helm values/templates
# rather than modifying this example file.
# apiVersion: monitoring.coreos.com/v1
# kind: ServiceMonitor
# metadata:
#   name: kthena-router
#   labels:
#     app.kubernetes.io/component: kthena-router
# spec:
#   selector:
#     matchLabels:
#       app.kubernetes.io/component: kthena-router
#   endpoints:
#   - port: http
#     path: /metrics
#     interval: 15s
# ---
# apiVersion: monitoring.coreos.com/v1
# kind: PodMonitor
# metadata:
#   name: kthena-inference
#   labels:
#     app.kubernetes.io/part-of: kthena
# spec:
#   selector:
#     matchExpressions:
#     - key: modelserving.volcano.sh/name
#       operator: Exists
#     - key: modelserving.volcano.sh/group-name
#       operator: Exists
#     - key: modelserving.volcano.sh/role
#       operator: Exists
#     - key: modelserving.volcano.sh/entry
#       operator: In
#       values:
#       - "true"
#   podMetricsEndpoints:
#   - port: ""
#     targetPort: 8000
#     path: /metrics
#     interval: 15s
#   - port: ""
#     targetPort: 30000
#     path: /metrics
#     interval: 15s
# ---
# apiVersion: keda.sh/v1alpha1
# kind: ScaledObject
# metadata:
#   name: kthena-inference-scaler
#   labels:
#     app.kubernetes.io/part-of: kthena
# spec:
#   minReplicaCount: 1
#   maxReplicaCount: 10
#   scaleTargetRef:
#     name: kthena-inference
#   triggers:
#   - type: prometheus
#     metadata:
#       serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
#       query: sum(kthena_router_active_downstream_requests)
#       threshold: "5"
```
```
scaleTargetRef:
  apiVersion: apps/v1
  kind: Deployment
  name: {{ .Values.kthenaRouter.autoscaling.scaleTargetName }}
```
When autoscaling is enabled, scaleTargetName defaults to an empty string, which will render an invalid scaleTargetRef.name. Consider using Helm's required (or a conditional fail) to enforce a non-empty target name when .Values.kthenaRouter.autoscaling.enabled is true.
```diff
-    name: {{ .Values.kthenaRouter.autoscaling.scaleTargetName }}
+    name: {{ required "kthenaRouter.autoscaling.scaleTargetName must be set when autoscaling is enabled" .Values.kthenaRouter.autoscaling.scaleTargetName }}
```
```
serverAddress: {{ .Values.kthenaRouter.autoscaling.prometheusAddress }}
query: {{ .Values.kthenaRouter.autoscaling.query }}
```
serverAddress and query are rendered unquoted. PromQL expressions commonly include characters ({}, [], ,, :) that can break YAML parsing or be re-interpreted; render these values with | quote to ensure the ScaledObject is always valid YAML regardless of the configured query/address.
```diff
-serverAddress: {{ .Values.kthenaRouter.autoscaling.prometheusAddress }}
-query: {{ .Values.kthenaRouter.autoscaling.query }}
+serverAddress: {{ .Values.kthenaRouter.autoscaling.prometheusAddress | quote }}
+query: {{ .Values.kthenaRouter.autoscaling.query | quote }}
```
```
prometheusAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
# query is the PromQL query used for scaling decisions
query: sum(kthena_router_active_downstream_requests)
# threshold is the per-pod metric value that triggers scaling
```
The comment describes threshold as a per-pod metric value, but the default query is sum(kthena_router_active_downstream_requests) (a total across all matched series). Update the comment to match the query semantics or adjust the default query so the threshold is actually per pod/target.
```diff
-# threshold is the per-pod metric value that triggers scaling
+# threshold is the total metric value across all pods that triggers scaling
```
```
- port: ""
  targetPort: 8000
  path: /metrics
  interval: 15s
- port: ""
  targetPort: 30000
```
In this PodMonitor example, podMetricsEndpoints[].port is set to an empty string. Prometheus Operator will treat this as a named port ("") and it can fail validation/scraping; if you intend to scrape by number, omit port entirely and keep only targetPort (or set a real named port).
```diff
-- port: ""
-  targetPort: 8000
+- targetPort: 8000
   path: /metrics
   interval: 15s
-- port: ""
-  targetPort: 30000
+- targetPort: 30000
```
```
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
```
serverAddress is set to http://prometheus-kube-prometheus-prometheus.monitoring:9090. In Kubernetes DNS, service.namespace without .svc is not reliably resolvable across namespaces due to search path behavior; use the full service DNS name (e.g., ...monitoring.svc:9090 or ...monitoring.svc.cluster.local:9090) to avoid runtime resolution failures for the KEDA scaler.
```diff
-serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
+serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
```
hey @LiZhenCheng9527 @hzxuzhonghu |
LiZhenCheng9527 left a comment:
If I don’t want to install Prometheus, what Helm command should I use to install Kthena?
Prometheus isn't required to install Kthena. All the monitoring and autoscaling pieces (ServiceMonitor, PodMonitor, KEDA) are disabled by default, so a normal install works fine:

```
helm install kthena ./charts/kthena
```

Those resources are only created if someone explicitly enables them, so it won't cause any issues even if Prometheus isn't present. You can also explicitly turn them off with flags, but that's already the default.
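For teams that prefer to pin the defaults explicitly, a values override can spell them out. This is only a sketch: the value paths follow the `metrics`/`autoscaling` sections added in this PR and may not match the final chart layout.

```yaml
# disable-monitoring.values.yaml -- mirrors the chart defaults (everything off)
kthenaRouter:
  metrics:
    serviceMonitor:
      enabled: false
    podMonitor:
      enabled: false
  autoscaling:
    enabled: false
```

Used as `helm install kthena ./charts/kthena -f disable-monitoring.values.yaml`.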
hzxuzhonghu left a comment:
I would suggest we add scaling as a separate addon, not to couple with our own components.
That's a good point, thanks for the suggestion. Right now it's added as an optional feature and disabled by default, but I agree it would be cleaner to keep scaling as a separate addon, and I can refactor it accordingly. Do you have a preferred way to structure it (e.g., separate subchart or addons directory)?
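One possible shape for the addon route, sketched with Helm's standard condition-gated dependency mechanism (the `monitoring-addon` chart name and condition key here are hypothetical):

```yaml
# charts/kthena/Chart.yaml (excerpt)
dependencies:
  - name: monitoring-addon            # hypothetical subchart holding ServiceMonitor/PodMonitor/ScaledObject
    version: 0.1.0
    repository: file://charts/monitoring-addon
    condition: monitoring-addon.enabled   # subchart renders nothing unless explicitly enabled
```

With a `condition` field, the addon stays fully decoupled from the core components and is skipped at render time unless the user opts in.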
Add Prometheus monitoring + KEDA autoscaling prototype for Kthena

Summary

This PR introduces a prototype for Prometheus-based monitoring and KEDA-driven autoscaling in Kthena. It enables scraping router and inference metrics and demonstrates scaling of workloads based on real-time request load. This is an initial step toward integrating Prometheus + KEDA with ModelServing.

Changes

- ServiceMonitor (kthena-router): scrapes /metrics from the router service (port: http)
- PodMonitor (inference pods): selects pods labeled modelserving.volcano.sh/entry: "true"
- KEDA ScaledObject: scales the target Deployment via a Prometheus trigger (query + threshold)
- values.yaml updates: adds metrics and autoscaling sections

Testing

Tested locally on a KIND cluster. This validates the end-to-end flow:

Router → Prometheus → KEDA → HPA → workload scaling
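For reference, enabling this flow end to end might use values like the following (a sketch: it assumes the value paths added in this PR and a kube-prometheus-stack install in the monitoring namespace):

```yaml
kthenaRouter:
  metrics:
    serviceMonitor:
      enabled: true
    podMonitor:
      enabled: true
      interval: 15s
  autoscaling:
    enabled: true
    # must be set whenever autoscaling is enabled
    scaleTargetName: kthena-inference
    prometheusAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
    query: sum(kthena_router_active_downstream_requests)
    threshold: "5"
```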
Notes / Limitations
Next Steps
ref #799