
Add Prometheus monitoring and KEDA autoscaling support #836

Open
WHOIM1205 wants to merge 1 commit into volcano-sh:main from WHOIM1205:monitoring-and-autoscaling

Conversation

@WHOIM1205
Contributor

Add Prometheus monitoring + KEDA autoscaling prototype for Kthena

Summary

This PR introduces a prototype for Prometheus-based monitoring and KEDA-driven autoscaling in Kthena.

It enables scraping router and inference metrics and demonstrates scaling of workloads based on real-time request load.

This is an initial step toward integrating Prometheus + KEDA with ModelServing.


Changes

  • ServiceMonitor (kthena-router)

    • Scrapes /metrics from the router service (port: http)
    • Supports cross-namespace scraping
  • PodMonitor (inference pods)

    • Targets entry pods using modelserving.volcano.sh/entry: "true"
    • Supports vLLM (8000) and SGLang (30000) metrics ports
  • KEDA ScaledObject

    • Uses Prometheus trigger:
      sum(kthena_router_active_downstream_requests)
      
    • Demonstrates scaling based on active request load
    • Configurable threshold, min/max replicas, and cooldown
  • values.yaml updates

    • Adds metrics and autoscaling sections
    • Features are disabled by default (no impact on existing deployments)
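For reviewers skimming the diff, the new values surface roughly like this (a sketch assembled from the values excerpts quoted in the review comments; the exact key names for the replica bounds and cooldown are assumptions):

```yaml
kthenaRouter:
  metrics:
    serviceMonitor:
      # enabled creates a ServiceMonitor for the router (requires Prometheus Operator)
      enabled: false
      interval: 15s
    podMonitor:
      # enabled creates a PodMonitor for inference pods (requires Prometheus Operator)
      enabled: false
      interval: 15s
  autoscaling:
    # enabled creates a KEDA ScaledObject (requires KEDA)
    enabled: false
    # scaleTargetName is the name of the Deployment to scale (required when enabled)
    scaleTargetName: ""
    prometheusAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
    query: sum(kthena_router_active_downstream_requests)
    threshold: "5"
    # replica bounds and cooldown (key names assumed)
    minReplicaCount: 1
    maxReplicaCount: 10
    cooldownPeriod: 300
```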

Testing

Tested locally on a KIND cluster:

  • Prometheus successfully scrapes kthena-router metrics via ServiceMonitor
  • Verified metrics via PromQL queries
  • KEDA consumes metrics from Prometheus
  • HPA is created automatically by KEDA
  • Observed:
    • Scale-up when metric exceeds threshold
    • Scale-down when metric falls below threshold

This validates the end-to-end flow:

Router → Prometheus → KEDA → HPA → workload scaling
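The trigger driving that flow is the Prometheus trigger shipped in this PR's defaults (shown here as a config snippet; the server address naturally depends on where Prometheus is installed):

```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
      query: sum(kthena_router_active_downstream_requests)
      threshold: "5"
```

With KEDA's default AverageValue semantics, the generated HPA targets roughly ceil(query result / threshold) replicas, which matches the observed scale-up once active requests exceed the threshold per replica.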


Notes / Limitations

  • This is a prototype and does not yet fully integrate with ModelServing resources
  • Label mapping between metrics and ModelServing may require refinement
  • Inference pod metrics ports are backend-specific (vLLM/SGLang)

Next Steps

  • Integrate scaling directly with ModelServing CRD
  • Improve metric-to-resource mapping
  • Add Helm-level configurability for monitoring components

ref #799

Add standalone manifests and Helm chart integration for:
- ServiceMonitor for kthena-router
- PodMonitor for inference pods (vLLM/SGLang)
- KEDA ScaledObject for request-based autoscaling

All resources are disabled by default and fully configurable via values.yaml.

Signed-off-by: WHOIM1205 <rathourprateek8@gmail.com>
Copilot AI review requested due to automatic review settings March 24, 2026 21:10
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hzxuzhonghu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a prototype for integrating Prometheus-based monitoring and KEDA-driven autoscaling capabilities into Kthena. It establishes the necessary Kubernetes resources and Helm chart configurations to enable scraping of router and inference metrics, and demonstrates dynamic scaling of workloads in response to real-time request load. This foundational work aims to provide a robust mechanism for observability and elasticity within the ModelServing ecosystem.

Highlights

  • Prometheus ServiceMonitor for kthena-router: A ServiceMonitor was added to enable Prometheus to scrape metrics from the kthena-router service, supporting cross-namespace scraping.
  • Prometheus PodMonitor for inference pods: A PodMonitor was introduced to target and scrape metrics from inference pods, specifically supporting vLLM (port 8000) and SGLang (port 30000) metrics ports.
  • KEDA ScaledObject for autoscaling: A KEDA ScaledObject was implemented to demonstrate autoscaling of workloads based on active request load, using a Prometheus trigger with a configurable query, threshold, min/max replicas, and cooldown period.
  • Helm Chart Configuration: The values.yaml files were updated to include new metrics and autoscaling sections, allowing these features to be enabled and configured, though they are disabled by default.



@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces Prometheus monitoring (ServiceMonitor, PodMonitor) and KEDA autoscaling (ScaledObject) for the kthena-router and kthena-inference components, along with their respective Helm chart configurations and a standalone monitoring manifest. Feedback indicates that the scaleTargetName for KEDA autoscaling should be explicitly required or validated to prevent invalid configurations. Additionally, resource naming in Helm templates should leverage the kthena.fullname helper for consistency, and podMetricsEndpoints should be configurable via values.yaml for increased flexibility. The standalone monitoring.yaml manifest should include explicit namespace definitions for all resources and use the full FQDN for the Prometheus server address for robustness.

@@ -0,0 +1,24 @@
{{- if .Values.kthenaRouter.autoscaling.enabled }}

high

If autoscaling.enabled is true, scaleTargetName must be provided. An empty scaleTargetName will result in an invalid ScaledObject. It's good practice to enforce this requirement using Helm's required function to fail fast during template rendering.

{{- if .Values.kthenaRouter.autoscaling.enabled }}
{{- required "A value for kthenaRouter.autoscaling.scaleTargetName is required when autoscaling is enabled" .Values.kthenaRouter.autoscaling.scaleTargetName }}

Comment on lines +85 to +86
# scaleTargetName is the name of the Deployment to scale
scaleTargetName: ""

high

The scaleTargetName is a critical value when autoscaling is enabled, but it defaults to an empty string. This will cause the ScaledObject to be invalid. It's important to make it clear to the user that this value must be set. Adding a note to the comment and using a required check in the template would improve usability.

    # scaleTargetName is the name of the Deployment to scale (required if autoscaling.enabled is true)
    scaleTargetName: ""

Comment on lines +98 to +99
# -- Name of the Deployment to scale.
scaleTargetName: ""

high

The scaleTargetName is a critical value when autoscaling is enabled, but it defaults to an empty string. This will cause the ScaledObject to be invalid. It's important to make it clear to the user that this value must be set.

      # -- Name of the Deployment to scale (required if autoscaling.enabled is true).
      scaleTargetName: ""

Comment on lines +3 to +6
metadata:
name: kthena-router
labels:
app.kubernetes.io/component: kthena-router

high

The Kubernetes resources in this manifest are not namespaced. It's a best practice to always define a namespace in the metadata for all resources to avoid deploying them into the default namespace accidentally and to ensure proper isolation. Please add a namespace to this ServiceMonitor and other resources in this file.

metadata:
  name: kthena-router
  namespace: default # Or another appropriate namespace
  labels:
    app.kubernetes.io/component: kthena-router

triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090

high

The Prometheus server address is missing the .svc suffix. For cross-namespace service resolution, the FQDN of the service should be used, which is typically in the format <service-name>.<namespace>.svc.cluster.local. While this might work in some cluster configurations depending on DNS search paths, it's more robust to use the full or partial FQDN.

        serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090

Comment on lines +27 to +33
podMetricsEndpoints:
- targetPort: 8000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
- targetPort: 30000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}

medium

The podMetricsEndpoints are hardcoded, which limits flexibility for different inference server backends or custom configurations. It would be better to make these endpoints configurable in values.yaml. This would allow users to easily add or change metric endpoints without modifying the chart template. This change would require adding an endpoints list to values.yaml under kthenaRouter.metrics.podMonitor.

  podMetricsEndpoints:
  {{- range .Values.kthenaRouter.metrics.podMonitor.endpoints }}
    - targetPort: {{ .targetPort }}
      path: {{ .path | default "/metrics" }}
      interval: {{ $.Values.kthenaRouter.metrics.podMonitor.interval }}
  {{- end }}

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: kthena-inference-scaler

medium

The resource name is hardcoded. It's a good practice to use Helm's naming conventions to generate resource names. This ensures names are unique and consistent across releases. Consider using the kthena.fullname helper.

  name: {{ include "kthena.fullname" . }}-inference-scaler

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: kthena-router

medium

The resource name is hardcoded. It's a good practice to use Helm's naming conventions to generate resource names. This ensures names are unique and consistent across releases. Consider using the kthena.fullname helper.

  name: {{ include "kthena.fullname" . }}-router

Comment on lines +76 to +80
podMonitor:
# enabled creates a PodMonitor for inference pods (requires Prometheus Operator)
enabled: false
# interval is the scrape interval for the PodMonitor
interval: 15s

medium

The PodMonitor template has hardcoded ports for metrics scraping. To make it more flexible for different backends, consider making the metric endpoints configurable here. This change would need to be paired with an update to the podmonitor.yaml template to iterate over this list.

    podMonitor:
      # enabled creates a PodMonitor for inference pods (requires Prometheus Operator)
      enabled: false
      # interval is the scrape interval for the PodMonitor
      interval: 15s
      # endpoints for the PodMonitor to scrape
      endpoints:
        - { targetPort: 8000, path: /metrics }
        - { targetPort: 30000, path: /metrics }

Comment on lines +90 to +94
podMonitor:
# -- Enable PodMonitor for inference pods (requires Prometheus Operator).
enabled: false
# -- Scrape interval for the PodMonitor.
interval: 15s

medium

The PodMonitor template has hardcoded ports for metrics scraping. To make it more flexible for different backends, consider making the metric endpoints configurable here. This would involve adding an endpoints list that can be passed to the subchart and used in the podmonitor.yaml template.

      podMonitor:
        # -- Enable PodMonitor for inference pods (requires Prometheus Operator).
        enabled: false
        # -- Scrape interval for the PodMonitor.
        interval: 15s
        # -- Endpoints for the PodMonitor to scrape.
        endpoints:
          - { targetPort: 8000, path: /metrics }
          - { targetPort: 30000, path: /metrics }

Contributor

Copilot AI left a comment


Pull request overview

Adds a Prometheus-Operator-based monitoring + KEDA autoscaling prototype for Kthena, exposing Helm-configurable ServiceMonitor/PodMonitor resources and a Prometheus-triggered ScaledObject to scale inference workloads based on router load.

Changes:

  • Introduces Helm templates for ServiceMonitor (router) and PodMonitor (inference pods) gated by new values flags.
  • Adds a Helm template for a KEDA ScaledObject using a Prometheus trigger (query + threshold) gated by new values flags.
  • Adds a repo-level monitoring.yaml with example ServiceMonitor/PodMonitor/ScaledObject manifests.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
monitoring.yaml Adds a non-Helm, repo-level monitoring + autoscaling manifest example.
charts/kthena/values.yaml Adds top-level (parent chart) values for metrics + autoscaling configuration.
charts/kthena/charts/networking/values.yaml Adds networking subchart values for metrics + autoscaling configuration.
charts/kthena/charts/networking/templates/kthena-router/component/servicemonitor.yaml Helm template for router ServiceMonitor.
charts/kthena/charts/networking/templates/kthena-router/component/podmonitor.yaml Helm template for inference PodMonitor.
charts/kthena/charts/networking/templates/kthena-router/component/scaledobject.yaml Helm template for KEDA ScaledObject scaling a target Deployment via Prometheus.


namespace: {{ .Release.Namespace }}
labels:
app.kubernetes.io/component: kthena-router
release: prometheus
Copilot AI Mar 24, 2026

The release: prometheus label is hard-coded. Prometheus Operator setups often use a different label key/value (or none) for selecting ServiceMonitors, so hard-coding this can make scraping silently not work. Make this label configurable (or omit it and let users add it via additionalLabels).

Suggested change
release: prometheus
{{- $prometheusLabels := default (dict "release" "prometheus") .Values.kthenaRouter.metrics.serviceMonitor.prometheusLabels }}
{{- range $key, $value := $prometheusLabels }}
{{ $key }}: {{ $value }}
{{- end }}

Comment on lines +9 to +10
release: prometheus
{{- include "kthena.labels" . | nindent 4 }}
Copilot AI Mar 24, 2026

The release: prometheus label is hard-coded. Prometheus Operator setups often use a different label key/value (or none) for selecting PodMonitors, so this can prevent scraping unless users happen to match this exact label. Make this label configurable (or omit it and expose additionalLabels).

Suggested change
release: prometheus
{{- include "kthena.labels" . | nindent 4 }}
{{- include "kthena.labels" . | nindent 4 }}
{{- with .Values.kthenaRouter.metrics.podMonitor.additionalLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}

Comment on lines +27 to +33
podMetricsEndpoints:
- targetPort: 8000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
- targetPort: 30000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
Copilot AI Mar 24, 2026

This PodMonitor config will scrape both targetPort: 8000 and targetPort: 30000 on every matched pod. For backends that only expose one of these ports, Prometheus will continuously record scrape failures/target down, which can be noisy and impact alerting. Consider splitting into two PodMonitors selected by a backend label, or make the endpoints list configurable so users only enable the port(s) they actually expose.

Suggested change
podMetricsEndpoints:
- targetPort: 8000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
- targetPort: 30000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
podMetricsEndpoints:
{{- with .Values.kthenaRouter.metrics.podMonitor.podMetricsEndpoints }}
{{- toYaml . | nindent 4 }}
{{- else }}
- targetPort: 8000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
- targetPort: 30000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
{{- end }}

prometheusAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
# -- PromQL query for scaling decisions.
query: sum(kthena_router_active_downstream_requests)
# -- Per-pod metric threshold that triggers scaling.
Copilot AI Mar 24, 2026

The comment says the threshold is a per-pod metric threshold, but the default query is sum(kthena_router_active_downstream_requests), which produces a cluster-wide total across all series. Either adjust the wording to reflect that the threshold applies to the query result, or change the default query to return a per-pod/per-target value if that’s the intended semantics.

Suggested change
# -- Per-pod metric threshold that triggers scaling.
# -- Metric threshold on the result of the PromQL query that triggers scaling.

Comment on lines +1 to +61
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: kthena-router
labels:
app.kubernetes.io/component: kthena-router
spec:
selector:
matchLabels:
app.kubernetes.io/component: kthena-router
endpoints:
- port: http
path: /metrics
interval: 15s
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: kthena-inference
labels:
app.kubernetes.io/part-of: kthena
spec:
selector:
matchExpressions:
- key: modelserving.volcano.sh/name
operator: Exists
- key: modelserving.volcano.sh/group-name
operator: Exists
- key: modelserving.volcano.sh/role
operator: Exists
- key: modelserving.volcano.sh/entry
operator: In
values:
- "true"
podMetricsEndpoints:
- port: ""
targetPort: 8000
path: /metrics
interval: 15s
- port: ""
targetPort: 30000
path: /metrics
interval: 15s
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: kthena-inference-scaler
labels:
app.kubernetes.io/part-of: kthena
spec:
minReplicaCount: 1
maxReplicaCount: 10
scaleTargetRef:
name: kthena-inference
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
query: sum(kthena_router_active_downstream_requests)
threshold: "5"
Copilot AI Mar 24, 2026

This top-level monitoring.yaml duplicates the Helm-rendered ServiceMonitor/PodMonitor/ScaledObject resources added under charts/. That duplication is likely to drift over time (e.g., different Prometheus address, different PodMonitor endpoints), so either move this to an examples/ or docs/ location and clearly mark it as an example, or remove it in favor of the chart templates.

Suggested change
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: kthena-router
labels:
app.kubernetes.io/component: kthena-router
spec:
selector:
matchLabels:
app.kubernetes.io/component: kthena-router
endpoints:
- port: http
path: /metrics
interval: 15s
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: kthena-inference
labels:
app.kubernetes.io/part-of: kthena
spec:
selector:
matchExpressions:
- key: modelserving.volcano.sh/name
operator: Exists
- key: modelserving.volcano.sh/group-name
operator: Exists
- key: modelserving.volcano.sh/role
operator: Exists
- key: modelserving.volcano.sh/entry
operator: In
values:
- "true"
podMetricsEndpoints:
- port: ""
targetPort: 8000
path: /metrics
interval: 15s
- port: ""
targetPort: 30000
path: /metrics
interval: 15s
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: kthena-inference-scaler
labels:
app.kubernetes.io/part-of: kthena
spec:
minReplicaCount: 1
maxReplicaCount: 10
scaleTargetRef:
name: kthena-inference
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
query: sum(kthena_router_active_downstream_requests)
threshold: "5"
# Example monitoring configuration for kthena components.
#
# NOTE:
# - The authoritative ServiceMonitor, PodMonitor, and ScaledObject resources
# are rendered from the Helm charts under the `charts/` directory.
# - This file is kept only as documentation / an example manifest and should
# not be applied directly in environments that use the Helm chart.
#
# If you need to customize monitoring, prefer editing the Helm values/templates
# rather than modifying this example file.
# apiVersion: monitoring.coreos.com/v1
# kind: ServiceMonitor
# metadata:
# name: kthena-router
# labels:
# app.kubernetes.io/component: kthena-router
# spec:
# selector:
# matchLabels:
# app.kubernetes.io/component: kthena-router
# endpoints:
# - port: http
# path: /metrics
# interval: 15s
# ---
# apiVersion: monitoring.coreos.com/v1
# kind: PodMonitor
# metadata:
# name: kthena-inference
# labels:
# app.kubernetes.io/part-of: kthena
# spec:
# selector:
# matchExpressions:
# - key: modelserving.volcano.sh/name
# operator: Exists
# - key: modelserving.volcano.sh/group-name
# operator: Exists
# - key: modelserving.volcano.sh/role
# operator: Exists
# - key: modelserving.volcano.sh/entry
# operator: In
# values:
# - "true"
# podMetricsEndpoints:
# - port: ""
# targetPort: 8000
# path: /metrics
# interval: 15s
# - port: ""
# targetPort: 30000
# path: /metrics
# interval: 15s
# ---
# apiVersion: keda.sh/v1alpha1
# kind: ScaledObject
# metadata:
# name: kthena-inference-scaler
# labels:
# app.kubernetes.io/part-of: kthena
# spec:
# minReplicaCount: 1
# maxReplicaCount: 10
# scaleTargetRef:
# name: kthena-inference
# triggers:
# - type: prometheus
# metadata:
# serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
# query: sum(kthena_router_active_downstream_requests)
# threshold: "5"

scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: {{ .Values.kthenaRouter.autoscaling.scaleTargetName }}
Copilot AI Mar 24, 2026

When autoscaling is enabled, scaleTargetName defaults to an empty string, which will render an invalid scaleTargetRef.name. Consider using Helm's required (or a conditional fail) to enforce a non-empty target name when .Values.kthenaRouter.autoscaling.enabled is true.

Suggested change
name: {{ .Values.kthenaRouter.autoscaling.scaleTargetName }}
name: {{ required "kthenaRouter.autoscaling.scaleTargetName must be set when autoscaling is enabled" .Values.kthenaRouter.autoscaling.scaleTargetName }}

Comment on lines +21 to +22
serverAddress: {{ .Values.kthenaRouter.autoscaling.prometheusAddress }}
query: {{ .Values.kthenaRouter.autoscaling.query }}
Copilot AI Mar 24, 2026

serverAddress and query are rendered unquoted. PromQL expressions commonly include characters ({}, [], ,, :) that can break YAML parsing or be re-interpreted; render these values with | quote to ensure the ScaledObject is always valid YAML regardless of the configured query/address.

Suggested change
serverAddress: {{ .Values.kthenaRouter.autoscaling.prometheusAddress }}
query: {{ .Values.kthenaRouter.autoscaling.query }}
serverAddress: {{ .Values.kthenaRouter.autoscaling.prometheusAddress | quote }}
query: {{ .Values.kthenaRouter.autoscaling.query | quote }}

prometheusAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
# query is the PromQL query used for scaling decisions
query: sum(kthena_router_active_downstream_requests)
# threshold is the per-pod metric value that triggers scaling
Copilot AI Mar 24, 2026

The comment describes threshold as a per-pod metric value, but the default query is sum(kthena_router_active_downstream_requests) (a total across all matched series). Update the comment to match the query semantics or adjust the default query so the threshold is actually per pod/target.

Suggested change
# threshold is the per-pod metric value that triggers scaling
# threshold is the total metric value across all pods that triggers scaling

Comment on lines +36 to +41
- port: ""
targetPort: 8000
path: /metrics
interval: 15s
- port: ""
targetPort: 30000
Copilot AI Mar 24, 2026

In this PodMonitor example, podMetricsEndpoints[].port is set to an empty string. Prometheus Operator will treat this as a named port ("") and it can fail validation/scraping; if you intend to scrape by number, omit port entirely and keep only targetPort (or set a real named port).

Suggested change
- port: ""
targetPort: 8000
path: /metrics
interval: 15s
- port: ""
targetPort: 30000
- targetPort: 8000
path: /metrics
interval: 15s
- targetPort: 30000

triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
Copilot AI Mar 24, 2026

serverAddress is set to http://prometheus-kube-prometheus-prometheus.monitoring:9090. In Kubernetes DNS, service.namespace without .svc is not reliably resolvable across namespaces due to search path behavior; use the full service DNS name (e.g., ...monitoring.svc:9090 or ...monitoring.svc.cluster.local:9090) to avoid runtime resolution failures for the KEDA scaler.

Suggested change
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090

@WHOIM1205
Contributor Author

Hey @LiZhenCheng9527 @hzxuzhonghu,
I've shared an update above and opened a PR with a working prototype for Prometheus + KEDA autoscaling.
Would love to get your thoughts whenever you get a chance. Thanks!

Contributor

@LiZhenCheng9527 LiZhenCheng9527 left a comment


If I don’t want to install Prometheus, what Helm command should I use to install Kthena?

@WHOIM1205
Contributor Author

If I don’t want to install Prometheus, what Helm command should I use to install Kthena?

Prometheus isn’t required to install Kthena.

All the monitoring and autoscaling pieces (ServiceMonitor, PodMonitor, KEDA) are disabled by default, so a normal install works fine:

helm install kthena ./charts/kthena

Those resources are only created if someone explicitly enables them, so it won’t cause any issues even if Prometheus isn’t present.

You can also explicitly turn them off with flags, but that’s already the default.
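For example, an override file that restates the defaults explicitly would look like this (key paths assumed to match this PR's values.yaml):

```yaml
# values-no-monitoring.yaml -- everything below is already the default
kthenaRouter:
  metrics:
    serviceMonitor:
      enabled: false
    podMonitor:
      enabled: false
  autoscaling:
    enabled: false
```

applied with `helm install kthena ./charts/kthena -f values-no-monitoring.yaml`.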

Member

@hzxuzhonghu hzxuzhonghu left a comment


I would suggest we add scaling as a separate addon, not to couple with our own components.

@WHOIM1205
Contributor Author

I would suggest we add scaling as a separate addon, not to couple with our own components.

That's a good point, thanks for the suggestion.

Right now it's added as an optional feature and disabled by default, but I agree it would be cleaner to keep scaling as a separate addon.

I can refactor it accordingly. Do you have a preferred way to structure it (e.g., a separate subchart or an addons directory)?
