@@ -0,0 +1,34 @@
{{- if .Values.kthenaRouter.metrics.podMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: kthena-inference
medium

The resource name is hardcoded. It's a good practice to use Helm's naming conventions to generate resource names. This ensures names are unique and consistent across releases. Consider using the kthena.fullname helper.

  name: {{ include "kthena.fullname" . }}-inference-podmonitor
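
For reference, `kthena.fullname` is typically defined in the chart's `_helpers.tpl` along these lines (the chart's actual helper may differ; this is a conventional sketch):

```yaml
{{/* Conventional fullname helper; truncated to 63 chars to satisfy DNS label limits. */}}
{{- define "kthena.fullname" -}}
{{- if .Values.fullnameOverride -}}
{{- .Values.fullnameOverride | trunc 63 | trimSuffix "-" -}}
{{- else -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}
{{- end -}}
```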

namespace: {{ .Release.Namespace }}
labels:
app.kubernetes.io/component: kthena-inference
release: prometheus
{{- include "kthena.labels" . | nindent 4 }}
Comment on lines +9 to +10
Copilot AI Mar 24, 2026

The release: prometheus label is hard-coded. Prometheus Operator setups often use a different label key/value (or none) for selecting PodMonitors, so this can prevent scraping unless users happen to match this exact label. Make this label configurable (or omit it and expose additionalLabels).

Suggested change
release: prometheus
{{- include "kthena.labels" . | nindent 4 }}
{{- include "kthena.labels" . | nindent 4 }}
{{- with .Values.kthenaRouter.metrics.podMonitor.additionalLabels }}
{{- toYaml . | nindent 4 }}
{{- end }}
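
If the suggested `additionalLabels` key is adopted, users who do rely on the `release: prometheus` convention can restore it from values (values layout below is a hypothetical sketch matching this chart's structure):

```yaml
kthenaRouter:
  metrics:
    podMonitor:
      enabled: true
      interval: 15s
      # Restores the label many kube-prometheus-stack installs select on.
      additionalLabels:
        release: prometheus
```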

spec:
namespaceSelector:
matchNames:
- {{ .Release.Namespace }}
selector:
matchExpressions:
- key: modelserving.volcano.sh/name
operator: Exists
- key: modelserving.volcano.sh/group-name
operator: Exists
- key: modelserving.volcano.sh/role
operator: Exists
- key: modelserving.volcano.sh/entry
operator: In
values:
- "true"
podMetricsEndpoints:
- targetPort: 8000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
- targetPort: 30000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
Comment on lines +27 to +33

medium

The podMetricsEndpoints are hardcoded, which limits flexibility for different inference server backends or custom configurations. It would be better to make these endpoints configurable in values.yaml. This would allow users to easily add or change metric endpoints without modifying the chart template. This change would require adding an endpoints list to values.yaml under kthenaRouter.metrics.podMonitor.

  podMetricsEndpoints:
  {{- range .Values.kthenaRouter.metrics.podMonitor.endpoints }}
    - targetPort: {{ .targetPort }}
      path: {{ .path | default "/metrics" }}
      interval: {{ $.Values.kthenaRouter.metrics.podMonitor.interval }}
  {{- end }}
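
With that range in place, a hypothetical values entry such as `endpoints: [{targetPort: 8000}, {targetPort: 30000, path: /custom}]` would render roughly:

```yaml
podMetricsEndpoints:
  - targetPort: 8000
    path: /metrics      # default applied by `.path | default "/metrics"`
    interval: 15s
  - targetPort: 30000
    path: /custom
    interval: 15s
```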

Comment on lines +27 to +33

Copilot AI Mar 24, 2026

This PodMonitor config will scrape both targetPort: 8000 and targetPort: 30000 on every matched pod. For backends that only expose one of these ports, Prometheus will continuously record scrape failures/target down, which can be noisy and impact alerting. Consider splitting into two PodMonitors selected by a backend label, or make the endpoints list configurable so users only enable the port(s) they actually expose.

Suggested change
podMetricsEndpoints:
- targetPort: 8000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
- targetPort: 30000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
podMetricsEndpoints:
{{- with .Values.kthenaRouter.metrics.podMonitor.podMetricsEndpoints }}
{{- toYaml . | nindent 4 }}
{{- else }}
- targetPort: 8000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
- targetPort: 30000
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.podMonitor.interval }}
{{- end }}

{{- end }}
@@ -0,0 +1,24 @@
{{- if .Values.kthenaRouter.autoscaling.enabled }}

high

If autoscaling.enabled is true, scaleTargetName must be provided. An empty scaleTargetName will result in an invalid ScaledObject. It's good practice to enforce this requirement using Helm's required function to fail fast during template rendering.

{{- if .Values.kthenaRouter.autoscaling.enabled }}
{{- required "A value for kthenaRouter.autoscaling.scaleTargetName is required when autoscaling is enabled" .Values.kthenaRouter.autoscaling.scaleTargetName }}
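
With that guard in place, a minimal working configuration must supply the target name explicitly (the Deployment name below is illustrative):

```yaml
kthenaRouter:
  autoscaling:
    enabled: true
    # Hypothetical Deployment name; required when autoscaling is enabled.
    scaleTargetName: kthena-inference
```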

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: kthena-inference-scaler

medium

The resource name is hardcoded. It's a good practice to use Helm's naming conventions to generate resource names. This ensures names are unique and consistent across releases. Consider using the kthena.fullname helper.

  name: {{ include "kthena.fullname" . }}-inference-scaler

namespace: {{ .Release.Namespace }}
labels:
app.kubernetes.io/component: kthena-inference
{{- include "kthena.labels" . | nindent 4 }}
spec:
cooldownPeriod: {{ .Values.kthenaRouter.autoscaling.cooldownPeriod }}
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: {{ .Values.kthenaRouter.autoscaling.scaleTargetName }}

Copilot AI Mar 24, 2026

When autoscaling is enabled, scaleTargetName defaults to an empty string, which will render an invalid scaleTargetRef.name. Consider using Helm's required (or a conditional fail) to enforce a non-empty target name when .Values.kthenaRouter.autoscaling.enabled is true.

Suggested change
name: {{ .Values.kthenaRouter.autoscaling.scaleTargetName }}
name: {{ required "kthenaRouter.autoscaling.scaleTargetName must be set when autoscaling is enabled" .Values.kthenaRouter.autoscaling.scaleTargetName }}

minReplicaCount: {{ .Values.kthenaRouter.autoscaling.minReplicas }}
maxReplicaCount: {{ .Values.kthenaRouter.autoscaling.maxReplicas }}
triggers:
- type: prometheus
metadata:
serverAddress: {{ .Values.kthenaRouter.autoscaling.prometheusAddress }}
query: {{ .Values.kthenaRouter.autoscaling.query }}
Comment on lines +21 to +22

Copilot AI Mar 24, 2026

serverAddress and query are rendered unquoted. PromQL expressions commonly include characters ({}, [], ,, :) that can break YAML parsing or be re-interpreted; render these values with | quote to ensure the ScaledObject is always valid YAML regardless of the configured query/address.

Suggested change
serverAddress: {{ .Values.kthenaRouter.autoscaling.prometheusAddress }}
query: {{ .Values.kthenaRouter.autoscaling.query }}
serverAddress: {{ .Values.kthenaRouter.autoscaling.prometheusAddress | quote }}
query: {{ .Values.kthenaRouter.autoscaling.query | quote }}
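
Quoting matters because a PromQL expression can begin with a YAML flow indicator. For example, a bare selector query would otherwise be parsed as a flow mapping rather than a string (illustrative metric name):

```yaml
# Unquoted: YAML parses the value as a mapping, not a string, and KEDA rejects it.
# query: {__name__="kthena_router_active_downstream_requests"}

# Quoted: rendered as the intended string.
query: '{__name__="kthena_router_active_downstream_requests"}'
```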

threshold: {{ .Values.kthenaRouter.autoscaling.threshold | quote }}
{{- end }}
@@ -0,0 +1,22 @@
{{- if .Values.kthenaRouter.metrics.serviceMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: kthena-router

medium

The resource name is hardcoded. It's a good practice to use Helm's naming conventions to generate resource names. This ensures names are unique and consistent across releases. Consider using the kthena.fullname helper.

  name: {{ include "kthena.fullname" . }}-router

namespace: {{ .Release.Namespace }}
labels:
app.kubernetes.io/component: kthena-router
release: prometheus

Copilot AI Mar 24, 2026

The release: prometheus label is hard-coded. Prometheus Operator setups often use a different label key/value (or none) for selecting ServiceMonitors, so hard-coding this can make scraping silently not work. Make this label configurable (or omit it and let users add it via additionalLabels).

Suggested change
release: prometheus
{{- $prometheusLabels := default (dict "release" "prometheus") .Values.kthenaRouter.metrics.serviceMonitor.prometheusLabels }}
{{- range $key, $value := $prometheusLabels }}
{{ $key }}: {{ $value }}
{{- end }}

{{- include "kthena.labels" . | nindent 4 }}
spec:
namespaceSelector:
matchNames:
- {{ .Release.Namespace }}
selector:
matchLabels:
app.kubernetes.io/component: kthena-router
endpoints:
- port: http
path: /metrics
interval: {{ .Values.kthenaRouter.metrics.serviceMonitor.interval }}
{{- end }}
30 changes: 30 additions & 0 deletions charts/kthena/charts/networking/values.yaml
Original file line number Diff line number Diff line change
@@ -66,6 +66,36 @@ kthenaRouter:
# kubeAPIBurst is the burst to use while talking with kubernetes apiserver
# If 0 or not specified, uses default value (10)
kubeAPIBurst: 0
# metrics configuration for Prometheus monitoring
metrics:
serviceMonitor:
# enabled creates a ServiceMonitor for kthena-router (requires Prometheus Operator)
enabled: false
# interval is the scrape interval for the ServiceMonitor
interval: 15s
podMonitor:
# enabled creates a PodMonitor for inference pods (requires Prometheus Operator)
enabled: false
# interval is the scrape interval for the PodMonitor
interval: 15s
Comment on lines +76 to +80

medium

The PodMonitor template has hardcoded ports for metrics scraping. To make it more flexible for different backends, consider making the metric endpoints configurable here. This change would need to be paired with an update to the podmonitor.yaml template to iterate over this list.

    podMonitor:
      # enabled creates a PodMonitor for inference pods (requires Prometheus Operator)
      enabled: false
      # interval is the scrape interval for the PodMonitor
      interval: 15s
      # endpoints for the PodMonitor to scrape
      endpoints:
        - { targetPort: 8000, path: /metrics }
        - { targetPort: 30000, path: /metrics }

# autoscaling configuration for KEDA-based inference scaling
autoscaling:
# enabled creates a KEDA ScaledObject (requires KEDA)
enabled: false
# scaleTargetName is the name of the Deployment to scale
scaleTargetName: ""
Comment on lines +85 to +86

high

The scaleTargetName is a critical value when autoscaling is enabled, but it defaults to an empty string. This will cause the ScaledObject to be invalid. It's important to make it clear to the user that this value must be set. Adding a note to the comment and using a required check in the template would improve usability.

    # scaleTargetName is the name of the Deployment to scale (required if autoscaling.enabled is true)
    scaleTargetName: ""

# prometheusAddress is the Prometheus server URL
prometheusAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
# query is the PromQL query used for scaling decisions
query: sum(kthena_router_active_downstream_requests)
# threshold is the per-pod metric value that triggers scaling

Copilot AI Mar 24, 2026

The comment describes threshold as a per-pod metric value, but the default query is sum(kthena_router_active_downstream_requests) (a total across all matched series). Update the comment to match the query semantics or adjust the default query so the threshold is actually per pod/target.

Suggested change
# threshold is the per-pod metric value that triggers scaling
# threshold is the total metric value across all pods that triggers scaling
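
For context: with KEDA's Prometheus scaler, the trigger's `metricType` defaults to `AverageValue`, so a cluster-wide sum paired with a threshold does behave per-replica in practice (this reflects my understanding of KEDA defaults; verify against the KEDA version in use):

```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
      query: sum(kthena_router_active_downstream_requests)
      threshold: "5"
    # With AverageValue semantics, desiredReplicas ≈ ceil(queryResult / 5),
    # e.g. a total of 12 active requests scales to 3 replicas.
```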

threshold: 5
# cooldownPeriod is seconds to wait before scaling down after last trigger
cooldownPeriod: 30
# minReplicas is the minimum number of inference replicas
minReplicas: 1
# maxReplicas is the maximum number of inference replicas
maxReplicas: 10

webhook:
enabled: true
28 changes: 28 additions & 0 deletions charts/kthena/values.yaml
@@ -81,6 +81,34 @@ networking:
# -- Enable Gateway API Inference Extension features.<br/>
# Requires `gatewayAPI.enabled` to be true.
inferenceExtension: false
metrics:
serviceMonitor:
# -- Enable ServiceMonitor for kthena-router (requires Prometheus Operator).
enabled: false
# -- Scrape interval for the ServiceMonitor.
interval: 15s
podMonitor:
# -- Enable PodMonitor for inference pods (requires Prometheus Operator).
enabled: false
# -- Scrape interval for the PodMonitor.
interval: 15s
Comment on lines +90 to +94

medium

The PodMonitor template has hardcoded ports for metrics scraping. To make it more flexible for different backends, consider making the metric endpoints configurable here. This would involve adding an endpoints list that can be passed to the subchart and used in the podmonitor.yaml template.

      podMonitor:
        # -- Enable PodMonitor for inference pods (requires Prometheus Operator).
        enabled: false
        # -- Scrape interval for the PodMonitor.
        interval: 15s
        # -- Endpoints for the PodMonitor to scrape.
        endpoints:
          - { targetPort: 8000, path: /metrics }
          - { targetPort: 30000, path: /metrics }

autoscaling:
# -- Enable KEDA ScaledObject for inference autoscaling (requires KEDA).
enabled: false
# -- Name of the Deployment to scale.
scaleTargetName: ""
Comment on lines +98 to +99

high

The scaleTargetName is a critical value when autoscaling is enabled, but it defaults to an empty string. This will cause the ScaledObject to be invalid. It's important to make it clear to the user that this value must be set.

      # -- Name of the Deployment to scale (required if autoscaling.enabled is true).
      scaleTargetName: ""

# -- Prometheus server URL for KEDA to query.
prometheusAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090
# -- PromQL query for scaling decisions.
query: sum(kthena_router_active_downstream_requests)
# -- Per-pod metric threshold that triggers scaling.

Copilot AI Mar 24, 2026

The comment says the threshold is a per-pod metric threshold, but the default query is sum(kthena_router_active_downstream_requests), which produces a cluster-wide total across all series. Either adjust the wording to reflect that the threshold applies to the query result, or change the default query to return a per-pod/per-target value if that’s the intended semantics.

Suggested change
# -- Per-pod metric threshold that triggers scaling.
# -- Metric threshold on the result of the PromQL query that triggers scaling.

threshold: 5
# -- Seconds to wait before scaling down.
cooldownPeriod: 30
# -- Minimum inference replicas.
minReplicas: 1
# -- Maximum inference replicas.
maxReplicas: 10

global:
# -- Certificate Management Mode.<br/>
61 changes: 61 additions & 0 deletions monitoring.yaml
@@ -0,0 +1,61 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: kthena-router
labels:
app.kubernetes.io/component: kthena-router
Comment on lines +3 to +6

high

The Kubernetes resources in this manifest are not namespaced. It's a best practice to always define a namespace in the metadata for all resources to avoid deploying them into the default namespace accidentally and to ensure proper isolation. Please add a namespace to this ServiceMonitor and other resources in this file.

metadata:
  name: kthena-router
  namespace: default # Or another appropriate namespace
  labels:
    app.kubernetes.io/component: kthena-router

spec:
selector:
matchLabels:
app.kubernetes.io/component: kthena-router
endpoints:
- port: http
path: /metrics
interval: 15s
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: kthena-inference
labels:
app.kubernetes.io/part-of: kthena
spec:
selector:
matchExpressions:
- key: modelserving.volcano.sh/name
operator: Exists
- key: modelserving.volcano.sh/group-name
operator: Exists
- key: modelserving.volcano.sh/role
operator: Exists
- key: modelserving.volcano.sh/entry
operator: In
values:
- "true"
podMetricsEndpoints:
- port: ""
targetPort: 8000
path: /metrics
interval: 15s
- port: ""
targetPort: 30000
Comment on lines +36 to +41

Copilot AI Mar 24, 2026

In this PodMonitor example, podMetricsEndpoints[].port is set to an empty string. Prometheus Operator will treat this as a named port ("") and it can fail validation/scraping; if you intend to scrape by number, omit port entirely and keep only targetPort (or set a real named port).

Suggested change
- port: ""
targetPort: 8000
path: /metrics
interval: 15s
- port: ""
targetPort: 30000
- targetPort: 8000
path: /metrics
interval: 15s
- targetPort: 30000
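
An alternative to numeric `targetPort` is naming the container port and scraping by name, assuming the inference pod spec declares one (the port name below is illustrative):

```yaml
# Pod spec side (for reference):
#   ports:
#     - name: metrics
#       containerPort: 8000
podMetricsEndpoints:
  - port: metrics
    path: /metrics
    interval: 15s
```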

path: /metrics
interval: 15s
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: kthena-inference-scaler
labels:
app.kubernetes.io/part-of: kthena
spec:
minReplicaCount: 1
maxReplicaCount: 10
scaleTargetRef:
name: kthena-inference
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090

high

The Prometheus server address is missing the .svc suffix. For cross-namespace service resolution, the FQDN of the service should be used, which is typically in the format <service-name>.<namespace>.svc.cluster.local. While this might work in some cluster configurations depending on DNS search paths, it's more robust to use the full or partial FQDN.

        serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090


Copilot AI Mar 24, 2026

serverAddress is set to http://prometheus-kube-prometheus-prometheus.monitoring:9090. In Kubernetes DNS, service.namespace without .svc is not reliably resolvable across namespaces due to search path behavior; use the full service DNS name (e.g., ...monitoring.svc:9090 or ...monitoring.svc.cluster.local:9090) to avoid runtime resolution failures for the KEDA scaler.

Suggested change
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc:9090

query: sum(kthena_router_active_downstream_requests)
threshold: "5"
Comment on lines +1 to +61

Copilot AI Mar 24, 2026

This top-level monitoring.yaml duplicates the Helm-rendered ServiceMonitor/PodMonitor/ScaledObject resources added under charts/. That duplication is likely to drift over time (e.g., different Prometheus address, different PodMonitor endpoints), so either move this to an examples/ or docs/ location and clearly mark it as an example, or remove it in favor of the chart templates.

Suggested change
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: kthena-router
labels:
app.kubernetes.io/component: kthena-router
spec:
selector:
matchLabels:
app.kubernetes.io/component: kthena-router
endpoints:
- port: http
path: /metrics
interval: 15s
---
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: kthena-inference
labels:
app.kubernetes.io/part-of: kthena
spec:
selector:
matchExpressions:
- key: modelserving.volcano.sh/name
operator: Exists
- key: modelserving.volcano.sh/group-name
operator: Exists
- key: modelserving.volcano.sh/role
operator: Exists
- key: modelserving.volcano.sh/entry
operator: In
values:
- "true"
podMetricsEndpoints:
- port: ""
targetPort: 8000
path: /metrics
interval: 15s
- port: ""
targetPort: 30000
path: /metrics
interval: 15s
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: kthena-inference-scaler
labels:
app.kubernetes.io/part-of: kthena
spec:
minReplicaCount: 1
maxReplicaCount: 10
scaleTargetRef:
name: kthena-inference
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
query: sum(kthena_router_active_downstream_requests)
threshold: "5"
# Example monitoring configuration for kthena components.
#
# NOTE:
# - The authoritative ServiceMonitor, PodMonitor, and ScaledObject resources
# are rendered from the Helm charts under the `charts/` directory.
# - This file is kept only as documentation / an example manifest and should
# not be applied directly in environments that use the Helm chart.
#
# If you need to customize monitoring, prefer editing the Helm values/templates
# rather than modifying this example file.
# apiVersion: monitoring.coreos.com/v1
# kind: ServiceMonitor
# metadata:
# name: kthena-router
# labels:
# app.kubernetes.io/component: kthena-router
# spec:
# selector:
# matchLabels:
# app.kubernetes.io/component: kthena-router
# endpoints:
# - port: http
# path: /metrics
# interval: 15s
# ---
# apiVersion: monitoring.coreos.com/v1
# kind: PodMonitor
# metadata:
# name: kthena-inference
# labels:
# app.kubernetes.io/part-of: kthena
# spec:
# selector:
# matchExpressions:
# - key: modelserving.volcano.sh/name
# operator: Exists
# - key: modelserving.volcano.sh/group-name
# operator: Exists
# - key: modelserving.volcano.sh/role
# operator: Exists
# - key: modelserving.volcano.sh/entry
# operator: In
# values:
# - "true"
# podMetricsEndpoints:
# - port: ""
# targetPort: 8000
# path: /metrics
# interval: 15s
# - port: ""
# targetPort: 30000
# path: /metrics
# interval: 15s
# ---
# apiVersion: keda.sh/v1alpha1
# kind: ScaledObject
# metadata:
# name: kthena-inference-scaler
# labels:
# app.kubernetes.io/part-of: kthena
# spec:
# minReplicaCount: 1
# maxReplicaCount: 10
# scaleTargetRef:
# name: kthena-inference
# triggers:
# - type: prometheus
# metadata:
# serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring:9090
# query: sum(kthena_router_active_downstream_requests)
# threshold: "5"
