
Add KEDA autoscaling support for ModelServing via Prometheus #839

Open
WHOIM1205 wants to merge 4 commits into volcano-sh:main from WHOIM1205:feat/keda-modelserving-autoscaling

Conversation

@WHOIM1205
Contributor

@WHOIM1205 WHOIM1205 commented Mar 25, 2026

Summary

This PR fixes HPA compatibility for the ModelServing custom resource.

While trying to use KEDA + HPA for autoscaling, I found that scaling wasn’t working due to a missing label selector in the controller. This change adds that missing piece and makes autoscaling work end-to-end.


Problem

Autoscaling via KEDA → HPA was failing with:

selector is required

After digging into it, the issue was:

  • The CRD defines a scale subresource with:
    • labelSelectorPath: .status.labelSelector
    • specReplicasPath: .spec.replicas
    • statusReplicasPath: .status.replicas
  • But .status.labelSelector was never being set by the controller

Because of that, HPA couldn’t figure out which pods belong to a ModelServing instance.


Fix

Set .status.labelSelector in the controller using the existing modelserving.volcano.sh/name label.

This label is already applied to all pods created by ModelServing, so HPA can use it directly.

Handled in:

  • the early return case (no ServingGroups yet)
  • the main status update logic

The update is idempotent (only sets the value if needed).
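For reference, the selector construction can be sketched like this (buildSelector is a hypothetical stand-in written with only the standard library; the actual controller formats the label map with k8s.io/apimachinery's labels.Set{...}.String(), which produces the same sorted "key=value" form):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildSelector mimics labels.Set.String(): it renders a label map as a
// comma-separated, deterministically ordered "key=value" selector string.
func buildSelector(set map[string]string) string {
	parts := make([]string, 0, len(set))
	for k, v := range set {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts) // labels.Set.String() also emits keys in sorted order
	return strings.Join(parts, ",")
}

func main() {
	// The key matches the label already applied to every pod by the controller.
	selector := buildSelector(map[string]string{
		"modelserving.volcano.sh/name": "test-model",
	})
	fmt.Println(selector) // modelserving.volcano.sh/name=test-model
}
```

Because the string is deterministic, comparing it against the current status value (as the PR does) keeps the update idempotent.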


Result

After this change:

  • HPA can correctly discover and track pods
  • Scaling works through spec.replicas (no Deployment involved)
  • Increasing replicas creates new ServingGroups and pods as expected

Testing

Tested locally:

  • Prometheus metrics are available (sum(rate(process_cpu_seconds_total[1m])) > 0)
  • KEDA ScaledObject becomes active
  • HPA is created and shows ScalingActive=True
  • ModelServing.spec.replicas increases under load
  • Pods scale from 1 → N as expected
[Four screenshots attached.]

So the full flow works:

Prometheus → KEDA → HPA → ModelServing → pods


Notes

  • Minimal change, no new labels added
  • Uses existing labeling already present in the controller
  • Works with the current CRD scale subresource as-is
  • Existing tests pass without any changes

Impact

  • Fixes HPA support for ModelServing
  • Makes it possible to use KEDA (or any external metrics) for autoscaling

refs : #799

- Populate .status.labelSelector in controller to enable HPA pod discovery
- Add ScaledObject targeting ModelServing custom resource
- Add RBAC for KEDA to scale ModelServing resources
- Add ServiceMonitor and test deployment manifests

Signed-off-by: WHOIM1205 <[email protected]>
Copilot AI review requested due to automatic review settings March 25, 2026 20:35
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hzxuzhonghu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a critical fix that enables robust autoscaling for ModelServing custom resources within the Kubernetes ecosystem. By correctly populating the .status.labelSelector field in the ModelServing controller, it resolves a long-standing issue preventing Horizontal Pod Autoscalers (HPAs) from effectively managing ModelServing instances. This enhancement paves the way for seamless integration with KEDA and other external metrics providers, significantly improving the operational efficiency and responsiveness of model serving deployments.

Highlights

  • HPA Compatibility: Fixed Horizontal Pod Autoscaler (HPA) compatibility for the ModelServing custom resource by addressing a missing label selector.
  • Autoscaling Enablement: Enabled end-to-end autoscaling for ModelServing via KEDA and HPA, using Prometheus metrics as a trigger.
  • Controller Update: Modified the ModelServing controller to set the .status.labelSelector field, which is crucial for HPA to identify and track pods belonging to a ModelServing instance.
  • Minimal Impact: Implemented the fix with minimal changes, utilizing existing labels and working seamlessly with the current CRD scale subresource.



@WHOIM1205
Contributor Author

hey @LiZhenCheng9527 @hzxuzhonghu
This fixes HPA compatibility for ModelServing by setting status.labelSelector, enabling end-to-end autoscaling (Prometheus → KEDA → HPA → ModelServing).
Tested locally and verified scaling behavior; would appreciate your review when you get a chance. Thanks!


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request integrates KEDA for autoscaling ModelServing resources by adding necessary RBAC configurations and updating the ModelServingController to correctly set the LabelSelector in the ModelServing status, enabling HPA and KEDA to identify and scale associated pods. It also includes example YAMLs for a ModelServing instance, a KEDA ScaledObject, a ServiceMonitor, and test deployments with mock metrics. Feedback suggests that the example ScaledObject's Prometheus query should be updated to use metrics actually exposed by the mock deployments (e.g., vllm_num_requests_running), and a Service and ServiceMonitor are missing for the dummy-inference-vllm deployment to allow Prometheus to scrape its metrics, which are crucial for a fully functional autoscaling example.

Comment on lines +19 to +20
query: sum(rate(process_cpu_seconds_total[1m]))
threshold: "0.01"

medium

The Prometheus query uses the metric process_cpu_seconds_total, but the mock deployments in test-deployment.yaml do not expose this metric. They expose kthena_router_* and vllm_* metrics.

To make the example consistent and functional, the query should use one of the available metrics. For example, using vllm_num_requests_running from the dummy-inference-vllm deployment would be more appropriate, as those pods are labeled to be part of the ModelServing instance.

You could change the query to something like this, assuming the goal is to scale when there's at least one running request:

        query: sum(vllm_num_requests_running{modelserving_volcano_sh_name="test-model"})
        threshold: "1"

location / {
return 200 'dummy-vllm ok\n';
}
}

medium

The dummy-inference-vllm deployment exposes vllm_* metrics that would be useful for autoscaling, but there is no Service defined to expose its pods for scraping. Without a Service and a corresponding ServiceMonitor, Prometheus will not be able to collect these metrics.

To make this example fully functional, a Service for this deployment should be added. For example:

---
apiVersion: v1
kind: Service
metadata:
  name: dummy-inference-vllm
  labels:
    modelserving.volcano.sh/name: test-model
spec:
  selector:
    modelserving.volcano.sh/name: test-model
  ports:
    - name: http-metrics
      port: 8000
      targetPort: 8000
      protocol: TCP

A corresponding ServiceMonitor would also be needed to instruct Prometheus to scrape this new service.
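Such a ServiceMonitor might look like the following (a sketch only; the namespace placement and scrape interval are assumptions, while the label and `http-metrics` port name match the Service suggested above):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dummy-inference-vllm
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      modelserving.volcano.sh/name: test-model
  endpoints:
    - port: http-metrics   # matches the Service port name above
      interval: 15s
```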

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes HPA/KEDA autoscaling for the ModelServing CR by ensuring the controller populates .status.labelSelector, which is required by the CRD scale subresource (selectorpath=.status.labelSelector) so HPA can select the correct pods.

Changes:

  • Set ModelServing.status.labelSelector to a stable selector matching pods labeled with modelserving.volcano.sh/name=<modelserving-name> (including the “no ServingGroups yet” early-return path).
  • Add example manifests for a Prometheus → KEDA → HPA → ModelServing scaling flow (ServiceMonitor, ScaledObject, RBAC, sample ModelServing, and test deployments).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Summary per file:

  • pkg/model-serving-controller/controller/model_serving_controller.go: Populate .status.labelSelector during status updates to enable HPA scale subresource functionality.
  • test-deployment.yaml: Example Deployments/Services/ConfigMaps to expose dummy /metrics endpoints for local testing.
  • servicemonitor.yaml: Example ServiceMonitor to have Prometheus scrape the dummy metrics endpoint.
  • scaledobject.yaml: Example KEDA ScaledObject targeting a ModelServing resource using Prometheus metrics.
  • modelserving.yaml: Example ModelServing resource used as the KEDA/HPA scaling target.
  • keda-rbac.yaml: Example RBAC to allow KEDA to interact with modelservings scaling APIs.


keda-rbac.yaml Outdated
Comment on lines +10 to +17
- modelservings/scale
- modelservings/status
verbs:
- get
- list
- watch
- update
- patch

Copilot AI Mar 25, 2026


This RBAC grants broad cluster-wide permissions (update/patch) on modelservings and modelservings/status. For KEDA/HPA scaling, the operator typically only needs get on the target and get/update (and possibly patch) on the modelservings/scale subresource. Consider tightening the rules (and using a namespaced Role/RoleBinding if feasible) to follow least-privilege.

Suggested change
- modelservings/scale
- modelservings/status
verbs:
- get
- list
- watch
- update
- patch
verbs:
- get
- apiGroups:
- workload.serving.volcano.sh
resources:
- modelservings/scale
verbs:
- get
- update

Comment on lines +1 to +38
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
metadata:
name: test-model
namespace: default
spec:
replicas: 1
template:
roles:
- name: entry
workerReplicas: 1
entryTemplate:
metadata: {}
spec:
containers:
- name: entry
image: nginx
workerTemplate:
metadata: {}
spec:
containers:
- name: worker
image: nginx

- name: worker
workerReplicas: 1
entryTemplate:
metadata: {}
spec:
containers:
- name: entry
image: nginx
workerTemplate:
metadata: {}
spec:
containers:
- name: worker
image: nginx

Copilot AI Mar 25, 2026


This ModelServing manifest looks like an example asset; the repo already keeps sample CR YAML under examples/model-serving/. Consider moving it there (or a dedicated autoscaling example folder) rather than adding it at repo root.

Suggested change
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
metadata:
name: test-model
namespace: default
spec:
replicas: 1
template:
roles:
- name: entry
workerReplicas: 1
entryTemplate:
metadata: {}
spec:
containers:
- name: entry
image: nginx
workerTemplate:
metadata: {}
spec:
containers:
- name: worker
image: nginx
- name: worker
workerReplicas: 1
entryTemplate:
metadata: {}
spec:
containers:
- name: entry
image: nginx
workerTemplate:
metadata: {}
spec:
containers:
- name: worker
image: nginx
# This file previously contained an example ModelServing manifest.
# To follow the repository convention and avoid cluttering the repo root,
# the actual example CR YAML has been moved under the examples directory.
#
# Please use the manifest in:
# examples/model-serving/modelserving.yaml
# (or the appropriate subfolder, e.g. examples/model-serving/autoscaling/)
#
# This stub is intentionally left without a Kubernetes resource definition.
# It exists only to document the relocation and to satisfy static analysis.

Comment on lines +1 to +20
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: modelserving-scaler
namespace: default
spec:
scaleTargetRef:
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
name: test-model
minReplicaCount: 1
maxReplicaCount: 5
pollingInterval: 15
cooldownPeriod: 60
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
query: sum(rate(process_cpu_seconds_total[1m]))
threshold: "0.01"

Copilot AI Mar 25, 2026


This ScaledObject manifest appears to be an example asset; to align with the repo’s existing structure for sample YAML, consider moving it under examples/ (e.g., examples/autoscaling/keda/) rather than adding it at repository root.

Suggested change
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: modelserving-scaler
namespace: default
spec:
scaleTargetRef:
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
name: test-model
minReplicaCount: 1
maxReplicaCount: 5
pollingInterval: 15
cooldownPeriod: 60
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
query: sum(rate(process_cpu_seconds_total[1m]))
threshold: "0.01"
# Placeholder file at repository root.
# The example KEDA ScaledObject manifest has been moved under the examples tree,
# for example: examples/autoscaling/keda/scaledobject.yaml
#
# This file is intentionally left without any Kubernetes resources to avoid
# having example manifests at the repository root.

Comment on lines +1 to +5
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: keda-modelserving-scaling
rules:

Copilot AI Mar 25, 2026


This RBAC manifest also appears to be an example asset; consider relocating it under examples//docs/ so users can find it alongside other sample manifests instead of at the repository root.

Comment on lines +1753 to +1763
// Set labelSelector so the scale subresource can report it to HPA.
// Without this, HPA fails with "selector is required" because it cannot
// determine which pods belong to this ModelServing.
// The selector matches the label applied to all pods by createBasePod().
selector := labels.Set{
workloadv1alpha1.ModelServingNameLabelKey: latestMS.Name,
}.String()
if copy.Status.LabelSelector != selector {
shouldUpdate = true
copy.Status.LabelSelector = selector
}

Copilot AI Mar 25, 2026


Add/adjust a unit test to assert that UpdateModelServingStatus sets .status.labelSelector (both in the no-serving-groups early return and the normal status update path). This behavior is critical for the CRD scale subresource/HPA integration and is currently unverified by tests.

Comment on lines +1 to +5
apiVersion: apps/v1
kind: Deployment
metadata:
name: kthena-router
labels:

Copilot AI Mar 25, 2026


These manifests appear to be example/test assets, but they are being added at repo root even though the repository already has an examples/ tree for custom resource YAML. Consider moving this file under an appropriate examples/ subdirectory (e.g., examples/model-serving/ or a new examples/autoscaling/keda/) to keep the top-level clean and make discovery consistent.

Comment on lines +1 to +6
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: kthena-router
namespace: monitoring
spec:

Copilot AI Mar 25, 2026


This ServiceMonitor looks like an example asset; the repo already centralizes sample manifests under examples/. Consider relocating it under examples/ (and/or docs/) instead of adding it to the repository root.

- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
query: sum(rate(process_cpu_seconds_total[1m]))

Copilot AI Mar 25, 2026


The Prometheus query sum(rate(process_cpu_seconds_total[1m])) is effectively cluster-wide and not scoped to the ModelServing/pods being autoscaled, so it can cause unintended scaling driven by unrelated workloads. Consider scoping the query to the target workload (e.g., via pod, namespace, or a label like modelserving.volcano.sh/name=test-model) or using a workload-specific metric.

Suggested change
query: sum(rate(process_cpu_seconds_total[1m]))
query: sum(rate(process_cpu_seconds_total{namespace="default", pod=~"test-model-.*"}[1m]))

Contributor

@LiZhenCheng9527 LiZhenCheng9527 left a comment


You can place these examples in the ‘example’ folder

- modelservings
- modelservings/scale
- modelservings/status
verbs:
Contributor


Does modelserving really need all these permissions?

@@ -0,0 +1,38 @@
apiVersion: workload.serving.volcano.sh/v1alpha1
Contributor


Why is such an example needed? The one in this directory is identical to examples/model-serving/sample.yaml.

- Remove modelservings/status from KEDA ClusterRole (not needed)
- Restrict modelservings base resource to read-only verbs
- Delete redundant modelserving.yaml test file

Signed-off-by: WHOIM1205 <[email protected]>
@WHOIM1205
Contributor Author

hey @LiZhenCheng9527
Updated as per review:

  • Trimmed RBAC permissions to only required access for KEDA (modelservings + modelservings/scale)
  • Removed redundant example from the repo root

Please let me know if anything else should be adjusted.

@hzxuzhonghu
Member

I did a first pass on this diff. Setting status.labelSelector from the existing modelserving.volcano.sh/name label looks like the right fix for HPA/KEDA compatibility with the scale subresource. I did not spot a blocking issue in the change set.

Main thing I would still want to confirm is normal CI plus at least one regression test around the status update path for both cases:

  • no ServingGroups yet
  • existing ServingGroups already present

Copilot AI review requested due to automatic review settings March 31, 2026 19:17
@WHOIM1205
Contributor Author

WHOIM1205 commented Mar 31, 2026

hey @hzxuzhonghu added regression tests for the status update path as suggested:

  1. Covered the case with no ServingGroups (early return path)
  2. Covered the case with existing ServingGroups
  3. Also added a case for names with special characters to ensure selector formatting remains correct

All tests are passing. Please let me know if you'd like any additional coverage.

[Screenshot attached.]

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.



Comment on lines +39 to +42
location /metrics {
default_type text/plain;
return 200 '# HELP kthena_router_active_downstream_requests Number of active downstream requests\n# TYPE kthena_router_active_downstream_requests gauge\nkthena_router_active_downstream_requests 3\n# HELP kthena_router_requests_total Total requests\n# TYPE kthena_router_requests_total counter\nkthena_router_requests_total 100\n';
}

Copilot AI Mar 31, 2026


The NGINX return 200 '...\\n...' payload will emit literal backslash-n sequences (NGINX doesn’t interpret \\n escapes here), which can make the Prometheus exposition invalid/unparseable. To make these manifests reliably scrapeable, serve metrics with real newlines (e.g., by returning a multi-line literal string with actual newline characters, serving a static metrics file, or using a tiny HTTP server/exporter image that emits valid Prometheus text format).

Comment on lines +105 to +108
location /metrics {
default_type text/plain;
return 200 '# HELP vllm_num_requests_running Number of running requests\n# TYPE vllm_num_requests_running gauge\nvllm_num_requests_running 2\n# HELP vllm_num_requests_waiting Number of waiting requests\n# TYPE vllm_num_requests_waiting gauge\nvllm_num_requests_waiting 0\n# HELP vllm_gpu_cache_usage_perc GPU cache usage percentage\n# TYPE vllm_gpu_cache_usage_perc gauge\nvllm_gpu_cache_usage_perc 0.45\n';
}

Copilot AI Mar 31, 2026


The NGINX return 200 '...\\n...' payload will emit literal backslash-n sequences (NGINX doesn’t interpret \\n escapes here), which can make the Prometheus exposition invalid/unparseable. To make these manifests reliably scrapeable, serve metrics with real newlines (e.g., by returning a multi-line literal string with actual newline characters, serving a static metrics file, or using a tiny HTTP server/exporter image that emits valid Prometheus text format).

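One way to follow this suggestion is to keep the metrics payload in a ConfigMap literal block (which preserves real newlines) and serve it as a static file; the ConfigMap name and mount path below are assumptions for illustration:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dummy-vllm-metrics    # hypothetical name
data:
  # Mount this file into the nginx web root (e.g. /usr/share/nginx/html/metrics)
  # so the exposition format is served with real newlines.
  metrics: |
    # HELP vllm_num_requests_running Number of running requests
    # TYPE vllm_num_requests_running gauge
    vllm_num_requests_running 2
    # HELP vllm_num_requests_waiting Number of waiting requests
    # TYPE vllm_num_requests_waiting gauge
    vllm_num_requests_waiting 0
```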
Comment on lines +69 to +81
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
spec:
replicas: 1
selector:
matchLabels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
template:
metadata:
labels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example Deployment is labeled with the same modelserving.volcano.sh/name: test-model key/value that the controller now publishes via status.labelSelector. If someone applies this alongside a real ModelServing named test-model, HPA will likely count these pods as part of the scale target, skewing replica calculations and metrics. Recommend updating the example to avoid reusing the ModelServingNameLabelKey label (or use a different msName value that cannot collide with an actual ModelServing), so it doesn’t interfere with autoscaling behavior.

Suggested change
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
spec:
replicas: 1
selector:
matchLabels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
template:
metadata:
labels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
app.kubernetes.io/name: dummy-inference-vllm
app.kubernetes.io/entry: "true"
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: dummy-inference-vllm
app.kubernetes.io/entry: "true"
template:
metadata:
labels:
app.kubernetes.io/name: dummy-inference-vllm
app.kubernetes.io/entry: "true"

Comment on lines +69 to +80
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
spec:
replicas: 1
selector:
matchLabels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
template:
metadata:
labels:
modelserving.volcano.sh/name: test-model

Copilot AI Mar 31, 2026


This example Deployment is labeled with the same modelserving.volcano.sh/name: test-model key/value that the controller now publishes via status.labelSelector. If someone applies this alongside a real ModelServing named test-model, HPA will likely count these pods as part of the scale target, skewing replica calculations and metrics. Recommend updating the example to avoid reusing the ModelServingNameLabelKey label (or use a different msName value that cannot collide with an actual ModelServing), so it doesn’t interfere with autoscaling behavior.

Suggested change
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
spec:
replicas: 1
selector:
matchLabels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
template:
metadata:
labels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/name: dummy-test-model
modelserving.volcano.sh/entry: "true"
spec:
replicas: 1
selector:
matchLabels:
modelserving.volcano.sh/name: dummy-test-model
modelserving.volcano.sh/entry: "true"
template:
metadata:
labels:
modelserving.volcano.sh/name: dummy-test-model

Comment on lines +1 to +26
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: keda-modelserving-scaling
rules:
- apiGroups:
- workload.serving.volcano.sh
resources:
- modelservings
verbs:
- get
- list
- watch
- apiGroups:
- workload.serving.volcano.sh
resources:
- modelservings/scale
verbs:
- get
- update
- patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: keda-modelserving-scaling

Copilot AI Mar 31, 2026


This grants cluster-wide permissions to read all ModelServing objects and update any modelservings/scale. If this is intended as an example for scaling a single namespace (the provided ScaledObject is in default), prefer a namespaced Role/RoleBinding scoped to that namespace to reduce blast radius. If cluster-wide access is genuinely required, consider documenting that explicitly in-file to prevent accidental over-privileging.

revision: "rev-1",
},
{
name: "name with special characters — selector encodes correctly",

Copilot AI Mar 31, 2026


The test case description says the selector 'encodes correctly', but the implementation uses labels.Set{...}.String() and does not perform any encoding; it just formats a k=v selector string. Suggest updating the test name to reflect what’s actually being validated (e.g., that a typical DNS-1123-ish name round-trips into the selector string) to avoid misleading future readers.

Suggested change
name: "name with special characters — selector encodes correctly",
name: "name with dashes and numbers — selector string contains name unmodified",

spec:
containers:
- name: kthena-router
image: nginx:alpine
Member


not sure I understand why nginx is deployed as the kthena router

scaleTargetRef:
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
name: test-model
Member


where is test-model?

kind: ModelServing
name: test-model
minReplicaCount: 1
maxReplicaCount: 5
Member


The label selector matches all pods belonging to the ModelServing, but we actually want to scale based on groups. The pod count is usually at least 2x the number of serving groups. How do we handle that?

@hzxuzhonghu
Member

You can place these examples in the ‘example’ folder

I agree with this, can we add a addon dir like istio https://github.com/istio/istio/tree/master/samples/addons

…dback

- Remove loose test-deployment.yaml and scaledobject.yaml from repo root
- Add proper ModelServing example (test-model) with entry + worker roles
- Add ScaledObject with real Prometheus query (kthena_router_active_downstream_requests)
- Add README explaining usage and groups-vs-pods scaling design

Signed-off-by: WHOIM1205 <[email protected]>
@WHOIM1205
Contributor Author

WHOIM1205 commented Apr 2, 2026

hey @hzxuzhonghu thanks for the feedback

  1. removed the nginx-based test deployment and now rely on the actual kthena router image
  2. added a proper example under examples/keda-autoscaling/ including ModelServing, ScaledObject, and README
  3. updated the ScaledObject to reference a real ModelServing (test-model)
  4. switched to a real router metric (kthena_router_active_downstream_requests) for scaling

regarding the scaling behavior:
ModelServing scales via spec.replicas (serving groups). Each group may contain multiple pods, but the Prometheus query uses sum() aggregation to produce a single value, and the threshold is defined at the group level, so scaling decisions map to groups rather than individual pods.

Please let me know if any further adjustments are needed.
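For illustration, a groups-aware trigger might look like the sketch below; the serverAddress and metric name are from the examples in this PR, while the threshold value is an assumed per-group target, not taken from the actual manifests:

```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
      # sum() collapses the per-pod series into one aggregate value, so the
      # threshold can be chosen per serving group (5 here is an assumption).
      query: sum(kthena_router_active_downstream_requests)
      threshold: "5"
```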
