
Add KEDA autoscaling support for ModelServing via Prometheus #839

Open
WHOIM1205 wants to merge 4 commits into volcano-sh:main from WHOIM1205:feat/keda-modelserving-autoscaling

Conversation

@WHOIM1205
Contributor

@WHOIM1205 WHOIM1205 commented Mar 25, 2026

Summary

This PR fixes HPA compatibility for the ModelServing custom resource.

While trying to use KEDA + HPA for autoscaling, I found that scaling wasn’t working due to a missing label selector in the controller. This change adds that missing piece and makes autoscaling work end-to-end.


Problem

Autoscaling via KEDA → HPA was failing with:

selector is required

After digging into it, the issue was:

  • The CRD defines a scale subresource with:
    • labelSelectorPath: .status.labelSelector
    • specReplicasPath: .spec.replicas
    • statusReplicasPath: .status.replicas
  • But .status.labelSelector was never being set by the controller

Because of that, HPA couldn’t figure out which pods belong to a ModelServing instance.


Fix

Set .status.labelSelector in the controller using the existing modelserving.volcano.sh/name label.

This label is already applied to all pods created by ModelServing, so HPA can use it directly.

Handled in:

  • the early return case (no ServingGroups yet)
  • the main status update logic

The update is idempotent (only sets the value if needed).
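For reference, the selector construction can be sketched like this (buildSelector is a hypothetical stand-in written with only the standard library; the actual controller formats the label map with k8s.io/apimachinery's labels.Set{...}.String(), which produces the same sorted "key=value" form):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildSelector mimics labels.Set.String(): it renders a label map as a
// comma-separated, deterministically ordered "key=value" selector string.
func buildSelector(set map[string]string) string {
	parts := make([]string, 0, len(set))
	for k, v := range set {
		parts = append(parts, k+"="+v)
	}
	sort.Strings(parts) // labels.Set.String() also emits keys in sorted order
	return strings.Join(parts, ",")
}

func main() {
	// The key matches the label already applied to every pod by the controller.
	selector := buildSelector(map[string]string{
		"modelserving.volcano.sh/name": "test-model",
	})
	fmt.Println(selector) // modelserving.volcano.sh/name=test-model
}
```

Because the string is deterministic, comparing it against the current status value (as the PR does) keeps the update idempotent.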


Result

After this change:

  • HPA can correctly discover and track pods
  • Scaling works through spec.replicas (no Deployment involved)
  • Increasing replicas creates new ServingGroups and pods as expected

Testing

Tested locally:

  • Prometheus metrics are available (sum(rate(process_cpu_seconds_total[1m])) > 0)
  • KEDA ScaledObject becomes active
  • HPA is created and shows ScalingActive=True
  • ModelServing.spec.replicas increases under load
  • Pods scale from 1 → N as expected
[Four screenshots attached.]

So the full flow works:

Prometheus → KEDA → HPA → ModelServing → pods


Notes

  • Minimal change, no new labels added
  • Uses existing labeling already present in the controller
  • Works with the current CRD scale subresource as-is
  • Existing tests pass without any changes

Impact

  • Fixes HPA support for ModelServing
  • Makes it possible to use KEDA (or any external metrics) for autoscaling

refs : #799

- Populate .status.labelSelector in controller to enable HPA pod discovery
- Add ScaledObject targeting ModelServing custom resource
- Add RBAC for KEDA to scale ModelServing resources
- Add ServiceMonitor and test deployment manifests

Signed-off-by: WHOIM1205 <[email protected]>
Copilot AI review requested due to automatic review settings March 25, 2026 20:35
@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign hzxuzhonghu for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a critical fix that enables robust autoscaling for ModelServing custom resources within the Kubernetes ecosystem. By correctly populating the .status.labelSelector field in the ModelServing controller, it resolves a long-standing issue preventing Horizontal Pod Autoscalers (HPAs) from effectively managing ModelServing instances. This enhancement paves the way for seamless integration with KEDA and other external metrics providers, significantly improving the operational efficiency and responsiveness of model serving deployments.

Highlights

  • HPA Compatibility: Fixed Horizontal Pod Autoscaler (HPA) compatibility for the ModelServing custom resource by addressing a missing label selector.
  • Autoscaling Enablement: Enabled end-to-end autoscaling for ModelServing via KEDA and HPA, using Prometheus metrics as a trigger.
  • Controller Update: Modified the ModelServing controller to set the .status.labelSelector field, which is crucial for HPA to identify and track pods belonging to a ModelServing instance.
  • Minimal Impact: Implemented the fix with minimal changes, utilizing existing labels and working seamlessly with the current CRD scale subresource.



@WHOIM1205
Contributor Author

hey @LiZhenCheng9527 @hzxuzhonghu
This fixes HPA compatibility for ModelServing by setting status.labelSelector, enabling end-to-end autoscaling (Prometheus → KEDA → HPA → ModelServing).
Tested locally and verified scaling behavior; would appreciate your review when you get a chance. Thanks!


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request integrates KEDA for autoscaling ModelServing resources by adding necessary RBAC configurations and updating the ModelServingController to correctly set the LabelSelector in the ModelServing status, enabling HPA and KEDA to identify and scale associated pods. It also includes example YAMLs for a ModelServing instance, a KEDA ScaledObject, a ServiceMonitor, and test deployments with mock metrics. Feedback suggests that the example ScaledObject's Prometheus query should be updated to use metrics actually exposed by the mock deployments (e.g., vllm_num_requests_running), and a Service and ServiceMonitor are missing for the dummy-inference-vllm deployment to allow Prometheus to scrape its metrics, which are crucial for a fully functional autoscaling example.

Comment on lines +19 to +20
query: sum(rate(process_cpu_seconds_total[1m]))
threshold: "0.01"

medium

The Prometheus query uses the metric process_cpu_seconds_total, but the mock deployments in test-deployment.yaml do not expose this metric. They expose kthena_router_* and vllm_* metrics.

To make the example consistent and functional, the query should use one of the available metrics. For example, using vllm_num_requests_running from the dummy-inference-vllm deployment would be more appropriate, as those pods are labeled to be part of the ModelServing instance.

You could change the query to something like this, assuming the goal is to scale when there's at least one running request:

        query: sum(vllm_num_requests_running{modelserving_volcano_sh_name="test-model"})
        threshold: "1"

location / {
return 200 'dummy-vllm ok\n';
}
}

medium

The dummy-inference-vllm deployment exposes vllm_* metrics that would be useful for autoscaling, but there is no Service defined to expose its pods for scraping. Without a Service and a corresponding ServiceMonitor, Prometheus will not be able to collect these metrics.

To make this example fully functional, a Service for this deployment should be added. For example:

---
apiVersion: v1
kind: Service
metadata:
  name: dummy-inference-vllm
  labels:
    modelserving.volcano.sh/name: test-model
spec:
  selector:
    modelserving.volcano.sh/name: test-model
  ports:
    - name: http-metrics
      port: 8000
      targetPort: 8000
      protocol: TCP

A corresponding ServiceMonitor would also be needed to instruct Prometheus to scrape this new service.
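Such a ServiceMonitor might look like the following (a sketch only; the namespace placement and scrape interval are assumptions, while the label and `http-metrics` port name match the Service suggested above):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dummy-inference-vllm
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
      - default
  selector:
    matchLabels:
      modelserving.volcano.sh/name: test-model
  endpoints:
    - port: http-metrics   # matches the Service port name above
      interval: 15s
```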

Contributor

Copilot AI left a comment


Pull request overview

This PR fixes HPA/KEDA autoscaling for the ModelServing CR by ensuring the controller populates .status.labelSelector, which is required by the CRD scale subresource (selectorpath=.status.labelSelector) so HPA can select the correct pods.

Changes:

  • Set ModelServing.status.labelSelector to a stable selector matching pods labeled with modelserving.volcano.sh/name=<modelserving-name> (including the “no ServingGroups yet” early-return path).
  • Add example manifests for a Prometheus → KEDA → HPA → ModelServing scaling flow (ServiceMonitor, ScaledObject, RBAC, sample ModelServing, and test deployments).

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

Summary per file:

  • pkg/model-serving-controller/controller/model_serving_controller.go: Populate .status.labelSelector during status updates to enable HPA scale subresource functionality.
  • test-deployment.yaml: Example Deployments/Services/ConfigMaps to expose dummy /metrics endpoints for local testing.
  • servicemonitor.yaml: Example ServiceMonitor to have Prometheus scrape the dummy metrics endpoint.
  • scaledobject.yaml: Example KEDA ScaledObject targeting a ModelServing resource using Prometheus metrics.
  • modelserving.yaml: Example ModelServing resource used as the KEDA/HPA scaling target.
  • keda-rbac.yaml: Example RBAC to allow KEDA to interact with modelservings scaling APIs.


keda-rbac.yaml Outdated
Comment on lines +10 to +17
- modelservings/scale
- modelservings/status
verbs:
- get
- list
- watch
- update
- patch

Copilot AI Mar 25, 2026


This RBAC grants broad cluster-wide permissions (update/patch) on modelservings and modelservings/status. For KEDA/HPA scaling, the operator typically only needs get on the target and get/update (and possibly patch) on the modelservings/scale subresource. Consider tightening the rules (and using a namespaced Role/RoleBinding if feasible) to follow least-privilege.

Suggested change
- modelservings/scale
- modelservings/status
verbs:
- get
- list
- watch
- update
- patch
verbs:
- get
- apiGroups:
- workload.serving.volcano.sh
resources:
- modelservings/scale
verbs:
- get
- update

Comment on lines +1 to +38
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
metadata:
name: test-model
namespace: default
spec:
replicas: 1
template:
roles:
- name: entry
workerReplicas: 1
entryTemplate:
metadata: {}
spec:
containers:
- name: entry
image: nginx
workerTemplate:
metadata: {}
spec:
containers:
- name: worker
image: nginx

- name: worker
workerReplicas: 1
entryTemplate:
metadata: {}
spec:
containers:
- name: entry
image: nginx
workerTemplate:
metadata: {}
spec:
containers:
- name: worker
image: nginx

Copilot AI Mar 25, 2026


This ModelServing manifest looks like an example asset; the repo already keeps sample CR YAML under examples/model-serving/. Consider moving it there (or a dedicated autoscaling example folder) rather than adding it at repo root.

Suggested change
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
metadata:
name: test-model
namespace: default
spec:
replicas: 1
template:
roles:
- name: entry
workerReplicas: 1
entryTemplate:
metadata: {}
spec:
containers:
- name: entry
image: nginx
workerTemplate:
metadata: {}
spec:
containers:
- name: worker
image: nginx
- name: worker
workerReplicas: 1
entryTemplate:
metadata: {}
spec:
containers:
- name: entry
image: nginx
workerTemplate:
metadata: {}
spec:
containers:
- name: worker
image: nginx
# This file previously contained an example ModelServing manifest.
# To follow the repository convention and avoid cluttering the repo root,
# the actual example CR YAML has been moved under the examples directory.
#
# Please use the manifest in:
# examples/model-serving/modelserving.yaml
# (or the appropriate subfolder, e.g. examples/model-serving/autoscaling/)
#
# This stub is intentionally left without a Kubernetes resource definition.
# It exists only to document the relocation and to satisfy static analysis.

Comment on lines +1 to +20
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: modelserving-scaler
namespace: default
spec:
scaleTargetRef:
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
name: test-model
minReplicaCount: 1
maxReplicaCount: 5
pollingInterval: 15
cooldownPeriod: 60
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
query: sum(rate(process_cpu_seconds_total[1m]))
threshold: "0.01"

Copilot AI Mar 25, 2026


This ScaledObject manifest appears to be an example asset; to align with the repo’s existing structure for sample YAML, consider moving it under examples/ (e.g., examples/autoscaling/keda/) rather than adding it at repository root.

Suggested change
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: modelserving-scaler
namespace: default
spec:
scaleTargetRef:
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
name: test-model
minReplicaCount: 1
maxReplicaCount: 5
pollingInterval: 15
cooldownPeriod: 60
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
query: sum(rate(process_cpu_seconds_total[1m]))
threshold: "0.01"
# Placeholder file at repository root.
# The example KEDA ScaledObject manifest has been moved under the examples tree,
# for example: examples/autoscaling/keda/scaledobject.yaml
#
# This file is intentionally left without any Kubernetes resources to avoid
# having example manifests at the repository root.

Comment on lines +1 to +5
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: keda-modelserving-scaling
rules:

Copilot AI Mar 25, 2026


This RBAC manifest also appears to be an example asset; consider relocating it under examples//docs/ so users can find it alongside other sample manifests instead of at the repository root.

Comment on lines +1753 to +1763
// Set labelSelector so the scale subresource can report it to HPA.
// Without this, HPA fails with "selector is required" because it cannot
// determine which pods belong to this ModelServing.
// The selector matches the label applied to all pods by createBasePod().
selector := labels.Set{
workloadv1alpha1.ModelServingNameLabelKey: latestMS.Name,
}.String()
if copy.Status.LabelSelector != selector {
shouldUpdate = true
copy.Status.LabelSelector = selector
}

Copilot AI Mar 25, 2026


Add/adjust a unit test to assert that UpdateModelServingStatus sets .status.labelSelector (both in the no-serving-groups early return and the normal status update path). This behavior is critical for the CRD scale subresource/HPA integration and is currently unverified by tests.

Comment on lines +1 to +5
apiVersion: apps/v1
kind: Deployment
metadata:
name: kthena-router
labels:

Copilot AI Mar 25, 2026


These manifests appear to be example/test assets, but they are being added at repo root even though the repository already has an examples/ tree for custom resource YAML. Consider moving this file under an appropriate examples/ subdirectory (e.g., examples/model-serving/ or a new examples/autoscaling/keda/) to keep the top-level clean and make discovery consistent.

Comment on lines +1 to +6
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: kthena-router
namespace: monitoring
spec:

Copilot AI Mar 25, 2026


This ServiceMonitor looks like an example asset; the repo already centralizes sample manifests under examples/. Consider relocating it under examples/ (and/or docs/) instead of adding it to the repository root.

- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
query: sum(rate(process_cpu_seconds_total[1m]))

Copilot AI Mar 25, 2026


The Prometheus query sum(rate(process_cpu_seconds_total[1m])) is effectively cluster-wide and not scoped to the ModelServing/pods being autoscaled, so it can cause unintended scaling driven by unrelated workloads. Consider scoping the query to the target workload (e.g., via pod, namespace, or a label like modelserving.volcano.sh/name=test-model) or using a workload-specific metric.

Suggested change
query: sum(rate(process_cpu_seconds_total[1m]))
query: sum(rate(process_cpu_seconds_total{namespace="default", pod=~"test-model-.*"}[1m]))

Contributor

@LiZhenCheng9527 LiZhenCheng9527 left a comment


You can place these examples in the ‘example’ folder

- modelservings
- modelservings/scale
- modelservings/status
verbs:
Contributor


Does modelserving really need all these permissions?

@@ -0,0 +1,38 @@
apiVersion: workload.serving.volcano.sh/v1alpha1
Contributor


Why is such an example needed? The one in this directory is identical to examples/model-serving/sample.yaml.

- Remove modelservings/status from KEDA ClusterRole (not needed)
- Restrict modelservings base resource to read-only verbs
- Delete redundant modelserving.yaml test file

Signed-off-by: WHOIM1205 <[email protected]>
@WHOIM1205
Contributor Author

hey @LiZhenCheng9527
Updated as per review:

  • Trimmed RBAC permissions to only required access for KEDA (modelservings + modelservings/scale)
  • Removed redundant example from the repo root

Please let me know if anything else should be adjusted.

@hzxuzhonghu
Member

I did a first pass on this diff. Setting status.labelSelector from the existing modelserving.volcano.sh/name label looks like the right fix for HPA/KEDA compatibility with the scale subresource. I did not spot a blocking issue in the change set.

Main thing I would still want to confirm is normal CI plus at least one regression test around the status update path for both cases:

  • no ServingGroups yet
  • existing ServingGroups already present

Copilot AI review requested due to automatic review settings March 31, 2026 19:17
@WHOIM1205
Contributor Author

WHOIM1205 commented Mar 31, 2026

hey @hzxuzhonghu added regression tests for the status update path as suggested:

  1. Covered the case with no ServingGroups (early return path)
  2. Covered the case with existing ServingGroups
  3. Also added a case for names with special characters to ensure selector formatting remains correct

All tests are passing. Please let me know if you'd like any additional coverage.

[Screenshot attached.]

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.



Comment on lines +39 to +42
location /metrics {
default_type text/plain;
return 200 '# HELP kthena_router_active_downstream_requests Number of active downstream requests\n# TYPE kthena_router_active_downstream_requests gauge\nkthena_router_active_downstream_requests 3\n# HELP kthena_router_requests_total Total requests\n# TYPE kthena_router_requests_total counter\nkthena_router_requests_total 100\n';
}

Copilot AI Mar 31, 2026


The NGINX return 200 '...\\n...' payload will emit literal backslash-n sequences (NGINX doesn’t interpret \\n escapes here), which can make the Prometheus exposition invalid/unparseable. To make these manifests reliably scrapeable, serve metrics with real newlines (e.g., by returning a multi-line literal string with actual newline characters, serving a static metrics file, or using a tiny HTTP server/exporter image that emits valid Prometheus text format).

Comment on lines +105 to +108
location /metrics {
default_type text/plain;
return 200 '# HELP vllm_num_requests_running Number of running requests\n# TYPE vllm_num_requests_running gauge\nvllm_num_requests_running 2\n# HELP vllm_num_requests_waiting Number of waiting requests\n# TYPE vllm_num_requests_waiting gauge\nvllm_num_requests_waiting 0\n# HELP vllm_gpu_cache_usage_perc GPU cache usage percentage\n# TYPE vllm_gpu_cache_usage_perc gauge\nvllm_gpu_cache_usage_perc 0.45\n';
}

Copilot AI Mar 31, 2026


The NGINX return 200 '...\\n...' payload will emit literal backslash-n sequences (NGINX doesn’t interpret \\n escapes here), which can make the Prometheus exposition invalid/unparseable. To make these manifests reliably scrapeable, serve metrics with real newlines (e.g., by returning a multi-line literal string with actual newline characters, serving a static metrics file, or using a tiny HTTP server/exporter image that emits valid Prometheus text format).

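One way to follow this suggestion is to keep the metrics payload in a ConfigMap literal block (which preserves real newlines) and serve it as a static file; the ConfigMap name and mount path below are assumptions for illustration:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: dummy-vllm-metrics    # hypothetical name
data:
  # Mount this file into the nginx web root (e.g. /usr/share/nginx/html/metrics)
  # so the exposition format is served with real newlines.
  metrics: |
    # HELP vllm_num_requests_running Number of running requests
    # TYPE vllm_num_requests_running gauge
    vllm_num_requests_running 2
    # HELP vllm_num_requests_waiting Number of waiting requests
    # TYPE vllm_num_requests_waiting gauge
    vllm_num_requests_waiting 0
```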
Comment on lines +69 to +81
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
spec:
replicas: 1
selector:
matchLabels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
template:
metadata:
labels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example Deployment is labeled with the same modelserving.volcano.sh/name: test-model key/value that the controller now publishes via status.labelSelector. If someone applies this alongside a real ModelServing named test-model, HPA will likely count these pods as part of the scale target, skewing replica calculations and metrics. Recommend updating the example to avoid reusing the ModelServingNameLabelKey label (or use a different msName value that cannot collide with an actual ModelServing), so it doesn’t interfere with autoscaling behavior.

Suggested change
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
spec:
replicas: 1
selector:
matchLabels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
template:
metadata:
labels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
app.kubernetes.io/name: dummy-inference-vllm
app.kubernetes.io/entry: "true"
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: dummy-inference-vllm
app.kubernetes.io/entry: "true"
template:
metadata:
labels:
app.kubernetes.io/name: dummy-inference-vllm
app.kubernetes.io/entry: "true"

Comment on lines +69 to +80
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
spec:
replicas: 1
selector:
matchLabels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
template:
metadata:
labels:
modelserving.volcano.sh/name: test-model

Copilot AI Mar 31, 2026


This example Deployment is labeled with the same modelserving.volcano.sh/name: test-model key/value that the controller now publishes via status.labelSelector. If someone applies this alongside a real ModelServing named test-model, HPA will likely count these pods as part of the scale target, skewing replica calculations and metrics. Recommend updating the example to avoid reusing the ModelServingNameLabelKey label (or use a different msName value that cannot collide with an actual ModelServing), so it doesn’t interfere with autoscaling behavior.

Suggested change
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
spec:
replicas: 1
selector:
matchLabels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
template:
metadata:
labels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/name: dummy-test-model
modelserving.volcano.sh/entry: "true"
spec:
replicas: 1
selector:
matchLabels:
modelserving.volcano.sh/name: dummy-test-model
modelserving.volcano.sh/entry: "true"
template:
metadata:
labels:
modelserving.volcano.sh/name: dummy-test-model

Comment on lines +1 to +26
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: keda-modelserving-scaling
rules:
- apiGroups:
- workload.serving.volcano.sh
resources:
- modelservings
verbs:
- get
- list
- watch
- apiGroups:
- workload.serving.volcano.sh
resources:
- modelservings/scale
verbs:
- get
- update
- patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: keda-modelserving-scaling

Copilot AI Mar 31, 2026


This grants cluster-wide permissions to read all ModelServing objects and update any modelservings/scale. If this is intended as an example for scaling a single namespace (the provided ScaledObject is in default), prefer a namespaced Role/RoleBinding scoped to that namespace to reduce blast radius. If cluster-wide access is genuinely required, consider documenting that explicitly in-file to prevent accidental over-privileging.

revision: "rev-1",
},
{
name: "name with special characters — selector encodes correctly",

Copilot AI Mar 31, 2026


The test case description says the selector 'encodes correctly', but the implementation uses labels.Set{...}.String() and does not perform any encoding; it just formats a k=v selector string. Suggest updating the test name to reflect what’s actually being validated (e.g., that a typical DNS-1123-ish name round-trips into the selector string) to avoid misleading future readers.

Suggested change
name: "name with special characters — selector encodes correctly",
name: "name with dashes and numbers — selector string contains name unmodified",

spec:
containers:
- name: kthena-router
image: nginx:alpine
Member


not sure I understand why nginx is deployed as the kthena router

scaleTargetRef:
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
name: test-model
Member


where is test-model?

kind: ModelServing
name: test-model
minReplicaCount: 1
maxReplicaCount: 5
Member


The label selector matches all pods belonging to the ModelServing, but we actually want to scale based on groups. The pod count is usually at least 2x the number of serving groups. How do we handle that?

@hzxuzhonghu
Member

You can place these examples in the ‘example’ folder

I agree with this, can we add a addon dir like istio https://github.com/istio/istio/tree/master/samples/addons

…dback

- Remove loose test-deployment.yaml and scaledobject.yaml from repo root
- Add proper ModelServing example (test-model) with entry + worker roles
- Add ScaledObject with real Prometheus query (kthena_router_active_downstream_requests)
- Add README explaining usage and groups-vs-pods scaling design

Signed-off-by: WHOIM1205 <[email protected]>
@WHOIM1205
Contributor Author

WHOIM1205 commented Apr 2, 2026

hey @hzxuzhonghu thanks for the feedback

  1. removed the nginx-based test deployment and now rely on the actual kthena router image
  2. added a proper example under examples/keda-autoscaling/ including ModelServing, ScaledObject, and README
  3. updated the ScaledObject to reference a real ModelServing (test-model)
  4. switched to a real router metric (kthena_router_active_downstream_requests) for scaling

regarding the scaling behavior:
ModelServing scales via spec.replicas (serving groups). Each group may contain multiple pods, but the Prometheus query uses sum() aggregation to produce a single value, and the threshold is defined at the group level, so scaling decisions map to groups rather than individual pods.

Please let me know if any further adjustments are needed.
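For illustration, a groups-aware trigger might look like the sketch below; the serverAddress and metric name are from the examples in this PR, while the threshold value is an assumed per-group target, not taken from the actual manifests:

```yaml
triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
      # sum() collapses the per-pod series into one aggregate value, so the
      # threshold can be chosen per serving group (5 here is an assumption).
      query: sum(kthena_router_active_downstream_requests)
      threshold: "5"
```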
