Changes from 3 commits
34 changes: 34 additions & 0 deletions keda-rbac.yaml
@@ -0,0 +1,34 @@
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: keda-modelserving-scaling
rules:
Comment on lines +1 to +5
Copilot AI Mar 25, 2026
This RBAC manifest also appears to be an example asset; consider relocating it under examples/ or docs/ so users can find it alongside other sample manifests instead of at the repository root.
- apiGroups:
- workload.serving.volcano.sh
resources:
- modelservings
verbs:
Contributor

Does ModelServing really need all these permissions?

- get
- list
- watch
- apiGroups:
- workload.serving.volcano.sh
resources:
- modelservings/scale
verbs:
- get
- update
- patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: keda-modelserving-scaling
Comment on lines +1 to +26
Copilot AI Mar 31, 2026
This grants cluster-wide permissions to read all ModelServing objects and update any modelservings/scale. If this is intended as an example for scaling a single namespace (the provided ScaledObject is in default), prefer a namespaced Role/RoleBinding scoped to that namespace to reduce blast radius. If cluster-wide access is genuinely required, consider documenting that explicitly in-file to prevent accidental over-privileging.
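
A namespaced sketch of that alternative, assuming the default namespace used by the example ScaledObject (illustrative only, not part of this PR):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: keda-modelserving-scaling
  namespace: default   # scope to the namespace of the ScaledObject
rules:
- apiGroups: ["workload.serving.volcano.sh"]
  resources: ["modelservings"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["workload.serving.volcano.sh"]
  resources: ["modelservings/scale"]
  verbs: ["get", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: keda-modelserving-scaling
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: keda-modelserving-scaling
subjects:
- kind: ServiceAccount
  name: keda-operator
  namespace: keda
```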

roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: keda-modelserving-scaling
subjects:
- kind: ServiceAccount
name: keda-operator
namespace: keda
@@ -1617,9 +1617,14 @@ func (c *ModelServingController) UpdateModelServingStatus(ms *workloadv1alpha1.M
// If no groups exist, handle gracefully by setting revisions to the new revision
if errors.Is(err, datastore.ErrServingGroupNotFound) {
copy := latestMS.DeepCopy()
if copy.Status.CurrentRevision != revision || copy.Status.UpdateRevision != revision {
selector := labels.Set{
workloadv1alpha1.ModelServingNameLabelKey: latestMS.Name,
}.String()
needsUpdate := copy.Status.CurrentRevision != revision || copy.Status.UpdateRevision != revision || copy.Status.LabelSelector != selector
if needsUpdate {
copy.Status.CurrentRevision = revision
copy.Status.UpdateRevision = revision
copy.Status.LabelSelector = selector
_, updateErr := c.modelServingClient.WorkloadV1alpha1().ModelServings(copy.GetNamespace()).UpdateStatus(context.TODO(), copy, metav1.UpdateOptions{})
return updateErr
}
@@ -1745,6 +1750,18 @@ func (c *ModelServingController) UpdateModelServingStatus(ms *workloadv1alpha1.M
copy.Status.ObservedGeneration = latestMS.Generation
}

// Set labelSelector so the scale subresource can report it to HPA.
// Without this, HPA fails with "selector is required" because it cannot
// determine which pods belong to this ModelServing.
// The selector matches the label applied to all pods by createBasePod().
selector := labels.Set{
workloadv1alpha1.ModelServingNameLabelKey: latestMS.Name,
}.String()
if copy.Status.LabelSelector != selector {
shouldUpdate = true
copy.Status.LabelSelector = selector
}
Comment on lines +1753 to +1763
Copilot AI Mar 25, 2026
Add/adjust a unit test to assert that UpdateModelServingStatus sets .status.labelSelector (both in the no-serving-groups early return and the normal status update path). This behavior is critical for the CRD scale subresource/HPA integration and is currently unverified by tests.

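For context on why the selector matters here: the CRD's scale subresource exposes the selector string that HPA/KEDA read via labelSelectorPath. A sketch of that wiring, assuming the ModelServing CRD follows the conventional field paths (the exact paths in this repo are not shown in the diff):

```yaml
# CustomResourceDefinition excerpt (hypothetical paths for ModelServing)
subresources:
  scale:
    specReplicasPath: .spec.replicas
    statusReplicasPath: .status.replicas
    # HPA reads the selector exposed through this path; if the status
    # field is empty, HPA fails with "selector is required".
    labelSelectorPath: .status.labelSelector
```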

if shouldUpdate {
_, err := c.modelServingClient.WorkloadV1alpha1().ModelServings(copy.GetNamespace()).UpdateStatus(context.TODO(), copy, metav1.UpdateOptions{})
if err != nil {
@@ -3431,6 +3431,109 @@ func TestScaleUpServingGroups_TemplateRecovery(t *testing.T) {

// TestUpdateModelServingStatusRevisionFields tests the CurrentRevision and UpdateRevision logic
// following StatefulSet's behavior
func TestUpdateModelServingStatusLabelSelector(t *testing.T) {
tests := []struct {
name string
msName string
existingGroups map[int]string // ordinal -> revision; nil means no groups (ErrServingGroupNotFound path)
revision string
}{
{
name: "no ServingGroups yet — labelSelector is set on empty status",
msName: "my-llm",
existingGroups: nil,
revision: "rev-1",
},
{
name: "existing ServingGroups — labelSelector is set consistently",
msName: "my-llm",
existingGroups: map[int]string{
0: "rev-1",
1: "rev-1",
},
revision: "rev-1",
},
{
name: "name with special characters — selector encodes correctly",
Copilot AI Mar 31, 2026
The test case description says the selector 'encodes correctly', but the implementation uses labels.Set{...}.String() and does not perform any encoding; it just formats a k=v selector string. Suggest updating the test name to reflect what’s actually being validated (e.g., that a typical DNS-1123-ish name round-trips into the selector string) to avoid misleading future readers.

Suggested change
- name: "name with special characters — selector encodes correctly",
+ name: "name with dashes and numbers — selector string contains name unmodified",

msName: "serving-gpt-4o-mini",
existingGroups: map[int]string{
0: "rev-abc",
},
revision: "rev-abc",
},
}

for idx, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
kubeClient := kubefake.NewSimpleClientset()
kthenaClient := kthenafake.NewSimpleClientset()
volcanoClient := volcanofake.NewSimpleClientset()
apiextClient := apiextfake.NewSimpleClientset()

controller, err := NewModelServingController(kubeClient, kthenaClient, volcanoClient, apiextClient)
assert.NoError(t, err)

replicas := int32(len(tt.existingGroups))
if tt.existingGroups == nil {
replicas = 1
}
ms := &workloadv1alpha1.ModelServing{
ObjectMeta: metav1.ObjectMeta{
Namespace: "default",
Name: tt.msName,
},
Spec: workloadv1alpha1.ModelServingSpec{
Replicas: ptr.To(replicas),
SchedulerName: "volcano",
Template: workloadv1alpha1.ServingGroup{
Roles: []workloadv1alpha1.Role{
{
Name: "prefill",
Replicas: ptr.To[int32](1),
EntryTemplate: workloadv1alpha1.PodTemplateSpec{
Spec: corev1.PodSpec{
Containers: []corev1.Container{
{Name: "c", Image: "img:latest"},
},
},
},
},
},
},
RecoveryPolicy: workloadv1alpha1.RoleRecreate,
},
}

_, err = kthenaClient.WorkloadV1alpha1().ModelServings("default").Create(context.Background(), ms, metav1.CreateOptions{})
assert.NoError(t, err)
err = controller.modelServingsInformer.GetIndexer().Add(ms)
assert.NoError(t, err)

// Populate store only when groups exist; nil means the "not found" path.
if tt.existingGroups != nil {
for ordinal, rev := range tt.existingGroups {
controller.store.AddServingGroup(utils.GetNamespaceName(ms), ordinal, rev)
groupName := utils.GenerateServingGroupName(tt.msName, ordinal)
controller.store.UpdateServingGroupStatus(utils.GetNamespaceName(ms), groupName, datastore.ServingGroupRunning)
}
}

err = controller.UpdateModelServingStatus(ms, tt.revision)
assert.NoError(t, err, "case %d: UpdateModelServingStatus should not error", idx)

updated, err := kthenaClient.WorkloadV1alpha1().ModelServings("default").Get(context.Background(), tt.msName, metav1.GetOptions{})
assert.NoError(t, err)

expectedSelector := labels.Set{
workloadv1alpha1.ModelServingNameLabelKey: tt.msName,
}.String()

assert.Equal(t, expectedSelector, updated.Status.LabelSelector,
"case %d: status.labelSelector must be %q", idx, expectedSelector)
})
}
}

func TestUpdateModelServingStatusRevisionFields(t *testing.T) {
tests := []struct {
name string
20 changes: 20 additions & 0 deletions scaledobject.yaml
@@ -0,0 +1,20 @@
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: modelserving-scaler
namespace: default
spec:
scaleTargetRef:
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
name: test-model
minReplicaCount: 1
maxReplicaCount: 5
pollingInterval: 15
cooldownPeriod: 60
triggers:
- type: prometheus
metadata:
serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
query: sum(rate(process_cpu_seconds_total[1m]))
Copilot AI Mar 25, 2026
The Prometheus query sum(rate(process_cpu_seconds_total[1m])) is effectively cluster-wide and not scoped to the ModelServing/pods being autoscaled, so it can cause unintended scaling driven by unrelated workloads. Consider scoping the query to the target workload (e.g., via pod, namespace, or a label like modelserving.volcano.sh/name=test-model) or using a workload-specific metric.

Suggested change
- query: sum(rate(process_cpu_seconds_total[1m]))
+ query: sum(rate(process_cpu_seconds_total{namespace="default", pod=~"test-model-.*"}[1m]))

threshold: "0.01"
medium

The Prometheus query uses the metric process_cpu_seconds_total, but the mock deployments in test-deployment.yaml do not expose this metric. They expose kthena_router_* and vllm_* metrics.

To make the example consistent and functional, the query should use one of the available metrics. For example, using vllm_num_requests_running from the dummy-inference-vllm deployment would be more appropriate, as those pods are labeled to be part of the ModelServing instance.

You could change the query to something like this, assuming the goal is to scale when there's at least one running request:

        query: sum(vllm_num_requests_running{modelserving_volcano_sh_name="test-model"})
        threshold: "1"

Copilot AI Mar 25, 2026
This ScaledObject manifest appears to be an example asset; to align with the repo’s existing structure for sample YAML, consider moving it under examples/ (e.g., examples/autoscaling/keda/) rather than adding it at repository root.

Suggested change
- apiVersion: keda.sh/v1alpha1
- kind: ScaledObject
- metadata:
- name: modelserving-scaler
- namespace: default
- spec:
- scaleTargetRef:
- apiVersion: workload.serving.volcano.sh/v1alpha1
- kind: ModelServing
- name: test-model
- minReplicaCount: 1
- maxReplicaCount: 5
- pollingInterval: 15
- cooldownPeriod: 60
- triggers:
- - type: prometheus
- metadata:
- serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
- query: sum(rate(process_cpu_seconds_total[1m]))
- threshold: "0.01"
+ # Placeholder file at repository root.
+ # The example KEDA ScaledObject manifest has been moved under the examples tree,
+ # for example: examples/autoscaling/keda/scaledobject.yaml
+ #
+ # This file is intentionally left without any Kubernetes resources to avoid
+ # having example manifests at the repository root.

16 changes: 16 additions & 0 deletions servicemonitor.yaml
@@ -0,0 +1,16 @@
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: kthena-router
namespace: monitoring
spec:
Comment on lines +1 to +6
Copilot AI Mar 25, 2026
This ServiceMonitor looks like an example asset; the repo already centralizes sample manifests under examples/. Consider relocating it under examples/ (and/or docs/) instead of adding it to the repository root.

namespaceSelector:
matchNames:
- default
selector:
matchLabels:
app.kubernetes.io/component: kthena-router
endpoints:
- port: http
path: /metrics
interval: 15s
113 changes: 113 additions & 0 deletions test-deployment.yaml
@@ -0,0 +1,113 @@
apiVersion: apps/v1
kind: Deployment
metadata:
name: kthena-router
labels:
Copilot AI Mar 25, 2026
These manifests appear to be example/test assets, but they are being added at repo root even though the repository already has an examples/ tree for custom resource YAML. Consider moving this file under an appropriate examples/ subdirectory (e.g., examples/model-serving/ or a new examples/autoscaling/keda/) to keep the top-level clean and make discovery consistent.

app.kubernetes.io/component: kthena-router
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/component: kthena-router
template:
metadata:
labels:
app.kubernetes.io/component: kthena-router
spec:
containers:
- name: kthena-router
image: nginx:alpine
Member

Not sure I understand why nginx is deployed as the kthena router.

ports:
- containerPort: 8080
volumeMounts:
- name: nginx-config
mountPath: /etc/nginx/conf.d
volumes:
- name: nginx-config
configMap:
name: kthena-router-nginx-config
---
apiVersion: v1
kind: ConfigMap
metadata:
name: kthena-router-nginx-config
data:
default.conf: |
server {
listen 8080;

location /metrics {
default_type text/plain;
return 200 '# HELP kthena_router_active_downstream_requests Number of active downstream requests\n# TYPE kthena_router_active_downstream_requests gauge\nkthena_router_active_downstream_requests 3\n# HELP kthena_router_requests_total Total requests\n# TYPE kthena_router_requests_total counter\nkthena_router_requests_total 100\n';
}
Copilot AI Mar 31, 2026
The NGINX return 200 '...\n...' payload will emit literal backslash-n sequences (NGINX doesn't interpret \n escapes here), which can make the Prometheus exposition invalid/unparseable. To make these manifests reliably scrapeable, serve metrics with real newlines (e.g., by returning a multi-line literal string with actual newline characters, serving a static metrics file, or using a tiny HTTP server/exporter image that emits valid Prometheus text format).
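
A sketch of the multi-line-literal option the comment describes (untested excerpt; nginx quoted strings may span lines, so the embedded newlines survive verbatim; metric names and values copied from the manifest above):

```nginx
location /metrics {
    default_type text/plain;
    # A literal spanning multiple config lines keeps real newlines,
    # so the Prometheus text format stays parseable.
    return 200 '# HELP kthena_router_active_downstream_requests Number of active downstream requests
# TYPE kthena_router_active_downstream_requests gauge
kthena_router_active_downstream_requests 3
# HELP kthena_router_requests_total Total requests
# TYPE kthena_router_requests_total counter
kthena_router_requests_total 100
';
}
```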


location / {
return 200 'kthena-router ok\n';
}
}
---
apiVersion: v1
kind: Service
metadata:
name: kthena-router
labels:
app.kubernetes.io/component: kthena-router
spec:
selector:
app.kubernetes.io/component: kthena-router
ports:
- name: http
port: 80
targetPort: 8080
protocol: TCP
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: dummy-inference-vllm
labels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
spec:
replicas: 1
selector:
matchLabels:
modelserving.volcano.sh/name: test-model
modelserving.volcano.sh/entry: "true"
template:
metadata:
labels:
modelserving.volcano.sh/name: test-model
Copilot AI Mar 31, 2026
This example Deployment is labeled with the same modelserving.volcano.sh/name: test-model key/value that the controller now publishes via status.labelSelector. If someone applies this alongside a real ModelServing named test-model, HPA will likely count these pods as part of the scale target, skewing replica calculations and metrics. Recommend updating the example to avoid reusing the ModelServingNameLabelKey label (or use a different msName value that cannot collide with an actual ModelServing), so it doesn’t interfere with autoscaling behavior.

Suggested change
- modelserving.volcano.sh/name: test-model
- modelserving.volcano.sh/entry: "true"
- spec:
- replicas: 1
- selector:
- matchLabels:
- modelserving.volcano.sh/name: test-model
- modelserving.volcano.sh/entry: "true"
- template:
- metadata:
- labels:
- modelserving.volcano.sh/name: test-model
+ modelserving.volcano.sh/name: dummy-test-model
+ modelserving.volcano.sh/entry: "true"
+ spec:
+ replicas: 1
+ selector:
+ matchLabels:
+ modelserving.volcano.sh/name: dummy-test-model
+ modelserving.volcano.sh/entry: "true"
+ template:
+ metadata:
+ labels:
+ modelserving.volcano.sh/name: dummy-test-model

modelserving.volcano.sh/entry: "true"
Copilot AI Mar 31, 2026
This example Deployment is labeled with the same modelserving.volcano.sh/name: test-model key/value that the controller now publishes via status.labelSelector. If someone applies this alongside a real ModelServing named test-model, HPA will likely count these pods as part of the scale target, skewing replica calculations and metrics. Recommend updating the example to avoid reusing the ModelServingNameLabelKey label (or use a different msName value that cannot collide with an actual ModelServing), so it doesn’t interfere with autoscaling behavior.

Suggested change
- modelserving.volcano.sh/name: test-model
- modelserving.volcano.sh/entry: "true"
- spec:
- replicas: 1
- selector:
- matchLabels:
- modelserving.volcano.sh/name: test-model
- modelserving.volcano.sh/entry: "true"
- template:
- metadata:
- labels:
- modelserving.volcano.sh/name: test-model
- modelserving.volcano.sh/entry: "true"
+ app.kubernetes.io/name: dummy-inference-vllm
+ app.kubernetes.io/entry: "true"
+ spec:
+ replicas: 1
+ selector:
+ matchLabels:
+ app.kubernetes.io/name: dummy-inference-vllm
+ app.kubernetes.io/entry: "true"
+ template:
+ metadata:
+ labels:
+ app.kubernetes.io/name: dummy-inference-vllm
+ app.kubernetes.io/entry: "true"

spec:
containers:
- name: dummy-vllm
image: nginx:alpine
ports:
- containerPort: 8000
volumeMounts:
- name: nginx-config
mountPath: /etc/nginx/conf.d
volumes:
- name: nginx-config
configMap:
name: dummy-vllm-nginx-config
---
apiVersion: v1
kind: ConfigMap
metadata:
name: dummy-vllm-nginx-config
data:
default.conf: |
server {
listen 8000;

location /metrics {
default_type text/plain;
return 200 '# HELP vllm_num_requests_running Number of running requests\n# TYPE vllm_num_requests_running gauge\nvllm_num_requests_running 2\n# HELP vllm_num_requests_waiting Number of waiting requests\n# TYPE vllm_num_requests_waiting gauge\nvllm_num_requests_waiting 0\n# HELP vllm_gpu_cache_usage_perc GPU cache usage percentage\n# TYPE vllm_gpu_cache_usage_perc gauge\nvllm_gpu_cache_usage_perc 0.45\n';
}
Copilot AI Mar 31, 2026
The NGINX return 200 '...\n...' payload will emit literal backslash-n sequences (NGINX doesn't interpret \n escapes here), which can make the Prometheus exposition invalid/unparseable. To make these manifests reliably scrapeable, serve metrics with real newlines (e.g., by returning a multi-line literal string with actual newline characters, serving a static metrics file, or using a tiny HTTP server/exporter image that emits valid Prometheus text format).


location / {
return 200 'dummy-vllm ok\n';
}
}
medium

The dummy-inference-vllm deployment exposes vllm_* metrics that would be useful for autoscaling, but there is no Service defined to expose its pods for scraping. Without a Service and a corresponding ServiceMonitor, Prometheus will not be able to collect these metrics.

To make this example fully functional, a Service for this deployment should be added. For example:

---
apiVersion: v1
kind: Service
metadata:
  name: dummy-inference-vllm
  labels:
    modelserving.volcano.sh/name: test-model
spec:
  selector:
    modelserving.volcano.sh/name: test-model
  ports:
    - name: http-metrics
      port: 8000
      targetPort: 8000
      protocol: TCP

A corresponding ServiceMonitor would also be needed to instruct Prometheus to scrape this new service.
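
Such a ServiceMonitor might look like the following, mirroring the kthena-router one in this PR. The name, port name, and selector labels come from the hypothetical Service sketched in the comment above, so adjust them to whatever Service is actually applied:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dummy-inference-vllm
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      modelserving.volcano.sh/name: test-model   # must match the Service's labels
  endpoints:
  - port: http-metrics   # port name from the hypothetical Service above
    path: /metrics
    interval: 15s
```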
