Add KEDA autoscaling support for ModelServing via Prometheus #839
WHOIM1205 wants to merge 4 commits into volcano-sh:main
Conversation
- Populate `.status.labelSelector` in controller to enable HPA pod discovery
- Add ScaledObject targeting ModelServing custom resource
- Add RBAC for KEDA to scale ModelServing resources
- Add ServiceMonitor and test deployment manifests

Signed-off-by: WHOIM1205 <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED. Needs approval from an approver in each of these files; approvers can indicate their approval by writing the appropriate command in a comment.
Summary of Changes: this pull request introduces a fix that enables autoscaling for `ModelServing` resources via KEDA and HPA.
hey @LiZhenCheng9527 @hzxuzhonghu
Code Review
This pull request integrates KEDA for autoscaling ModelServing resources by adding necessary RBAC configurations and updating the ModelServingController to correctly set the LabelSelector in the ModelServing status, enabling HPA and KEDA to identify and scale associated pods. It also includes example YAMLs for a ModelServing instance, a KEDA ScaledObject, a ServiceMonitor, and test deployments with mock metrics. Feedback suggests that the example ScaledObject's Prometheus query should be updated to use metrics actually exposed by the mock deployments (e.g., vllm_num_requests_running), and a Service and ServiceMonitor are missing for the dummy-inference-vllm deployment to allow Prometheus to scrape its metrics, which are crucial for a fully functional autoscaling example.
scaledobject.yaml (outdated)

```yaml
query: sum(rate(process_cpu_seconds_total[1m]))
threshold: "0.01"
```
The Prometheus query uses the metric process_cpu_seconds_total, but the mock deployments in test-deployment.yaml do not expose this metric. They expose kthena_router_* and vllm_* metrics.
To make the example consistent and functional, the query should use one of the available metrics. For example, using vllm_num_requests_running from the dummy-inference-vllm deployment would be more appropriate, as those pods are labeled to be part of the ModelServing instance.
You could change the query to something like this, assuming the goal is to scale when there's at least one running request:

```yaml
query: sum(vllm_num_requests_running{modelserving_volcano_sh_name="test-model"})
threshold: "1"
```
test-deployment.yaml (outdated)

```nginx
location / {
    return 200 'dummy-vllm ok\n';
}
}
```
The dummy-inference-vllm deployment exposes vllm_* metrics that would be useful for autoscaling, but there is no Service defined to expose its pods for scraping. Without a Service and a corresponding ServiceMonitor, Prometheus will not be able to collect these metrics.
To make this example fully functional, a Service for this deployment should be added. For example:

```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: dummy-inference-vllm
  labels:
    modelserving.volcano.sh/name: test-model
spec:
  selector:
    modelserving.volcano.sh/name: test-model
  ports:
  - name: http-metrics
    port: 8000
    targetPort: 8000
    protocol: TCP
```

A corresponding ServiceMonitor would also be needed to instruct Prometheus to scrape this new service.
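A minimal sketch of such a ServiceMonitor, assuming a Prometheus Operator install whose `serviceMonitorSelector` matches the `release: prometheus` label (names, labels, and the scrape interval here are illustrative assumptions, not taken from the PR):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dummy-inference-vllm
  namespace: monitoring
  labels:
    release: prometheus   # assumed: must match the operator's serviceMonitorSelector
spec:
  namespaceSelector:
    matchNames:
    - default             # where the dummy Service lives
  selector:
    matchLabels:
      modelserving.volcano.sh/name: test-model
  endpoints:
  - port: http-metrics    # must match the Service port name
    interval: 15s
```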
Pull request overview
This PR fixes HPA/KEDA autoscaling for the ModelServing CR by ensuring the controller populates .status.labelSelector, which is required by the CRD scale subresource (selectorpath=.status.labelSelector) so HPA can select the correct pods.
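For reference, the scale-subresource wiring this comment refers to looks roughly like the following in the CRD (a sketch reconstructed from the paths named in this PR; the exact surrounding CRD structure may differ):

```yaml
subresources:
  scale:
    specReplicasPath: .spec.replicas
    statusReplicasPath: .status.replicas
    labelSelectorPath: .status.labelSelector
```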
Changes:
- Set `ModelServing.status.labelSelector` to a stable selector matching pods labeled with `modelserving.volcano.sh/name=<modelserving-name>` (including the "no ServingGroups yet" early-return path).
- Add example manifests for a Prometheus → KEDA → HPA → ModelServing scaling flow (ServiceMonitor, ScaledObject, RBAC, sample ModelServing, and test deployments).
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/model-serving-controller/controller/model_serving_controller.go | Populate .status.labelSelector during status updates to enable HPA scale subresource functionality. |
| test-deployment.yaml | Example Deployments/Services/ConfigMaps to expose dummy /metrics endpoints for local testing. |
| servicemonitor.yaml | Example ServiceMonitor to have Prometheus scrape the dummy metrics endpoint. |
| scaledobject.yaml | Example KEDA ScaledObject targeting a ModelServing resource using Prometheus metrics. |
| modelserving.yaml | Example ModelServing resource used as the KEDA/HPA scaling target. |
| keda-rbac.yaml | Example RBAC to allow KEDA to interact with modelservings scaling APIs. |
keda-rbac.yaml (outdated)

```yaml
- modelservings/scale
- modelservings/status
verbs:
- get
- list
- watch
- update
- patch
```
There was a problem hiding this comment.
This RBAC grants broad cluster-wide permissions (update/patch) on modelservings and modelservings/status. For KEDA/HPA scaling, the operator typically only needs get on the target and get/update (and possibly patch) on the modelservings/scale subresource. Consider tightening the rules (and using a namespaced Role/RoleBinding if feasible) to follow least-privilege.
Current:

```yaml
- modelservings/scale
- modelservings/status
verbs:
- get
- list
- watch
- update
- patch
```

Suggested:

```yaml
  verbs:
  - get
- apiGroups:
  - workload.serving.volcano.sh
  resources:
  - modelservings/scale
  verbs:
  - get
  - update
```
modelserving.yaml (outdated)

```yaml
apiVersion: workload.serving.volcano.sh/v1alpha1
kind: ModelServing
metadata:
  name: test-model
  namespace: default
spec:
  replicas: 1
  template:
    roles:
    - name: entry
      workerReplicas: 1
      entryTemplate:
        metadata: {}
        spec:
          containers:
          - name: entry
            image: nginx
      workerTemplate:
        metadata: {}
        spec:
          containers:
          - name: worker
            image: nginx
    - name: worker
      workerReplicas: 1
      entryTemplate:
        metadata: {}
        spec:
          containers:
          - name: entry
            image: nginx
      workerTemplate:
        metadata: {}
        spec:
          containers:
          - name: worker
            image: nginx
```
There was a problem hiding this comment.
This ModelServing manifest looks like an example asset; the repo already keeps sample CR YAML under examples/model-serving/. Consider moving it there (or a dedicated autoscaling example folder) rather than adding it at repo root.
Suggested change — remove the duplicated manifest from the repo root and leave a relocation stub:

```yaml
# This file previously contained an example ModelServing manifest.
# To follow the repository convention and avoid cluttering the repo root,
# the actual example CR YAML has been moved under the examples directory.
#
# Please use the manifest in:
#   examples/model-serving/modelserving.yaml
#   (or the appropriate subfolder, e.g. examples/model-serving/autoscaling/)
#
# This stub is intentionally left without a Kubernetes resource definition.
# It exists only to document the relocation and to satisfy static analysis.
```
scaledobject.yaml (outdated)

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: modelserving-scaler
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: workload.serving.volcano.sh/v1alpha1
    kind: ModelServing
    name: test-model
  minReplicaCount: 1
  maxReplicaCount: 5
  pollingInterval: 15
  cooldownPeriod: 60
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
      query: sum(rate(process_cpu_seconds_total[1m]))
      threshold: "0.01"
```
There was a problem hiding this comment.
This ScaledObject manifest appears to be an example asset; to align with the repo’s existing structure for sample YAML, consider moving it under examples/ (e.g., examples/autoscaling/keda/) rather than adding it at repository root.
Suggested change — remove the manifest from the repo root and leave a placeholder:

```yaml
# Placeholder file at repository root.
# The example KEDA ScaledObject manifest has been moved under the examples tree,
# for example: examples/autoscaling/keda/scaledobject.yaml
#
# This file is intentionally left without any Kubernetes resources to avoid
# having example manifests at the repository root.
```
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: keda-modelserving-scaling
rules:
```
This RBAC manifest also appears to be an example asset; consider relocating it under examples/ or docs/ so users can find it alongside other sample manifests instead of at the repository root.
```go
// Set labelSelector so the scale subresource can report it to HPA.
// Without this, HPA fails with "selector is required" because it cannot
// determine which pods belong to this ModelServing.
// The selector matches the label applied to all pods by createBasePod().
selector := labels.Set{
	workloadv1alpha1.ModelServingNameLabelKey: latestMS.Name,
}.String()
if copy.Status.LabelSelector != selector {
	shouldUpdate = true
	copy.Status.LabelSelector = selector
}
```
Add/adjust a unit test to assert that UpdateModelServingStatus sets .status.labelSelector (both in the no-serving-groups early return and the normal status update path). This behavior is critical for the CRD scale subresource/HPA integration and is currently unverified by tests.
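As context for such a test: the selector written to status comes from `labels.Set{...}.String()`, which (as I understand it) emits sorted `key=value` pairs joined by commas. Below is a minimal, dependency-free sketch of that format so the expected assertion value is concrete — `selectorString` here is a hypothetical stand-in, not the apimachinery implementation:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selectorString mimics the output format of labels.Set.String():
// "key=value" pairs, sorted by key, joined with commas.
func selectorString(set map[string]string) string {
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, k+"="+set[k])
	}
	return strings.Join(pairs, ",")
}

func main() {
	// The value a regression test would expect in .status.labelSelector
	// for a ModelServing named "test-model".
	fmt.Println(selectorString(map[string]string{
		"modelserving.volcano.sh/name": "test-model",
	}))
	// prints: modelserving.volcano.sh/name=test-model
}
```

A regression test would assert this exact string on both the early-return and normal status-update paths.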
test-deployment.yaml (outdated)

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kthena-router
  labels:
```
These manifests appear to be example/test assets, but they are being added at repo root even though the repository already has an examples/ tree for custom resource YAML. Consider moving this file under an appropriate examples/ subdirectory (e.g., examples/model-serving/ or a new examples/autoscaling/keda/) to keep the top-level clean and make discovery consistent.
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kthena-router
  namespace: monitoring
spec:
```
This ServiceMonitor looks like an example asset; the repo already centralizes sample manifests under examples/. Consider relocating it under examples/ (and/or docs/) instead of adding it to the repository root.
scaledobject.yaml (outdated)

```yaml
- type: prometheus
  metadata:
    serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
    query: sum(rate(process_cpu_seconds_total[1m]))
```
The Prometheus query sum(rate(process_cpu_seconds_total[1m])) is effectively cluster-wide and not scoped to the ModelServing/pods being autoscaled, so it can cause unintended scaling driven by unrelated workloads. Consider scoping the query to the target workload (e.g., via pod, namespace, or a label like modelserving.volcano.sh/name=test-model) or using a workload-specific metric.
Current:

```yaml
query: sum(rate(process_cpu_seconds_total[1m]))
```

Suggested:

```yaml
query: sum(rate(process_cpu_seconds_total{namespace="default", pod=~"test-model-.*"}[1m]))
```
LiZhenCheng9527 left a comment
You can place these examples in the ‘example’ folder
```yaml
- modelservings
- modelservings/scale
- modelservings/status
verbs:
```
Does ModelServing really need all these permissions?
modelserving.yaml (outdated)

```diff
@@ -0,0 +1,38 @@
+apiVersion: workload.serving.volcano.sh/v1alpha1
```
Why is such an example needed? This one is identical to examples/model-serving/sample.yaml.
- Remove modelservings/status from KEDA ClusterRole (not needed)
- Restrict modelservings base resource to read-only verbs
- Delete redundant modelserving.yaml test file

Signed-off-by: WHOIM1205 <[email protected]>
hey @LiZhenCheng9527
Please let me know if anything else should be adjusted.
I did a first pass on this diff. Setting the label selector in status looks right. The main thing I would still want to confirm is normal CI plus at least one regression test around the status update path, for both cases:
Signed-off-by: WHOIM1205 <[email protected]>
hey @hzxuzhonghu added regression tests for the status update path as suggested.
All tests are passing. Please let me know if you'd like any additional coverage.
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
test-deployment.yaml (outdated)

```nginx
location /metrics {
    default_type text/plain;
    return 200 '# HELP kthena_router_active_downstream_requests Number of active downstream requests\n# TYPE kthena_router_active_downstream_requests gauge\nkthena_router_active_downstream_requests 3\n# HELP kthena_router_requests_total Total requests\n# TYPE kthena_router_requests_total counter\nkthena_router_requests_total 100\n';
}
```
The NGINX `return 200 '...\n...'` payload will emit literal backslash-n sequences (NGINX doesn't interpret `\n` escapes here), which can make the Prometheus exposition invalid/unparseable. To make these manifests reliably scrapeable, serve metrics with real newlines (e.g., by returning a multi-line literal string with actual newline characters, serving a static metrics file, or using a tiny HTTP server/exporter image that emits valid Prometheus text format).
test-deployment.yaml (outdated)

```nginx
location /metrics {
    default_type text/plain;
    return 200 '# HELP vllm_num_requests_running Number of running requests\n# TYPE vllm_num_requests_running gauge\nvllm_num_requests_running 2\n# HELP vllm_num_requests_waiting Number of waiting requests\n# TYPE vllm_num_requests_waiting gauge\nvllm_num_requests_waiting 0\n# HELP vllm_gpu_cache_usage_perc GPU cache usage percentage\n# TYPE vllm_gpu_cache_usage_perc gauge\nvllm_gpu_cache_usage_perc 0.45\n';
}
```
Same issue as above: the `return 200 '...\n...'` payload emits literal backslash-n sequences rather than newlines, making the exposition unparseable; serve metrics with real newlines or use a small exporter image instead.
test-deployment.yaml (outdated)

```yaml
    modelserving.volcano.sh/name: test-model
    modelserving.volcano.sh/entry: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      modelserving.volcano.sh/name: test-model
      modelserving.volcano.sh/entry: "true"
  template:
    metadata:
      labels:
        modelserving.volcano.sh/name: test-model
        modelserving.volcano.sh/entry: "true"
```
This example Deployment is labeled with the same modelserving.volcano.sh/name: test-model key/value that the controller now publishes via status.labelSelector. If someone applies this alongside a real ModelServing named test-model, HPA will likely count these pods as part of the scale target, skewing replica calculations and metrics. Recommend updating the example to avoid reusing the ModelServingNameLabelKey label (or use a different msName value that cannot collide with an actual ModelServing), so it doesn’t interfere with autoscaling behavior.
Current:

```yaml
    modelserving.volcano.sh/name: test-model
    modelserving.volcano.sh/entry: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      modelserving.volcano.sh/name: test-model
      modelserving.volcano.sh/entry: "true"
  template:
    metadata:
      labels:
        modelserving.volcano.sh/name: test-model
        modelserving.volcano.sh/entry: "true"
```

Suggested:

```yaml
    app.kubernetes.io/name: dummy-inference-vllm
    app.kubernetes.io/entry: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: dummy-inference-vllm
      app.kubernetes.io/entry: "true"
  template:
    metadata:
      labels:
        app.kubernetes.io/name: dummy-inference-vllm
        app.kubernetes.io/entry: "true"
```
test-deployment.yaml (outdated)

```yaml
    modelserving.volcano.sh/name: test-model
    modelserving.volcano.sh/entry: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      modelserving.volcano.sh/name: test-model
      modelserving.volcano.sh/entry: "true"
  template:
    metadata:
      labels:
        modelserving.volcano.sh/name: test-model
```
This example Deployment is labeled with the same modelserving.volcano.sh/name: test-model key/value that the controller now publishes via status.labelSelector. If someone applies this alongside a real ModelServing named test-model, HPA will likely count these pods as part of the scale target, skewing replica calculations and metrics. Recommend updating the example to avoid reusing the ModelServingNameLabelKey label (or use a different msName value that cannot collide with an actual ModelServing), so it doesn’t interfere with autoscaling behavior.
Current:

```yaml
    modelserving.volcano.sh/name: test-model
    modelserving.volcano.sh/entry: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      modelserving.volcano.sh/name: test-model
      modelserving.volcano.sh/entry: "true"
  template:
    metadata:
      labels:
        modelserving.volcano.sh/name: test-model
```

Suggested:

```yaml
    modelserving.volcano.sh/name: dummy-test-model
    modelserving.volcano.sh/entry: "true"
spec:
  replicas: 1
  selector:
    matchLabels:
      modelserving.volcano.sh/name: dummy-test-model
      modelserving.volcano.sh/entry: "true"
  template:
    metadata:
      labels:
        modelserving.volcano.sh/name: dummy-test-model
```
```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: keda-modelserving-scaling
rules:
- apiGroups:
  - workload.serving.volcano.sh
  resources:
  - modelservings
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - workload.serving.volcano.sh
  resources:
  - modelservings/scale
  verbs:
  - get
  - update
  - patch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: keda-modelserving-scaling
```
This grants cluster-wide permissions to read all ModelServing objects and update any modelservings/scale. If this is intended as an example for scaling a single namespace (the provided ScaledObject is in default), prefer a namespaced Role/RoleBinding scoped to that namespace to reduce blast radius. If cluster-wide access is genuinely required, consider documenting that explicitly in-file to prevent accidental over-privileging.
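A namespaced variant might look like the sketch below. The subject here assumes KEDA's default service account name (`keda-operator` in the `keda` namespace); adjust to the actual install:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: keda-modelserving-scaling
  namespace: default
rules:
- apiGroups: ["workload.serving.volcano.sh"]
  resources: ["modelservings"]
  verbs: ["get"]
- apiGroups: ["workload.serving.volcano.sh"]
  resources: ["modelservings/scale"]
  verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: keda-modelserving-scaling
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: keda-modelserving-scaling
subjects:
- kind: ServiceAccount
  name: keda-operator   # assumed default; match your KEDA deployment
  namespace: keda
```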
```go
	revision: "rev-1",
},
{
	name: "name with special characters — selector encodes correctly",
```
The test case description says the selector 'encodes correctly', but the implementation uses labels.Set{...}.String() and does not perform any encoding; it just formats a k=v selector string. Suggest updating the test name to reflect what’s actually being validated (e.g., that a typical DNS-1123-ish name round-trips into the selector string) to avoid misleading future readers.
Current:

```go
name: "name with special characters — selector encodes correctly",
```

Suggested:

```go
name: "name with dashes and numbers — selector string contains name unmodified",
```
test-deployment.yaml (outdated)

```yaml
spec:
  containers:
  - name: kthena-router
    image: nginx:alpine
```
Not sure I understand why nginx is deployed as the kthena router.
```yaml
scaleTargetRef:
  apiVersion: workload.serving.volcano.sh/v1alpha1
  kind: ModelServing
  name: test-model
```

```yaml
kind: ModelServing
name: test-model
minReplicaCount: 1
maxReplicaCount: 5
```
The label selector covers all pods belonging to the ModelServing, but we actually want to scale based on serving groups. The pod count is usually at least 2x the number of serving groups. How do we handle that?
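One hedged way to express group-based scaling with a pod-level metric is to normalize the query by the known pods-per-group ratio. The trigger below is illustrative only: it assumes 2 pods per serving group (1 entry + 1 worker) and that Prometheus sanitizes the pod label `modelserving.volcano.sh/name` to `modelserving_volcano_sh_name`:

```yaml
triggers:
- type: prometheus
  metadata:
    serverAddress: http://prometheus-kube-prometheus-prometheus.monitoring.svc.cluster.local:9090
    # assumed: each serving group runs 2 pods, so divide the pod-level sum by 2
    query: sum(vllm_num_requests_running{modelserving_volcano_sh_name="test-model"}) / 2
    threshold: "1"
```

This keeps HPA's replica math aligned with groups rather than raw pods, at the cost of hard-coding the group size in the query.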
I agree with this. Can we add an addons dir like Istio's https://github.com/istio/istio/tree/master/samples/addons?
…dback
- Remove loose test-deployment.yaml and scaledobject.yaml from repo root
- Add proper ModelServing example (test-model) with entry + worker roles
- Add ScaledObject with real Prometheus query (kthena_router_active_downstream_requests)
- Add README explaining usage and groups-vs-pods scaling design

Signed-off-by: WHOIM1205 <[email protected]>
hey @hzxuzhonghu thanks for the feedback.
Regarding the scaling behavior, please let me know if any further adjustments are needed.

Summary
This PR fixes HPA compatibility for the `ModelServing` custom resource. While trying to use KEDA + HPA for autoscaling, I found that scaling wasn't working due to a missing label selector in the controller. This change adds that missing piece and makes autoscaling work end-to-end.
Problem
Autoscaling via KEDA → HPA was failing because HPA could not determine a pod selector.

After digging into it, the issue was:
- the CRD scale subresource defines `labelSelectorPath: .status.labelSelector`, `specReplicasPath: .spec.replicas`, and `statusReplicasPath: .status.replicas`
- `.status.labelSelector` was never being set by the controller

Because of that, HPA couldn't figure out which pods belong to a `ModelServing` instance.

Fix
Set `.status.labelSelector` in the controller using the existing `modelserving.volcano.sh/name` label. This label is already applied to all pods created by `ModelServing`, so HPA can use it directly.
Handled in `UpdateModelServingStatus`, including the early-return path when no ServingGroups exist yet.
The update is idempotent (the value is only written when it changes).
Result
After this change:
- HPA can read a valid selector from the scale subresource
- KEDA/HPA drive `spec.replicas` directly (no Deployment involved)

Testing
Tested locally:
- KEDA ScaledObject with a Prometheus trigger (`sum(rate(process_cpu_seconds_total[1m])) > 0`)
- HPA reports `ScalingActive=True`
- `ModelServing.spec.replicas` increases under load

So the full flow works:
Prometheus → KEDA → HPA → ModelServing → pods
Notes
Impact
- Enables HPA/KEDA autoscaling for `ModelServing`

refs: #799