Commit f673ad1

Author: Mopuri, Bharath
initial changes for IG raw deployment mode
1 parent fc1ba3e commit f673ad1

5 files changed, +148 -0 lines changed

docs/admin/kubernetes_deployment.md

+3
@@ -1,6 +1,9 @@
# Kubernetes Deployment Installation Guide
KServe supports `RawDeployment` mode to enable `InferenceService` deployment with the Kubernetes resources [`Deployment`](https://kubernetes.io/docs/concepts/workloads/controllers/deployment), [`Service`](https://kubernetes.io/docs/concepts/services-networking/service), [`Ingress`](https://kubernetes.io/docs/concepts/services-networking/ingress) and [`Horizontal Pod Autoscaler`](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale). Compared to serverless deployment, it removes Knative limitations such as the inability to mount multiple volumes. On the other hand, scaling down to and up from zero is not supported in `RawDeployment` mode.

**Note:** Starting with the KServe vx.xx release, `InferenceGraph` also supports `RawDeployment` mode.
See the release notes.
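
As a minimal sketch of how the mode is selected, the manifest below sets the `serving.kserve.io/deploymentMode` annotation on an `InferenceService`. The resource name, model format, and storage URI are hypothetical placeholders chosen for illustration, not values taken from this guide.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: example-sklearn                 # hypothetical name
  annotations:
    # Selects RawDeployment instead of the serverless (Knative) mode
    serving.kserve.io/deploymentMode: "RawDeployment"
spec:
  predictor:
    model:
      modelFormat:
        name: sklearn                   # assumed model format for this sketch
      storageUri: "gs://example-bucket/models/sklearn/model"  # hypothetical location
```

The same annotation appears on the `InferenceGraph` example in the release notes, so the opt-in looks the same for both resource kinds.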

Kubernetes 1.22 is the minimum required version. Please check the following recommended Istio versions for the corresponding
Kubernetes version.


@@ -0,0 +1,88 @@
# Announcing: KServe vx.xx

We are excited to announce the release of KServe x.xx. In this release we made enhancements to the KServe control plane, most notably bringing `RawDeployment` mode to `InferenceGraph`. Previously, `RawDeployment` was available only for `InferenceService`.

Here is a summary of the key changes:

## KServe Core Inference Enhancements

- Inference Graph enhancements to support `RawDeployment`, along with autoscaling configuration directly within the `InferenceGraphSpec`

IG `RawDeployment` makes the deployment lightweight by using native Kubernetes resources. See the comparison below:

![Inference graph Knative based deployment](../../images/2024-xx-xx-Kserve-x-xx-release/ig_knative_deployment.png)

![Inference graph raw deployment](../../images/2024-xx-xx-Kserve-x-xx-release/ig_raw_deployment.png)

Autoscaling configuration fields were introduced to support scaling needs in
`RawDeployment` mode. These fields are optional and, when set, take effect only if the annotation `serving.kserve.io/autoscalerClass` is not set to `external`.
See the following example with the autoscaling fields `minReplicas`, `maxReplicas`, `scaleTarget` and `scaleMetric`:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: graph_with_switch_node
  annotations:
    # Deploy the graph with native Kubernetes resources instead of Knative
    serving.kserve.io/deploymentMode: "RawDeployment"
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: "rootStep1"
          nodeName: node1
          dependency: Hard
        - name: "rootStep2"
          # Template placeholder for an InferenceService name
          serviceName: {{ success_200_isvc_id }}
    node1:
      routerType: Switch
      steps:
        - name: "node1Step1"
          # Template placeholder for an InferenceService name
          serviceName: {{ error_404_isvc_id }}
          condition: "[@this].#(decision_picker==ERROR)"
          dependency: Hard
  # Autoscaling fields live on the InferenceGraphSpec; the names follow
  # the camelCase keys listed in the API reference
  minReplicas: 5
  maxReplicas: 10
  scaleTarget: 50
  scaleMetric: "cpu"
```
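
To make the four autoscaling values more concrete, here is a rough sketch of the kind of `HorizontalPodAutoscaler` they would plausibly translate to, assuming the graph is scaled by an HPA in the same way `InferenceService` is in `RawDeployment` mode, and that a `cpu` target of 50 means 50% average utilization. The object names and the exact shape of the generated resource are assumptions for illustration, not output captured from the controller.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: graph-with-switch-node          # hypothetical; the generated name may differ
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: graph-with-switch-node        # hypothetical Deployment backing the graph
  minReplicas: 5                        # from minReplicas
  maxReplicas: 10                       # from maxReplicas
  metrics:
    - type: Resource
      resource:
        name: cpu                       # from scaleMetric
        target:
          type: Utilization
          averageUtilization: 50        # from scaleTarget (assumed to be a percentage)
```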
For more details, please refer to the [issue](https://github.com/kserve/kserve/issues/2454).
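
For contrast, the following sketch shows a graph that opts out of the built-in autoscaling by setting `serving.kserve.io/autoscalerClass` to `external`, as described above. The graph name and the `serviceName` value are hypothetical; only the two annotations come from the text.

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: InferenceGraph
metadata:
  name: graph-with-external-autoscaler  # hypothetical name
  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
    # Scaling is left to an external autoscaler
    serving.kserve.io/autoscalerClass: "external"
spec:
  nodes:
    root:
      routerType: Sequence
      steps:
        - name: "rootStep1"
          serviceName: example-isvc     # hypothetical InferenceService name
  # The autoscaling fields are omitted because, per the note above,
  # they are not effective with the external autoscaler class
```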

-

### Enhanced Python SDK Dependency Management

-
-

### KServe Python Runtimes Improvements
-

### LLM Runtimes

#### TorchServe LLM Runtime

#### vLLM Runtime

## ModelMesh Updates

### Storing Models on Kubernetes Persistent Volumes (PVC)

### Horizontal Pod Autoscaling (HPA)

### Model Metrics, Metrics Dashboard, Payload Event Logging

## What's Changed? :warning:

## Join the community

- Visit our [Website](https://kserve.github.io/website/) or [GitHub](https://github.com/kserve)
- Join the Slack ([#kserve](https://kubeflow.slack.com/?redir=%2Farchives%2FCH6E58LNP))
- Attend our community meeting by subscribing to the [KServe calendar](https://wiki.lfaidata.foundation/display/kserve/calendars)
- View our [community GitHub repository](https://github.com/kserve/community) to learn how to make contributions. We are excited to work with you to make KServe better and promote its adoption!


Thanks to all the contributors who have made commits to the 0.11 release!

The KServe Working Group

docs/reference/api.md

+57
@@ -524,6 +524,63 @@ Kubernetes core/v1.Affinity
<em>(Optional)</em>
</td>
</tr>

<tr>
<td>
<code>minReplicas</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>Minimum number of replicas. Defaults to 1, but can be set to 0 to enable scale-to-zero.</p>
</td>
</tr>
<tr>
<td>
<code>maxReplicas</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>Maximum number of replicas for autoscaling.</p>
</td>
</tr>
<tr>
<td>
<code>scaleTarget</code><br/>
<em>
int
</em>
</td>
<td>
<em>(Optional)</em>
<p>ScaleTarget specifies the integer target value of the metric type the autoscaler watches for.
Concurrency and rps targets are supported by the Knative Pod Autoscaler
(<a href="https://knative.dev/docs/serving/autoscaling/autoscaling-targets/">https://knative.dev/docs/serving/autoscaling/autoscaling-targets/</a>).</p>
</td>
</tr>
<tr>
<td>
<code>scaleMetric</code><br/>
<em>
<a href="#serving.kserve.io/v1beta1.ScaleMetric">
ScaleMetric
</a>
</em>
</td>
<td>
<em>(Optional)</em>
<p>ScaleMetric defines the scaling metric type watched by the autoscaler.
Possible values are concurrency, rps, cpu, and memory. Concurrency and rps are supported via the
Knative Pod Autoscaler (<a href="https://knative.dev/docs/serving/autoscaling/autoscaling-metrics">https://knative.dev/docs/serving/autoscaling/autoscaling-metrics</a>).</p>
</td>
</tr>


</tbody>
</table>
<h3 id="serving.kserve.io/v1alpha1.InferenceGraphStatus">InferenceGraphStatus
