Commit 637055b

Docs: added migration guide (#1558)
* added migration guide
* updated index
* changed typo
* updated the index
* updated docs
* fixed comments

Signed-off-by: Xiyue Yu <[email protected]>
1 parent bd614af commit 637055b

File tree

5 files changed: +275 -0 lines changed

mkdocs.yml

Lines changed: 1 addition & 0 deletions
```diff
@@ -74,6 +74,7 @@ nav:
   - Configuration Guide:
     - Configuring the plugins via configuration YAML file: guides/epp-configuration/config-text.md
     - Prefix Cache Aware Plugin: guides/epp-configuration/prefix-aware.md
+    - Migration Guide: guides/ga-migration.md
    - Troubleshooting Guide: guides/troubleshooting.md
  - Implementer Guides:
    - Getting started: guides/implementers.md
```

site-src/guides/ga-migration.md

Lines changed: 274 additions & 0 deletions
# Inference Gateway: Migrating from v1alpha2 to v1 API

## Introduction

This guide provides a comprehensive walkthrough for migrating your Inference Gateway setup from the alpha `v1alpha2` API to the generally available `v1` API. It is intended for platform administrators and networking specialists who are currently using the `v1alpha2` version of the Inference Gateway and want to upgrade to the `v1` version to leverage the latest features and improvements.

Before you start the migration, ensure you are familiar with the concepts and deployment of the Inference Gateway.

***

## Before you begin

Before starting the migration, determine whether this guide is necessary for your setup.

### Checking for Existing v1alpha2 APIs

To check whether you are actively using the `v1alpha2` Inference Gateway APIs, run the following command:

```bash
kubectl get inferencepools.inference.networking.x-k8s.io --all-namespaces
```

* If this command returns one or more `InferencePool` resources, you are using the `v1alpha2` API and should proceed with this migration guide.
* If the command returns `No resources found`, you are not using the `v1alpha2` `InferencePool` and do not need to follow this migration guide. You can proceed with a fresh installation of the `v1` Inference Gateway.
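For reference, output indicating an existing alpha pool might look like the following (the pool name is illustrative, and the columns assume the default `kubectl` printer columns):

```bash
NAMESPACE   NAME                            AGE
default     vllm-llama3-8b-instruct-alpha   5d
```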

***

## Migration Paths

There are two paths for migrating from `v1alpha2` to `v1`:

1. **Simple Migration (with downtime):** This path is for users who can afford a short period of downtime. It involves deleting the old `v1alpha2` resources and CRDs before installing the new `v1` versions.
2. **Zero-Downtime Migration:** This path is for users who need to migrate without any service interruption. It involves running both `v1alpha2` and `v1` stacks side by side and gradually shifting traffic.

***

## Simple Migration (with downtime)

This approach is faster and simpler but will result in a brief period of downtime while the resources are being updated. It is the recommended path if you do not require a zero-downtime migration.

### 1. Delete Existing v1alpha2 Resources

**Option a: Uninstall using Helm.**

```bash
helm uninstall <helm_alpha_inferencepool_name>
```
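If you don't recall the release name used for the alpha install, you can list the installed releases first:

```bash
helm list --all-namespaces
```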

**Option b: Manually delete alpha `InferencePool` resources.**

If you are not using Helm, you will need to manually delete all resources associated with your `v1alpha2` deployment. The key is to remove the `HTTPRoute`'s reference to the old `InferencePool` and then delete the `v1alpha2` resources themselves.

1. **Update or Delete the `HTTPRoute`**: Modify the `HTTPRoute` to remove the `backendRef` that points to the `v1alpha2` `InferencePool`.
2. **Delete the `InferencePool` and associated resources**: Delete the `v1alpha2` `InferencePool`, any `InferenceModel` (or `InferenceObjective`) resources that point to it, and the corresponding Endpoint Picker (EPP) Deployment and Service. A sketch of steps 1 and 2 follows the code block below.
3. **Delete the `v1alpha2` CRDs**: Once all `v1alpha2` custom resources are deleted, you can remove the CRD definitions from your cluster.

```bash
# Change the version to match the one you used to install the v1alpha2 CRDs
export VERSION="v0.3.0"
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/${VERSION}/manifests.yaml
```
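For illustration, a minimal sketch of steps 1 and 2, reusing the resource names that appear elsewhere in this guide; the EPP Deployment and Service names are hypothetical and will depend on how you deployed the Endpoint Picker:

```bash
# Step 1: remove the alpha backendRef from the HTTPRoute (edit interactively)
kubectl edit httproute llm-route

# Step 2: delete the alpha InferencePool and the InferenceModel resources that reference it
kubectl delete inferencepools.inference.networking.x-k8s.io vllm-llama3-8b-instruct-alpha
kubectl delete inferencemodels.inference.networking.x-k8s.io food-review

# Step 2 (continued): delete the EPP Deployment and Service (hypothetical names)
kubectl delete deployment vllm-llama3-8b-instruct-alpha-epp
kubectl delete service vllm-llama3-8b-instruct-alpha-epp
```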

### 2. Install v1 Resources

After cleaning up the old resources, you can proceed with a fresh installation of the `v1` Inference Gateway. This involves deploying a new EPP image compatible with the `v1` API and installing the new `v1` CRDs. You can then create a new `v1` `InferencePool` with its corresponding `InferenceObjective` resources, and a new `HTTPRoute` that directs traffic to your new `v1` `InferencePool`.
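A condensed sketch of this installation, assuming the `v1.0.0` release publishes a consolidated `manifests.yaml` in the same layout as the alpha releases referenced above, and reusing the Helm chart values shown in the zero-downtime path later in this guide:

```bash
# Install the v1 CRDs for the chosen release
RELEASE=v1.0.0
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/${RELEASE}/manifests.yaml

# Install a v1 InferencePool via Helm (label and provider values are placeholders)
helm install vllm-llama3-8b-instruct \
  --set inferencePool.modelServers.matchLabels.app=<the_label_you_used_for_the_model_server_deployment> \
  --set provider.name=<YOUR_PROVIDER> \
  --version $RELEASE \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```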
### 3. Verify the Deployment

After a few minutes, verify that your new `v1` stack is correctly serving traffic. You should have a **`PROGRAMMED`** gateway.

```bash
❯ kubectl get gateway -o wide
NAME                            CLASS               ADDRESS        PROGRAMMED   AGE
<YOUR_INFERENCE_GATEWAY_NAME>   inference-gateway   <IP_ADDRESS>   True         10m
```

Curl the endpoint to make sure you are getting a successful response with a **200** response code.

```bash
IP=$(kubectl get gateway/<YOUR_INFERENCE_GATEWAY_NAME> -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
  "model": "<your_model>",
  "prompt": "<your_prompt>",
  "max_tokens": 100,
  "temperature": 0
}'
```

***

## Zero-Downtime Migration

This migration path is designed for users who cannot afford any service interruption. It assumes you already have the stack shown in the following diagram:

<img src="/images/alpha-stage.png" alt="Inference Gateway Alpha Stage" class="center" />

### A Note on Interacting with Multiple API Versions

During the zero-downtime migration, both `v1alpha2` and `v1` CRDs will be installed on your cluster. This can create ambiguity when using `kubectl` to query for `InferencePool` resources. To ensure you are interacting with the correct version, you **must** use the full resource name:

* **For v1alpha2**: `kubectl get inferencepools.inference.networking.x-k8s.io`
* **For v1**: `kubectl get inferencepools.inference.networking.k8s.io`

The `v1` API also provides a convenient short name, `infpool`, which can be used to query `v1` resources specifically:

```bash
kubectl get infpool
```

This guide uses the full names, or the short name for `v1`, to avoid ambiguity.
***

### Stage 1: Side-by-side v1 Deployment

In this stage, you will deploy the new `v1` `InferencePool` stack alongside the existing `v1alpha2` stack. This allows for a safe, gradual migration.

After finishing all the steps in this stage, you'll have the infrastructure shown in the following diagram:

<img src="/images/migration-stage.png" alt="Inference Gateway Migration Stage" class="center" />

**1. Install v1 CRDs**

```bash
RELEASE=v1.0.0
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/$RELEASE/config/crd/bases/inference.networking.x-k8s.io_inferenceobjectives.yaml
```
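You can confirm the CRD was created; its name follows from the file applied above:

```bash
kubectl get crd inferenceobjectives.inference.networking.x-k8s.io
```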

**2. Install the v1 `InferencePool`**

Use Helm to install a new `v1` `InferencePool` with a distinct release name (e.g., `vllm-llama3-8b-instruct-ga`).

```bash
helm install vllm-llama3-8b-instruct-ga \
  --set inferencePool.modelServers.matchLabels.app=<the_label_you_used_for_the_model_server_deployment> \
  --set provider.name=<YOUR_PROVIDER> \
  --version $RELEASE \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```
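Once the chart is installed, the new pool should be visible via the `v1` short name introduced above:

```bash
kubectl get infpool vllm-llama3-8b-instruct-ga
```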

**3. Create the v1 `InferenceObjective`**

The `v1` API replaces `InferenceModel` with `InferenceObjective`. Create the new resources, referencing the new `v1` `InferencePool`.

```bash
kubectl apply -f - <<EOF
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: food-review
spec:
  priority: 1
  poolRef:
    group: inference.networking.k8s.io
    name: vllm-llama3-8b-instruct-ga
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: base-model
spec:
  priority: 2
  poolRef:
    group: inference.networking.k8s.io
    name: vllm-llama3-8b-instruct-ga
---
EOF
```
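You can verify that both objectives were created, using the full resource name for the group shown in the manifests above:

```bash
kubectl get inferenceobjectives.inference.networking.x-k8s.io
```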

***

### Stage 2: Traffic Shifting

With both stacks running, you can start shifting traffic from `v1alpha2` to `v1` by updating the `HTTPRoute` to split traffic. This example shows a 50/50 split.

**1. Update `HTTPRoute` for Traffic Splitting**

```bash
kubectl apply -f - <<EOF
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-alpha
      weight: 50
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-ga
      weight: 50
---
EOF
```

**2. Verify and Monitor**

After applying the changes, monitor the performance and stability of the new `v1` stack, and make sure the `inference-gateway` status `PROGRAMMED` is `True`.
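For example, reusing the status check from earlier in this guide:

```bash
kubectl get gateway inference-gateway
# NAME                CLASS               ADDRESS        PROGRAMMED   AGE
# inference-gateway   inference-gateway   <IP_ADDRESS>   True         1h
```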

***

### Stage 3: Finalization and Cleanup

Once you have verified that the `v1` `InferencePool` is stable, you can direct all traffic to it and decommission the old `v1alpha2` resources.

**1. Shift 100% of Traffic to the v1 `InferencePool`**

Update the `HTTPRoute` to send all traffic to the `v1` pool.

```bash
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-ga
      weight: 100
EOF
```

**2. Final Verification**

Send test requests to ensure your `v1` stack is handling all traffic as expected.

<img src="/images/ga-stage.png" alt="Inference Gateway GA Stage" class="center" />

You should have a **`PROGRAMMED`** gateway:

```bash
❯ kubectl get gateway -o wide
NAME                CLASS               ADDRESS        PROGRAMMED   AGE
inference-gateway   inference-gateway   <IP_ADDRESS>   True         10m
```

Curl the endpoint and verify a **200** response code:

```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
  "model": "<your_model>",
  "prompt": "<your_prompt>",
  "max_tokens": 100,
  "temperature": 0
}'
```

**3. Clean Up v1alpha2 Resources**

After confirming the `v1` stack is fully operational, safely remove the old `v1alpha2` resources.
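A sketch of this cleanup, mirroring the deletion commands from the simple-migration path (release and version names are examples):

```bash
# If the alpha stack was installed with Helm:
helm uninstall <helm_alpha_inferencepool_name>

# Once no v1alpha2 resources remain, remove the v1alpha2 CRDs
export VERSION="v0.3.0"
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/${VERSION}/manifests.yaml
```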

site-src/images/alpha-stage.png (293 KB)

site-src/images/ga-stage.png (310 KB)

site-src/images/migration-stage.png (428 KB)
