# Inference Gateway: Migrating from v1alpha2 to v1 API

## Introduction

This guide walks you through migrating your Inference Gateway setup from the alpha `v1alpha2` API to the generally available `v1` API. It is intended for platform administrators and networking specialists who currently use the `v1alpha2` version of the Inference Gateway and want to upgrade to `v1` to take advantage of the latest features and improvements.

Before you start the migration, make sure you are familiar with Inference Gateway concepts and deployment.
***

## Before you begin

Before starting the migration, determine whether this guide is necessary for your setup.
### Checking for Existing v1alpha2 APIs

To check if you are actively using the `v1alpha2` Inference Gateway APIs, run the following command:

```bash
kubectl get inferencepools.inference.networking.x-k8s.io --all-namespaces
```

* If this command returns one or more `InferencePool` resources, you are using the `v1alpha2` API and should proceed with this migration guide.
* If the command returns `No resources found`, you are not using the `v1alpha2` `InferencePool` and do not need to follow this migration guide. You can proceed with a fresh installation of the `v1` Inference Gateway.
***

## Migration Paths

There are two paths for migrating from `v1alpha2` to `v1`:

1. **Simple Migration (with downtime):** This path is for users who can afford a short period of downtime. It involves deleting the old `v1alpha2` resources and CRDs before installing the new `v1` versions.
2. **Zero-Downtime Migration:** This path is for users who need to migrate without any service interruption. It involves running both `v1alpha2` and `v1` stacks side by side and gradually shifting traffic.

***
## Simple Migration (with downtime)

This approach is faster and simpler but will result in a brief period of downtime while the resources are being updated. It is the recommended path if you do not require a zero-downtime migration.

### 1. Delete Existing v1alpha2 Resources

**Option a: Uninstall using Helm.**

```bash
helm uninstall <helm_alpha_inferencepool_name>
```
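If you are unsure of the release name, you can list the installed Helm releases first:

```bash
# Find the release that installed the alpha InferencePool chart
helm list --all-namespaces
```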
**Option b: Manually delete alpha `InferencePool` resources.**

If you are not using Helm, you will need to manually delete all resources associated with your `v1alpha2` deployment. The key is to remove the `HTTPRoute`'s reference to the old `InferencePool` and then delete the `v1alpha2` resources themselves.

1. **Update or Delete the `HTTPRoute`**: Modify the `HTTPRoute` to remove the `backendRef` that points to the `v1alpha2` `InferencePool`.
2. **Delete the `InferencePool` and associated resources**: You must delete the `v1alpha2` `InferencePool`, any `InferenceModel` resources that point to it, and the corresponding Endpoint Picker (EPP) Deployment and Service (see the example commands after this list).
3. **Delete the `v1alpha2` CRDs**: Once all `v1alpha2` custom resources are deleted, you can remove the CRD definitions from your cluster.
   ```bash
   # Adjust the release version if your alpha stack was installed from a different release
   ALPHA_RELEASE=v0.3.0
   kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/${ALPHA_RELEASE}/manifests.yaml
   ```
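For steps 1 and 2, the exact resource names depend on your deployment. The names below are placeholders for illustration, assuming an `HTTPRoute` named `llm-route` as used elsewhere in this guide:

```bash
# Step 1: edit the HTTPRoute and remove the backendRef that points to the alpha pool
kubectl edit httproute llm-route

# Step 2: delete the alpha InferencePool, its InferenceModel resources, and the EPP workload
kubectl delete inferencemodels.inference.networking.x-k8s.io <your_inference_model>
kubectl delete inferencepools.inference.networking.x-k8s.io <your_alpha_inferencepool>
kubectl delete deployment <your_epp_deployment>
kubectl delete service <your_epp_service>
```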
### 2. Install v1 Resources

After cleaning up the old resources, you can proceed with a fresh installation of the `v1` Inference Gateway. This involves installing the new `v1` CRDs, creating a new `v1` `InferencePool` and corresponding `InferenceObjective` resources, and creating a new `HTTPRoute` that directs traffic to your new `v1` `InferencePool`.
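For reference, a minimal `HTTPRoute` for the new `v1` pool might look like the following sketch. The route, gateway, and pool names (`llm-route`, `inference-gateway`, `vllm-llama3-8b-instruct-ga`) match the examples used later in this guide; substitute your own:

```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    # The v1 API group is inference.networking.k8s.io (no x- prefix)
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-ga
      weight: 100
```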
### 3. Verify the Deployment

After a few minutes, verify that your new `v1` stack is correctly serving traffic. You should have a **`PROGRAMMED`** gateway.
```bash
❯ kubectl get gateway -o wide
NAME                CLASS               ADDRESS        PROGRAMMED   AGE
inference-gateway   inference-gateway   <IP_ADDRESS>   True         10m
```

Curl the endpoint to make sure you are getting a successful response with a **200** response code.
```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
  "model": "<your_model>",
  "prompt": "<your_prompt>",
  "max_tokens": 100,
  "temperature": 0
}'
```
***

## Zero-Downtime Migration

This migration path is designed for users who cannot afford any service interruption. It assumes you already have the stack shown in the following diagram:

<img src="/images/alpha-stage.png" alt="Inference Gateway Alpha Stage" class="center" />
### A Note on Interacting with Multiple API Versions

During the zero-downtime migration, both `v1alpha2` and `v1` CRDs will be installed on your cluster. This can create ambiguity when using `kubectl` to query for `InferencePool` resources. To ensure you are interacting with the correct version, you **must** use the full resource name:

* **For v1alpha2**: `kubectl get inferencepools.inference.networking.x-k8s.io`
* **For v1**: `kubectl get inferencepools.inference.networking.k8s.io`

The `v1` API also provides a convenient short name, `infpool`, which can be used to query `v1` resources specifically:

```bash
kubectl get infpool
```

This guide uses these full names, or the short name for `v1`, to avoid ambiguity.

***
### Stage 1: Side-by-side v1 Deployment

In this stage, you will deploy the new `v1` `InferencePool` stack alongside the existing `v1alpha2` stack. This allows for a safe, gradual migration.

After finishing all the steps in this stage, you will have the infrastructure shown in the following diagram:

<img src="/images/migration-stage.png" alt="Inference Gateway Migration Stage" class="center" />
**1. Install v1 CRDs**

```bash
RELEASE=v1.0.0
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/$RELEASE/config/crd/bases/inference.networking.x-k8s.io_inferenceobjectives.yaml
```
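Before proceeding, you can optionally confirm which inference CRDs are installed:

```bash
# Both the alpha (inference.networking.x-k8s.io) and GA (inference.networking.k8s.io)
# API groups should be present during the migration
kubectl get crds | grep inference.networking
```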
**2. Install the v1 `InferencePool`**

Use Helm to install a new `v1` `InferencePool` with a distinct release name (e.g., `vllm-llama3-8b-instruct-ga`).

```bash
helm install vllm-llama3-8b-instruct-ga \
  --set inferencePool.modelServers.matchLabels.app=<the_label_you_used_for_the_model_server_deployment> \
  --set provider.name=<YOUR_PROVIDER> \
  --version $RELEASE \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```
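Once the chart is installed, confirm that the new `v1` pool exists using the `infpool` short name described earlier:

```bash
# Lists only v1 InferencePool resources; the alpha pool will not appear here
kubectl get infpool
```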
**3. Create the v1 `InferenceObjective`**

The `v1` API replaces `InferenceModel` with `InferenceObjective`. Create the new resources, referencing the new `v1` `InferencePool`.

```bash
kubectl apply -f - <<EOF
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: food-review
spec:
  priority: 1
  poolRef:
    group: inference.networking.k8s.io
    name: vllm-llama3-8b-instruct-ga
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: base-model
spec:
  priority: 2
  poolRef:
    group: inference.networking.k8s.io
    name: vllm-llama3-8b-instruct-ga
EOF
```
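You can then verify that both objectives were created. The full resource name below assumes `inferenceobjectives` follows the same naming pattern as the other resources in the `inference.networking.x-k8s.io` group:

```bash
kubectl get inferenceobjectives.inference.networking.x-k8s.io
```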
***

### Stage 2: Traffic Shifting

With both stacks running, you can start shifting traffic from `v1alpha2` to `v1` by updating the `HTTPRoute` to split traffic. This example shows a 50/50 split.

**1. Update `HTTPRoute` for Traffic Splitting**

```bash
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-alpha
      weight: 50
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-ga
      weight: 50
EOF
```
**2. Verify and Monitor**

After applying the changes, monitor the performance and stability of the new `v1` stack. Make sure the `inference-gateway` status `PROGRAMMED` is `True`.
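To exercise both backends during the split, you can send a batch of test requests and check for consistent **200** responses. This loop is a simple illustration that reuses the gateway address lookup from earlier:

```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

# Send 20 requests; with a 50/50 split, roughly half should be served by each stack
for i in $(seq 1 20); do
  curl -s -o /dev/null -w "%{http_code}\n" ${IP}:${PORT}/v1/completions \
    -H 'Content-Type: application/json' \
    -d '{"model": "<your_model>", "prompt": "<your_prompt>", "max_tokens": 10, "temperature": 0}'
done
```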
***

### Stage 3: Finalization and Cleanup

Once you have verified that the `v1` `InferencePool` is stable, you can direct all traffic to it and decommission the old `v1alpha2` resources.

**1. Shift 100% of Traffic to the v1 `InferencePool`**

Update the `HTTPRoute` to send all traffic to the `v1` pool.

```bash
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-ga
      weight: 100
EOF
```
**2. Final Verification**

Send test requests to ensure your `v1` stack is handling all traffic as expected.

<img src="/images/ga-stage.png" alt="Inference Gateway GA Stage" class="center" />

You should have a **`PROGRAMMED`** gateway:

```bash
❯ kubectl get gateway -o wide
NAME                CLASS               ADDRESS        PROGRAMMED   AGE
inference-gateway   inference-gateway   <IP_ADDRESS>   True         10m
```

Curl the endpoint and verify a **200** response code:

```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
  "model": "<your_model>",
  "prompt": "<your_prompt>",
  "max_tokens": 100,
  "temperature": 0
}'
```
**3. Clean Up v1alpha2 Resources**

After confirming the `v1` stack is fully operational, safely remove the old `v1alpha2` resources.
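A minimal sketch of the cleanup, assuming the alpha stack was installed with Helm under the release name `vllm-llama3-8b-instruct-alpha` used earlier, and that no other workloads still depend on the `v1alpha2` CRDs:

```bash
# Remove the alpha InferencePool release (including its EPP Deployment and Service)
helm uninstall vllm-llama3-8b-instruct-alpha

# Delete any remaining v1alpha2 InferenceModel resources
kubectl delete inferencemodels.inference.networking.x-k8s.io --all

# Remove the alpha CRDs once no custom resources remain.
# Do NOT delete the inferenceobjectives CRD; the v1 stack still uses it.
kubectl delete crd inferencepools.inference.networking.x-k8s.io
kubectl delete crd inferencemodels.inference.networking.x-k8s.io
```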