1 change: 1 addition & 0 deletions mkdocs.yml
@@ -73,6 +73,7 @@ nav:
- Configuration Guide:
- Configuring the plugins via configuration files or text: guides/epp-configuration/config-text.md
- Prefix Cache Aware Plugin: guides/epp-configuration/prefix-aware.md
- Migration Guide: guides/ga-migration.md
- Troubleshooting Guide: guides/troubleshooting.md
- Implementer Guides:
- Getting started: guides/implementers.md
270 changes: 270 additions & 0 deletions site-src/guides/ga-migration.md
@@ -0,0 +1,270 @@
# Inference Gateway: Migrating from v1alpha2 to v1 API

## Introduction

This guide provides a comprehensive walkthrough for migrating your Inference Gateway setup from the `v1alpha2` API to the generally available `v1` API.
It is intended for platform administrators and networking specialists
who are currently running the `v1alpha2` version of the Inference Gateway and
want to upgrade to `v1` to take advantage of the latest features and improvements.

Before you start the migration, ensure you are familiar with the concepts and deployment of the Inference Gateway.

***

## Before you begin

Before starting the migration, it's important to determine if this guide is necessary for your setup.

### Checking for Existing v1alpha2 APIs

To check if you are actively using the `v1alpha2` Inference Gateway APIs, run the following command:

```bash
kubectl get inferencepools.inference.networking.x-k8s.io --all-namespaces
```

* If this command returns one or more `InferencePool` resources, you are using the `v1alpha2` API and should proceed with this migration guide.
* If the command returns `No resources found`, you are not using the `v1alpha2` `InferencePool` and do not need to follow this migration guide. You can proceed with a fresh installation of the `v1` Inference Gateway.
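
A hypothetical example of the first case (the namespace, name, and age are purely illustrative, and the exact columns depend on the CRD's printer columns):

```bash
NAMESPACE   NAME                            AGE
default     vllm-llama3-8b-instruct-alpha   5d
```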

***

## Migration Paths

There are two paths for migrating from `v1alpha2` to `v1`:

1. **Simple Migration (with downtime):** This path is for users who can afford a short period of downtime. It involves deleting the old `v1alpha2` resources and CRDs before installing the new `v1` versions.
2. **Zero-Downtime Migration:** This path is for users who need to migrate without any service interruption. It involves running both `v1alpha2` and `v1` stacks side-by-side and gradually shifting traffic.

***

## Simple Migration (with downtime)

This approach is faster and simpler but will result in a brief period of downtime while the resources are being updated. It is the recommended path if you do not require a zero-downtime migration.

### 1. Delete Existing v1alpha2 Resources

**Option a: Uninstall using Helm.**

```bash
helm uninstall <helm_alpha_inferencepool_name>
```

**Option b: Manually delete alpha `InferencePool` resources.**

If you are not using Helm, you will need to manually delete all resources associated with your `v1alpha2` deployment. The key is to remove the `HTTPRoute`'s reference to the old `InferencePool` and then delete the `v1alpha2` resources themselves.

1. **Update or Delete the `HTTPRoute`**: Modify the `HTTPRoute` to remove the `backendRef` that points to the `v1alpha2` `InferencePool`.
2. **Delete the `InferencePool` and associated resources**: You must delete the `v1alpha2` `InferencePool`, any `InferenceModel` (or `InferenceObjective`) resources that point to it, and the corresponding Endpoint Picker (EPP) Deployment and Service. A sketch of steps 1 and 2 is shown after this list.

3. **Delete the `v1alpha2` CRDs**: Once all `v1alpha2` custom resources are deleted, you can remove the CRD definitions from your cluster.
    ```bash
    VERSION=v0.3.0 # set this to the v1alpha2 release you have installed
    kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/$VERSION/manifests.yaml
    ```
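
A minimal sketch of steps 1 and 2 above, assuming hypothetical resource names (`llm-route`, `vllm-llama3-8b-instruct-alpha`, `food-review`, and an EPP Deployment/Service named `vllm-llama3-8b-instruct-alpha-epp`); substitute the names used in your cluster:

```bash
# Step 1: edit the HTTPRoute and remove the backendRef pointing at the alpha pool
kubectl edit httproute llm-route

# Step 2: delete the alpha InferencePool, the InferenceModel resources that point to it, and the EPP
kubectl delete inferencepools.inference.networking.x-k8s.io vllm-llama3-8b-instruct-alpha
kubectl delete inferencemodels.inference.networking.x-k8s.io food-review
kubectl delete deployment vllm-llama3-8b-instruct-alpha-epp
kubectl delete service vllm-llama3-8b-instruct-alpha-epp
```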

### 2. Install v1 Resources

After cleaning up the old resources, you can proceed with a fresh installation of the `v1` Inference Gateway. This involves installing the new `v1` CRDs, deploying an Endpoint Picker (EPP) image that is compatible with the `v1` API, creating a new `v1` `InferencePool` and corresponding `InferenceObjective` resources, and creating a new `HTTPRoute` that directs traffic to your new `v1` `InferencePool`.
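
A minimal sketch of the installation, reusing the Helm chart shown in the zero-downtime path below (the release name and the model-server label are placeholders):

```bash
RELEASE=v1.0.0
helm install vllm-llama3-8b-instruct \
  --set inferencePool.modelServers.matchLabels.app=<the_label_you_used_for_the_model_server_deployment> \
  --set provider.name=<YOUR_PROVIDER> \
  --version $RELEASE \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```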


### 3. Verify the Deployment

After a few minutes, verify that your new `v1` stack is correctly serving traffic. You should have a **`PROGRAMMED`** gateway.

```bash
❯ kubectl get gateway -o wide
NAME                CLASS               ADDRESS        PROGRAMMED   AGE
inference-gateway   inference-gateway   <IP_ADDRESS>   True         10m
```

Curl the endpoint and confirm a successful response with a **200** status code. If your Gateway has a different name, substitute it for `inference-gateway` in the command below.

```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
  "model": "<your_model>",
  "prompt": "<your_prompt>",
  "max_tokens": 100,
  "temperature": 0
}'
```

***

## Zero-Downtime Migration

This migration path is designed for users who cannot afford any service interruption. It assumes you already have the stack shown in the following diagram:

<img src="/images/alpha-stage.png" alt="Inference Gateway Alpha Stage" class="center" />

### A Note on Interacting with Multiple API Versions

During the zero-downtime migration, both `v1alpha2` and `v1` CRDs will be installed on your cluster. This can create ambiguity when using `kubectl` to query for `InferencePool` resources. To ensure you are interacting with the correct version, you **must** use the full resource name:

* **For v1alpha2**: `kubectl get inferencepools.inference.networking.x-k8s.io`
* **For v1**: `kubectl get inferencepools.inference.networking.k8s.io`

The `v1` API also provides a convenient short name, `infpool`, which can be used to query `v1` resources specifically:

```bash
kubectl get infpool
```

This guide will use these full names or the short name for `v1` to avoid ambiguity.

***

### Stage 1: Side-by-side v1 Deployment

In this stage, you will deploy the new `v1` `InferencePool` stack alongside the existing `v1alpha2` stack. This allows for a safe, gradual migration.

After finishing all the steps in this stage, you will have the infrastructure shown in the following diagram:

<img src="/images/migration-stage.png" alt="Inference Gateway Migration Stage" class="center" />

**1. Install v1 CRDs**

```bash
RELEASE=v1.0.0
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/$RELEASE/config/crd/bases/inference.networking.x-k8s.io_inferenceobjectives.yaml
```
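
To confirm the CRD is registered before proceeding, a quick check (not part of the original guide) is:

```bash
kubectl get crd inferenceobjectives.inference.networking.x-k8s.io
```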

**2. Install the v1 `InferencePool`**

Use Helm to install a new `v1` `InferencePool` with a distinct release name (e.g., `vllm-llama3-8b-instruct-ga`).

```bash
helm install vllm-llama3-8b-instruct-ga \
  --set inferencePool.modelServers.matchLabels.app=<the_label_you_used_for_the_model_server_deployment> \
  --set provider.name=<YOUR_PROVIDER> \
  --version $RELEASE \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```
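
Once the chart is installed, you can confirm the new pool exists using the `v1` short name introduced earlier:

```bash
kubectl get infpool vllm-llama3-8b-instruct-ga
```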

**3. Create the v1 `InferenceObjective`**

The `v1` API replaces `InferenceModel` with `InferenceObjective`. Create the new resources, referencing the new `v1` `InferencePool`.

```yaml
kubectl apply -f - <<EOF
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: food-review
spec:
  priority: 1
  poolRef:
    group: inference.networking.k8s.io
    name: vllm-llama3-8b-instruct-ga
---
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceObjective
metadata:
  name: base-model
spec:
  priority: 2
  poolRef:
    group: inference.networking.k8s.io
    name: vllm-llama3-8b-instruct-ga
---
EOF
```
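
To verify the objectives were created (using the full resource name, since these live in the `x-k8s.io` group):

```bash
kubectl get inferenceobjectives.inference.networking.x-k8s.io
```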

***

### Stage 2: Traffic Shifting

With both stacks running, you can start shifting traffic from `v1alpha2` to `v1` by updating the `HTTPRoute` to split traffic. This example shows a 50/50 split.

**1. Update `HTTPRoute` for Traffic Splitting**

```yaml
kubectl apply -f - <<EOF
---
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-alpha
      weight: 50
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-ga
      weight: 50
---
EOF
```

**2. Verify and Monitor**

After applying the changes, monitor the performance and stability of the new `v1` stack, and make sure the `PROGRAMMED` status of the `inference-gateway` Gateway remains `True`.
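
One way to check the condition directly, assuming your Gateway is named `inference-gateway`:

```bash
kubectl get gateway inference-gateway \
  -o jsonpath='{.status.conditions[?(@.type=="Programmed")].status}'
```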

***

### Stage 3: Finalization and Cleanup

Once you have verified that the `v1` `InferencePool` is stable, you can direct all traffic to it and decommission the old `v1alpha2` resources.

**1. Shift 100% of Traffic to the v1 `InferencePool`**

Update the `HTTPRoute` to send all traffic to the `v1` pool.

```yaml
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - group: gateway.networking.k8s.io
    kind: Gateway
    name: inference-gateway
  rules:
  - backendRefs:
    - group: inference.networking.k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct-ga
      weight: 100
EOF
```

**2. Final Verification**

Send test requests to ensure your `v1` stack is handling all traffic as expected.

<img src="/images/ga-stage.png" alt="Inference Gateway GA Stage" class="center" />

You should have a **`PROGRAMMED`** gateway:
```bash
❯ kubectl get gateway -o wide
NAME                CLASS               ADDRESS        PROGRAMMED   AGE
inference-gateway   inference-gateway   <IP_ADDRESS>   True         10m
```

Curl the endpoint and verify a **200** response code:
```bash
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
  "model": "<your_model>",
  "prompt": "<your_prompt>",
  "max_tokens": 100,
  "temperature": 0
}'
```

**3. Clean Up v1alpha2 Resources**

After confirming the `v1` stack is fully operational, safely remove the old `v1alpha2` resources.
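
A minimal sketch of the cleanup, assuming the alpha stack was installed via Helm under a hypothetical release name, and that `v0.3.0` is the `v1alpha2` release you originally installed:

```bash
# Remove the alpha InferencePool stack (Helm-managed)
helm uninstall vllm-llama3-8b-instruct-alpha

# Delete any remaining v1alpha2 custom resources
kubectl delete inferencemodels.inference.networking.x-k8s.io --all
kubectl delete inferencepools.inference.networking.x-k8s.io --all

# Finally, remove the v1alpha2 CRDs
kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.3.0/manifests.yaml
```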
Binary file added site-src/images/alpha-stage.png
Binary file added site-src/images/ga-stage.png
Binary file added site-src/images/migration-stage.png