
Commit d5ef4cc

Update inference gateway public docs to use helm charts (#1370)

* Update public docs to use helm charts
* Address review comments and restructure docs
* Update site-src/guides/index.md
* Update site-src/guides/index.md
* Update provider name to be a variable
* Remove gcp backend policy from docs
* Update PROVIDER_NAME to GATEWAY_PROVIDER
* Update InferenceObjective in docs

Co-authored-by: Cong Liu <[email protected]>
1 parent 73097c1 · commit d5ef4cc


site-src/guides/index.md

Lines changed: 29 additions & 25 deletions
````diff
@@ -8,7 +8,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 
 ## **Prerequisites**
 
-- A cluster with:
+A cluster with:
 - Support for services of type `LoadBalancer`. For kind clusters, follow [this guide](https://kind.sigs.k8s.io/docs/user/loadbalancer)
   to get services of type LoadBalancer working.
 - Support for [sidecar containers](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) (enabled by default since Kubernetes v1.29)
````
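
Not part of the commit: a quick way to confirm the sidecar-container prerequisite (Kubernetes v1.29+) before following the guide — a minimal sketch:

```bash
# Print the control-plane version; it should report v1.29 or newer,
# since sidecar containers are enabled by default from v1.29.
kubectl version | grep -i 'server version'
```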
````diff
@@ -75,20 +75,6 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
    ```
 
-### Deploy InferenceModel
-
-Deploy the sample InferenceModel which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
-
-```bash
-kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.5.1/config/manifests/inferencemodel.yaml
-```
-
-### Deploy the InferencePool and Endpoint Picker Extension
-
-```bash
-kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.5.1/config/manifests/inferencepool-resources.yaml
-```
-
 ### Deploy an Inference Gateway
 
 Choose one of the following options to deploy an Inference Gateway.
````
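
A side note for readers migrating from the old flow (not part of the commit): the manifests that the removed steps applied can be cleaned up with `kubectl delete` against the same URLs. A sketch, assuming they were applied previously:

```bash
# Remove the resources applied by the now-deleted quickstart steps, if present.
kubectl delete --ignore-not-found -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.5.1/config/manifests/inferencemodel.yaml
kubectl delete --ignore-not-found -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.5.1/config/manifests/inferencepool-resources.yaml
```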
````diff
@@ -98,20 +84,19 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 1. Enable the Gateway API and configure proxy-only subnets when necessary. See [Deploy Gateways](https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-gateways)
    for detailed instructions.
 
-1. Deploy Gateway and HealthCheckPolicy resources
+2. Deploy Inference Gateway:
 
    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gateway.yaml
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/healthcheck.yaml
    ```
 
    Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
+
    ```bash
    $ kubectl get gateway inference-gateway
    NAME                CLASS               ADDRESS        PROGRAMMED   AGE
    inference-gateway   inference-gateway   <MY_ADDRESS>   True         22s
    ```
-
 3. Deploy the HTTPRoute
 
    ```bash
````
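
A supplement to the `kubectl get gateway` check in the hunk above (not part of the commit): because `Programmed` is a standard Gateway condition, provisioning can also be awaited non-interactively. A minimal sketch:

```bash
# Block until the Gateway reports Programmed=True, or fail after 5 minutes.
kubectl wait gateway/inference-gateway --for=condition=Programmed --timeout=5m
```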
````diff
@@ -123,13 +108,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    ```bash
    kubectl get httproute llm-route -o yaml
    ```
-
-5. Given that the default connection timeout may be insufficient for most inference workloads, it is recommended to configure a timeout appropriate for your intended use case.
-
-   ```bash
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gcp-backend-policy.yaml
-   ```
-
+
 === "Istio"
 
    Please note that this feature is currently in an experimental phase and is not intended for production use.
````
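
Also not part of the commit: instead of scanning the full YAML from `kubectl get httproute llm-route -o yaml`, the route's `Accepted` condition can be extracted directly with kubectl's JSONPath support. A sketch:

```bash
# Print the status of the Accepted condition reported by each parent Gateway;
# expect "True" once the route is bound.
kubectl get httproute llm-route \
  -o jsonpath='{.status.parents[*].conditions[?(@.type=="Accepted")].status}'
```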
````diff
@@ -283,6 +262,31 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    kubectl get httproute llm-route -o yaml
    ```
 
+### Deploy the InferencePool and Endpoint Picker Extension
+
+To install an InferencePool named `vllm-llama3-8b-instruct` that selects endpoints with the label `app: vllm-llama3-8b-instruct` and listens on port 8000, run the following command:
+
+```bash
+export GATEWAY_PROVIDER=none # See [README](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/charts/inferencepool/README.md#configuration) for valid configurations
+helm install vllm-llama3-8b-instruct \
+  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+  --set provider.name=$GATEWAY_PROVIDER \
+  --version v0.3.0 \
+  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
+```
+
+The Helm install automatically deploys the endpoint picker and the InferencePool, along with provider-specific resources.
+
+### Deploy InferenceObjective (Optional)
+
+Deploy the sample InferenceObjective, which allows you to specify the priority of requests.
+
+```bash
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml
+```
+
+
 ### Try it out
 
 Wait until the gateway is ready.
````
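
Finally, a hedged verification sketch (not part of the commit) for the Helm-based steps added above, assuming the release name, InferencePool name, and optional InferenceObjective from the diff:

```bash
# Confirm the Helm release deployed cleanly.
helm status vllm-llama3-8b-instruct

# Confirm the chart's InferencePool and the endpoint picker it installs exist.
kubectl get inferencepool vllm-llama3-8b-instruct
kubectl get deployments

# If the optional InferenceObjective was applied, it should be listed too.
kubectl get inferenceobjective
```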
