
Commit d5ef4cc

Update inference gateway public docs to use helm charts (#1370)

* Update public docs to use helm charts
* Address review comments and restructure docs
* Update site-src/guides/index.md
* Update site-src/guides/index.md
* Update provider name to be a variable
* Remove gcp backend policy from docs
* Update PROVIDER_NAME to GATEWAY_PROVIDER
* Update InferenceObjective in docs

Co-authored-by: Cong Liu <[email protected]>
1 parent 73097c1 · commit d5ef4cc


site-src/guides/index.md

Lines changed: 29 additions & 25 deletions
````diff
@@ -8,7 +8,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 
 ## **Prerequisites**
 
-- A cluster with:
+A cluster with:
 - Support for services of type `LoadBalancer`. For kind clusters, follow [this guide](https://kind.sigs.k8s.io/docs/user/loadbalancer)
   to get services of type LoadBalancer working.
 - Support for [sidecar containers](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) (enabled by default since Kubernetes v1.29)
````
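
Not part of the commit: a quick way to confirm the sidecar-container prerequisite (Kubernetes v1.29+) before following the guide — a minimal sketch:

```bash
# Print the control-plane version; it should report v1.29 or newer,
# since sidecar containers are enabled by default from v1.29.
kubectl version | grep -i 'server version'
```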
````diff
@@ -75,20 +75,6 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/latest/download/manifests.yaml
    ```
 
-### Deploy InferenceModel
-
-Deploy the sample InferenceModel which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
-
-```bash
-kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.5.1/config/manifests/inferencemodel.yaml
-```
-
-### Deploy the InferencePool and Endpoint Picker Extension
-
-```bash
-kubectl apply -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.5.1/config/manifests/inferencepool-resources.yaml
-```
-
 ### Deploy an Inference Gateway
 
 Choose one of the following options to deploy an Inference Gateway.
````
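
A side note for readers migrating from the old flow (not part of the commit): the manifests that the removed steps applied can be cleaned up with `kubectl delete` against the same URLs. A sketch, assuming they were applied previously:

```bash
# Remove the resources applied by the now-deleted quickstart steps, if present.
kubectl delete --ignore-not-found -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.5.1/config/manifests/inferencemodel.yaml
kubectl delete --ignore-not-found -f https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/refs/tags/v0.5.1/config/manifests/inferencepool-resources.yaml
```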
````diff
@@ -98,20 +84,19 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 1. Enable the Gateway API and configure proxy-only subnets when necessary. See [Deploy Gateways](https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-gateways)
    for detailed instructions.
 
-1. Deploy Gateway and HealthCheckPolicy resources
+2. Deploy Inference Gateway:
 
    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gateway.yaml
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/healthcheck.yaml
    ```
 
    Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
+
    ```bash
    $ kubectl get gateway inference-gateway
    NAME                CLASS               ADDRESS        PROGRAMMED   AGE
    inference-gateway   inference-gateway   <MY_ADDRESS>   True         22s
    ```
-
 3. Deploy the HTTPRoute
 
    ```bash
````
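
A supplement to the `kubectl get gateway` check in the hunk above (not part of the commit): because `Programmed` is a standard Gateway condition, provisioning can also be awaited non-interactively. A minimal sketch:

```bash
# Block until the Gateway reports Programmed=True, or fail after 5 minutes.
kubectl wait gateway/inference-gateway --for=condition=Programmed --timeout=5m
```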
````diff
@@ -123,13 +108,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    ```bash
    kubectl get httproute llm-route -o yaml
    ```
-
-5. Given that the default connection timeout may be insufficient for most inference workloads, it is recommended to configure a timeout appropriate for your intended use case.
-
-   ```bash
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gcp-backend-policy.yaml
-   ```
-
+
 === "Istio"
 
    Please note that this feature is currently in an experimental phase and is not intended for production use.
````
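
Also not part of the commit: instead of scanning the full YAML from `kubectl get httproute llm-route -o yaml`, the route's `Accepted` condition can be extracted directly with kubectl's JSONPath support. A sketch:

```bash
# Print the status of the Accepted condition reported by each parent Gateway;
# expect "True" once the route is bound.
kubectl get httproute llm-route \
  -o jsonpath='{.status.parents[*].conditions[?(@.type=="Accepted")].status}'
```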
````diff
@@ -283,6 +262,31 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    kubectl get httproute llm-route -o yaml
    ```
 
+### Deploy the InferencePool and Endpoint Picker Extension
+
+To install an InferencePool named `vllm-llama3-8b-instruct` that selects endpoints with the label `app: vllm-llama3-8b-instruct` and listens on port 8000, run the following command:
+
+```bash
+export GATEWAY_PROVIDER=none # See [README](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/charts/inferencepool/README.md#configuration) for valid configurations
+helm install vllm-llama3-8b-instruct \
+  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
+  --set provider.name=$GATEWAY_PROVIDER \
+  --version v0.3.0 \
+  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
+```
+
+The Helm install automatically deploys the endpoint picker and the InferencePool, along with provider-specific resources.
+
+### Deploy InferenceObjective (Optional)
+
+Deploy the sample InferenceObjective, which allows you to specify the priority of requests.
+
+```bash
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferenceobjective.yaml
+```
+
+
 ### Try it out
 
 Wait until the gateway is ready.
````
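
Finally, a hedged verification sketch (not part of the commit) for the Helm-based steps added above, assuming the release name, InferencePool name, and optional InferenceObjective from the diff:

```bash
# Confirm the Helm release deployed cleanly.
helm status vllm-llama3-8b-instruct

# Confirm the chart's InferencePool and the endpoint picker it installs exist.
kubectl get inferencepool vllm-llama3-8b-instruct
kubectl get deployments

# If the optional InferenceObjective was applied, it should be listed too.
kubectl get inferenceobjective
```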
