Update inference gateway public docs to use helm charts (#1370)
* Update public docs to use helm charts
* Address review comments and restructure docs
* Update site-src/guides/index.md
Co-authored-by: Cong Liu <[email protected]>
* Update site-src/guides/index.md
Co-authored-by: Cong Liu <[email protected]>
* Update provider name to be a variable
Co-authored-by: Cong Liu <[email protected]>
* Remove gcp backend policy from docs
* Update PROVIDER_NAME to GATEWAY_PROVIDER
* Update InferenceObjective in docs
---------
Co-authored-by: Cong Liu <[email protected]>
Deploy the sample InferenceModel, which is configured to forward traffic to the `food-review-1` [LoRA adapter](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
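For reference, a minimal sketch of what such an InferenceModel manifest might look like, assuming the `inference.networking.x-k8s.io/v1alpha2` API and the `vllm-llama3-8b-instruct` pool used later in this guide (the field values are illustrative, not the published sample):

```bash
kubectl apply -f - <<EOF
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: food-review
spec:
  modelName: food-review          # model name clients send in the request body
  criticality: Standard
  poolRef:
    name: vllm-llama3-8b-instruct # InferencePool that serves this model
  targetModels:
  - name: food-review-1           # LoRA adapter registered on the model server
    weight: 100
EOF
```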
Choose one of the following options to deploy an Inference Gateway.
@@ -98,20 +84,19 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
1. Enable the Gateway API and configure proxy-only subnets when necessary; see [Deploy Gateways](https://cloud.google.com/kubernetes-engine/docs/how-to/deploying-gateways). A sketch of creating such a subnet follows below.
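On GKE, a proxy-only subnet can be created with `gcloud`; this is a sketch, with a placeholder subnet name, region, and CIDR range that you should adjust to your VPC:

```bash
# Proxy-only subnet for Envoy-based regional gateways (the name, region,
# and range below are placeholders -- pick values that fit your VPC).
gcloud compute networks subnets create proxy-only-subnet \
  --purpose=REGIONAL_MANAGED_PROXY \
  --role=ACTIVE \
  --region=us-central1 \
  --network=default \
  --range=10.129.0.0/23
```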
@@ -123,13 +108,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
```bash
kubectl get httproute llm-route -o yaml
```
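As an optional sanity check (not part of the original steps), you can pull just the route's status conditions; with a conformant Gateway API implementation, the `Accepted` condition should report `True`:

```bash
kubectl get httproute llm-route -o jsonpath='{.status.parents[0].conditions}'
```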
5. Because the default connection timeout may be insufficient for most inference workloads, configure a timeout appropriate for your intended use case, as sketched below.
Note that this feature is currently experimental and is not intended for production use.
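A sketch of such a timeout, assuming the standard Gateway API `timeouts` field on the HTTPRoute rule; the Gateway name is assumed and `300s` is an arbitrary example value:

```bash
kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: llm-route
spec:
  parentRefs:
  - name: inference-gateway       # assumed Gateway name; use yours
  rules:
  - backendRefs:
    - group: inference.networking.x-k8s.io
      kind: InferencePool
      name: vllm-llama3-8b-instruct
    timeouts:
      request: 300s               # example value; size for your workload
EOF
```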
@@ -283,6 +262,31 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
kubectl get httproute llm-route -o yaml
```
### Deploy the InferencePool and Endpoint Picker Extension
To install an InferencePool named `vllm-llama3-8b-instruct` that selects from endpoints with the label `app: vllm-llama3-8b-instruct` and listens on port 8000, run the following command:
```bash
export GATEWAY_PROVIDER=none # See [README](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/charts/inferencepool/README.md#configuration) for valid configurations
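# The helm install below is a sketch assumed from the chart's README --
# the chart location and value names may differ between releases.
helm install vllm-llama3-8b-instruct \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=$GATEWAY_PROVIDER \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```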