| `inferencePool.apiVersion` | The API version of the InferencePool resource. Defaults to `inference.networking.k8s.io/v1`. This can be changed to `inference.networking.x-k8s.io/v1alpha2` to support older API versions. |
172
-
| `inferencePool.targetPortNumber` | Target port number for the vllm backends, will be used to scrape metrics by the inference extension. Defaults to 8000. |
173
-
| `inferencePool.modelServerType` | Type of the model servers in the pool, valid options are [vllm, triton-tensorrt-llm], default is vllm. |
174
-
| `inferencePool.modelServers.matchLabels` | Label selector to match vllm backends managed by the inference pool. |
175
-
| `inferenceExtension.replicas` | Number of replicas for the endpoint picker extension service. If More than one replica is used, EPP will run in HA active-passive mode. Defaults to `1`. |
176
-
| `inferenceExtension.image.name` | Name of the container image used for the endpoint picker. |
177
-
| `inferenceExtension.image.hub` | Registry URL where the endpoint picker image is hosted. |
178
-
| `inferenceExtension.image.tag` | Image tag of the endpoint picker. |
179
-
| `inferenceExtension.image.pullPolicy` | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`. |
180
-
| `inferenceExtension.env` | List of environment variables to set in the endpoint picker container as free-form YAML. Defaults to `[]`. |
181
-
| `inferenceExtension.extraContainerPorts` | List of additional container ports to expose. Defaults to `[]`. |
182
-
| `inferenceExtension.extraServicePorts` | List of additional service ports to expose. Defaults to `[]`. |
183
-
| `inferenceExtension.flags` | List of flags which are passed through to endpoint picker. Example flags, enable-pprof, grpc-port etc. Refer [runner.go](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/runner.go) for complete list. |
184
-
| `inferenceExtension.affinity` | Affinity for the endpoint picker. Defaults to `{}`. |
185
-
| `inferenceExtension.tolerations` | Tolerations for the endpoint picker. Defaults to `[]`. | |
186
-
| `inferenceExtension.monitoring.interval` | Metrics scraping interval for monitoring. Defaults to `10s`. |
187
-
| `inferenceExtension.monitoring.secret.name` | Name of the service account token secret for metrics authentication. Defaults to `inference-gateway-sa-metrics-reader-secret`. |
188
-
| `inferenceExtension.monitoring.prometheus.enabled` | Enable Prometheus ServiceMonitor creation for EPP metrics collection. Defaults to `false`. |
189
-
| `inferenceExtension.monitoring.gke.enabled` | Enable GKE monitoring resources (`PodMonitoring` and RBAC). Defaults to `false`. |
190
-
| `inferenceExtension.pluginsCustomConfig` | Custom config that is passed to EPP as inline yaml. |
191
-
| `inferenceExtension.trace.enabled` | Enables or disables OpenTelemetry tracing globally for the EndpointPicker. |
192
-
| `provider.name` | Name of the Inference Gateway implementation being used. Possible values: [`none`, `gke`, or `istio`]. Defaults to `none`. |
193
-
| `provider.gke.autopilot` | Set to `true` if the cluster is a GKE Autopilot cluster. This is only used if `provider.name` is `gke`. Defaults to `false`. |
| `inferencePool.apiVersion` | The API version of the InferencePool resource. Defaults to `inference.networking.k8s.io/v1`. This can be changed to `inference.networking.x-k8s.io/v1alpha2` to support older API versions. |
172
+
| `inferencePool.targetPortNumber` | Target port number for the vllm backends, will be used to scrape metrics by the inference extension. Defaults to 8000. |
173
+
| `inferencePool.modelServerType` | Type of the model servers in the pool, valid options are [vllm, triton-tensorrt-llm], default is vllm. |
174
+
| `inferencePool.modelServers.matchLabels` | Label selector to match vllm backends managed by the inference pool. |
175
+
| `inferenceExtension.replicas` | Number of replicas for the endpoint picker extension service. If more than one replica is used, EPP will run in HA active-passive mode. Defaults to `1`. |
176
+
| `inferenceExtension.image.name` | Name of the container image used for the endpoint picker. |
177
+
| `inferenceExtension.image.hub` | Registry URL where the endpoint picker image is hosted. |
178
+
| `inferenceExtension.image.tag` | Image tag of the endpoint picker. |
179
+
| `inferenceExtension.image.pullPolicy` | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`. |
180
+
| `inferenceExtension.env` | List of environment variables to set in the endpoint picker container as free-form YAML. Defaults to `[]`. |
181
+
| `inferenceExtension.extraContainerPorts` | List of additional container ports to expose. Defaults to `[]`. |
182
+
| `inferenceExtension.extraServicePorts` | List of additional service ports to expose. Defaults to `[]`. |
183
+
| `inferenceExtension.flags` | List of flags which are passed through to endpoint picker. Example flags, enable-pprof, grpc-port etc. Refer [runner.go](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/runner.go) for complete list. |
184
+
| `inferenceExtension.affinity` | Affinity for the endpoint picker. Defaults to `{}`. |
185
+
| `inferenceExtension.tolerations` | Tolerations for the endpoint picker. Defaults to `[]`. |
186
+
| `inferenceExtension.monitoring.interval` | Metrics scraping interval for monitoring. Defaults to `10s`. |
187
+
| `inferenceExtension.monitoring.secret.name` | Name of the service account token secret for metrics authentication. Defaults to `inference-gateway-sa-metrics-reader-secret`. |
188
+
| `inferenceExtension.monitoring.prometheus.enabled` | Enable Prometheus ServiceMonitor creation for EPP metrics collection. Defaults to `false`. |
189
+
| `inferenceExtension.monitoring.gke.enabled` | Enable GKE monitoring resources (`PodMonitoring` and RBAC). Defaults to `false`. |
190
+
| `inferenceExtension.pluginsCustomConfig` | Custom config that is passed to EPP as inline yaml. |
191
+
| `inferenceExtension.tracing.enabled` | Enables or disables OpenTelemetry tracing globally for the EndpointPicker. |
| `provider.name` | Name of the Inference Gateway implementation being used. Possible values: [`none`, `gke`, or `istio`]. Defaults to `none`. |
194
+
| `provider.gke.autopilot` | Set to `true` if the cluster is a GKE Autopilot cluster. This is only used if `provider.name` is `gke`. Defaults to `false`. |
194
195
195
196
### Provider Specific Configuration
196
197
@@ -217,8 +218,8 @@ These are the options available to you with `provider.name` set to `istio`:
217
218
218
219
## OpenTelemetry
219
220
220
-
The EndpointPicker supports OpenTelemetry-based tracing. To enable it, use `--set inferenceExtension.trace.enabled=true`
221
-
and configure the correct OpenTelemetry collector endpoint via the environment variable `OTEL_EXPORTER_OTLP_ENDPOINT` in `inferenceExtension.env`.
221
+
The EndpointPicker supports OpenTelemetry-based tracing. To enable it, use `--set inferenceExtension.tracing.enabled=true`
222
+
and configure the correct OpenTelemetry collector endpoint via `--set inferenceExtension.tracing.otelExporterEndpoint`.
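For reference, the two `--set` flags in the updated text could also be written as a values file. This is a minimal sketch, not taken from the chart: the file name and the collector address are placeholders, and it assumes the chart reads `inferenceExtension.tracing.enabled` and `inferenceExtension.tracing.otelExporterEndpoint` exactly as named in the text.

```yaml
# values-tracing.yaml (hypothetical file name)
inferenceExtension:
  tracing:
    # Turns on OpenTelemetry tracing for the EndpointPicker.
    enabled: true
    # Placeholder collector address (assumption); replace with your own OTLP endpoint.
    otelExporterEndpoint: "http://otel-collector.example.svc:4317"
```

A file like this would be passed to Helm with `-f values-tracing.yaml` in place of the individual `--set` flags.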