Commit b3d90eb

update
1 parent a8b406a commit b3d90eb

3 files changed

+31
-30
lines changed

config/charts/inferencepool/README.md

Lines changed: 28 additions & 27 deletions
@@ -166,31 +166,32 @@ $ helm uninstall pool-1
166166

167167
The following table list the configurable parameters of the chart.
168168

169-
| **Parameter Name** | **Description** |
170-
|---------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
171-
| `inferencePool.apiVersion` | The API version of the InferencePool resource. Defaults to `inference.networking.k8s.io/v1`. This can be changed to `inference.networking.x-k8s.io/v1alpha2` to support older API versions. |
172-
| `inferencePool.targetPortNumber` | Target port number for the vllm backends, will be used to scrape metrics by the inference extension. Defaults to 8000. |
173-
| `inferencePool.modelServerType` | Type of the model servers in the pool, valid options are [vllm, triton-tensorrt-llm], default is vllm. |
174-
| `inferencePool.modelServers.matchLabels` | Label selector to match vllm backends managed by the inference pool. |
175-
| `inferenceExtension.replicas` | Number of replicas for the endpoint picker extension service. If More than one replica is used, EPP will run in HA active-passive mode. Defaults to `1`. |
176-
| `inferenceExtension.image.name` | Name of the container image used for the endpoint picker. |
177-
| `inferenceExtension.image.hub` | Registry URL where the endpoint picker image is hosted. |
178-
| `inferenceExtension.image.tag` | Image tag of the endpoint picker. |
179-
| `inferenceExtension.image.pullPolicy` | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`. |
180-
| `inferenceExtension.env` | List of environment variables to set in the endpoint picker container as free-form YAML. Defaults to `[]`. |
181-
| `inferenceExtension.extraContainerPorts` | List of additional container ports to expose. Defaults to `[]`. |
182-
| `inferenceExtension.extraServicePorts` | List of additional service ports to expose. Defaults to `[]`. |
183-
| `inferenceExtension.flags` | List of flags which are passed through to endpoint picker. Example flags, enable-pprof, grpc-port etc. Refer [runner.go](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/runner.go) for complete list. |
184-
| `inferenceExtension.affinity` | Affinity for the endpoint picker. Defaults to `{}`. |
185-
| `inferenceExtension.tolerations` | Tolerations for the endpoint picker. Defaults to `[]`. | |
186-
| `inferenceExtension.monitoring.interval` | Metrics scraping interval for monitoring. Defaults to `10s`. |
187-
| `inferenceExtension.monitoring.secret.name` | Name of the service account token secret for metrics authentication. Defaults to `inference-gateway-sa-metrics-reader-secret`. |
188-
| `inferenceExtension.monitoring.prometheus.enabled` | Enable Prometheus ServiceMonitor creation for EPP metrics collection. Defaults to `false`. |
189-
| `inferenceExtension.monitoring.gke.enabled` | Enable GKE monitoring resources (`PodMonitoring` and RBAC). Defaults to `false`. |
190-
| `inferenceExtension.pluginsCustomConfig` | Custom config that is passed to EPP as inline yaml. |
191-
| `inferenceExtension.trace.enabled` | Enables or disables OpenTelemetry tracing globally for the EndpointPicker. |
192-
| `provider.name` | Name of the Inference Gateway implementation being used. Possible values: [`none`, `gke`, or `istio`]. Defaults to `none`. |
193-
| `provider.gke.autopilot` | Set to `true` if the cluster is a GKE Autopilot cluster. This is only used if `provider.name` is `gke`. Defaults to `false`. |
169+
| **Parameter Name** | **Description** |
170+
|----------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
171+
| `inferencePool.apiVersion` | The API version of the InferencePool resource. Defaults to `inference.networking.k8s.io/v1`. This can be changed to `inference.networking.x-k8s.io/v1alpha2` to support older API versions. |
172+
| `inferencePool.targetPortNumber` | Target port number for the vllm backends, will be used to scrape metrics by the inference extension. Defaults to 8000. |
173+
| `inferencePool.modelServerType` | Type of the model servers in the pool, valid options are [vllm, triton-tensorrt-llm], default is vllm. |
174+
| `inferencePool.modelServers.matchLabels` | Label selector to match vllm backends managed by the inference pool. |
175+
| `inferenceExtension.replicas` | Number of replicas for the endpoint picker extension service. If More than one replica is used, EPP will run in HA active-passive mode. Defaults to `1`. |
176+
| `inferenceExtension.image.name` | Name of the container image used for the endpoint picker. |
177+
| `inferenceExtension.image.hub` | Registry URL where the endpoint picker image is hosted. |
178+
| `inferenceExtension.image.tag` | Image tag of the endpoint picker. |
179+
| `inferenceExtension.image.pullPolicy` | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`. |
180+
| `inferenceExtension.env` | List of environment variables to set in the endpoint picker container as free-form YAML. Defaults to `[]`. |
181+
| `inferenceExtension.extraContainerPorts` | List of additional container ports to expose. Defaults to `[]`. |
182+
| `inferenceExtension.extraServicePorts` | List of additional service ports to expose. Defaults to `[]`. |
183+
| `inferenceExtension.flags` | List of flags which are passed through to endpoint picker. Example flags, enable-pprof, grpc-port etc. Refer [runner.go](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/cmd/epp/runner/runner.go) for complete list. |
184+
| `inferenceExtension.affinity` | Affinity for the endpoint picker. Defaults to `{}`. |
185+
| `inferenceExtension.tolerations` | Tolerations for the endpoint picker. Defaults to `[]`. |
186+
| `inferenceExtension.monitoring.interval` | Metrics scraping interval for monitoring. Defaults to `10s`. |
187+
| `inferenceExtension.monitoring.secret.name` | Name of the service account token secret for metrics authentication. Defaults to `inference-gateway-sa-metrics-reader-secret`. |
188+
| `inferenceExtension.monitoring.prometheus.enabled` | Enable Prometheus ServiceMonitor creation for EPP metrics collection. Defaults to `false`. |
189+
| `inferenceExtension.monitoring.gke.enabled` | Enable GKE monitoring resources (`PodMonitoring` and RBAC). Defaults to `false`. |
190+
| `inferenceExtension.pluginsCustomConfig` | Custom config that is passed to EPP as inline yaml. |
191+
| `inferenceExtension.tracing.enabled` | Enables or disables OpenTelemetry tracing globally for the EndpointPicker. |
192+
| `inferenceExtension.tracing.otelExporterEndpoint` | OpenTelemetry collector endpoint. |
193+
| `provider.name` | Name of the Inference Gateway implementation being used. Possible values: [`none`, `gke`, or `istio`]. Defaults to `none`. |
194+
| `provider.gke.autopilot` | Set to `true` if the cluster is a GKE Autopilot cluster. This is only used if `provider.name` is `gke`. Defaults to `false`. |
194195

195196
### Provider Specific Configuration
196197

@@ -217,8 +218,8 @@ These are the options available to you with `provider.name` set to `istio`:
217218

218219
## OpenTelemetry
219220

220-
The EndpointPicker supports OpenTelemetry-based tracing. To enable it, use `--set inferenceExtension.trace.enabled=true`
221-
and configure the correct OpenTelemetry collector endpoint via the environment variable `OTEL_EXPORTER_OTLP_ENDPOINT` in `inferenceExtension.env`.
221+
The EndpointPicker supports OpenTelemetry-based tracing. To enable it, use `--set inferenceExtension.tracing.enabled=true`
222+
and configure the correct OpenTelemetry collector endpoint via `--set inferenceExtension.tracing.otelExporterEndpoint`.
222223

223224
## Notes
224225

config/charts/inferencepool/templates/epp-deployment.yaml

Lines changed: 2 additions & 2 deletions
@@ -63,7 +63,7 @@ spec:
6363
- "{{ .value }}"
6464
{{- end }}
6565
- "--tracing"
66-
{{- if .Values.inferenceExtension.trace.enabled }}
66+
{{- if .Values.inferenceExtension.tracing.enabled }}
6767
- "true"
6868
{{- else }}
6969
- "false"
@@ -107,7 +107,7 @@ spec:
107107
valueFrom:
108108
fieldRef:
109109
fieldPath: metadata.namespace
110-
{{- if .Values.inferenceExtension.trace.enabled }}
110+
{{- if .Values.inferenceExtension.tracing.enabled }}
111111
- name: OTEL_SERVICE_NAME
112112
value: "gateway-api-inference-extension"
113113
- name: OTEL_EXPORTER_OTLP_ENDPOINT

config/charts/inferencepool/values.yaml

Lines changed: 1 addition & 1 deletion
@@ -53,7 +53,7 @@ inferenceExtension:
5353

5454
gke:
5555
enabled: false
56-
trace:
56+
tracing:
5757
enabled: false
5858
otelExporterEndpoint: "http://localhost:4317"
5959
sampling:
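
For charts installed with a custom values file, the rename in this commit suggests that overrides need to move from `inferenceExtension.trace` to `inferenceExtension.tracing`. The fragment below is a minimal sketch of such an override, using only the two keys documented in the README change above; the file name and the collector URL are hypothetical placeholders, not values taken from the chart.

```yaml
# values-tracing.yaml (hypothetical file name)
inferenceExtension:
  tracing:            # this block was named `trace` before this commit
    enabled: true
    # Hypothetical collector address; the default shown in the chart's
    # values.yaml is "http://localhost:4317".
    otelExporterEndpoint: "http://otel-collector.example.svc:4317"
```

A file like this would be passed to Helm with `-f values-tracing.yaml`. Because the templates in this commit read `.Values.inferenceExtension.tracing.enabled`, an override still written under `inferenceExtension.trace` would presumably be ignored rather than rejected, leaving `--tracing` at `"false"`.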
