Inference Extension: Support InferencePool backends in HTTPRoutes #3835

@sjberman

Description

When a user creates an HTTPRoute that references an InferencePool as a backend, NGF needs to configure NGINX with the proper server/location/upstream blocks so that it can route client traffic to the Pods associated with that InferencePool.

Initially, this story will just have NGINX proxy_pass to the upstream (our existing behavior), which uses our default load-balancing method to choose the endpoint to send to. In follow-up stories, this behavior will change so that NGINX sends traffic to the endpoint returned by the Endpoint Picker (EPP) component, and uses the upstream only as a fallback where necessary and possible.
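To make the target configuration concrete, here is a minimal sketch of the kind of upstream/server/location blocks described above. It is not NGF's actual template or generated output: the `Endpoint` type, the `render_nginx_config` function, the upstream name, the hostname, and the listen port are all assumptions made for illustration. The sketch leaves the load-balancing method unspecified instead of guessing NGF's default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """One Pod endpoint behind the InferencePool (hypothetical type)."""
    address: str
    port: int


def render_nginx_config(upstream_name: str, hostname: str, path: str,
                        endpoints: list[Endpoint]) -> str:
    """Render illustrative upstream/server/location blocks for one route."""
    # NGINX rejects an upstream block that contains no servers.
    if not endpoints:
        raise ValueError("an upstream needs at least one endpoint")

    # One "server" line per Pod endpoint backing the InferencePool.
    servers = "\n".join(f"    server {e.address}:{e.port};" for e in endpoints)

    return (
        f"upstream {upstream_name} {{\n"
        f"{servers}\n"
        f"}}\n"
        f"\n"
        f"server {{\n"
        f"    listen 80;\n"  # assumed listener port
        f"    server_name {hostname};\n"
        f"\n"
        f"    location {path} {{\n"
        # Existing behavior: proxy to the upstream and let NGINX pick the endpoint.
        f"        proxy_pass http://{upstream_name};\n"
        f"    }}\n"
        f"}}\n"
    )


if __name__ == "__main__":
    pods = [Endpoint("10.0.0.1", 8000), Endpoint("10.0.0.2", 8000)]
    print(render_nginx_config("default_vllm_pool", "llm.example.com", "/", pods))
```

In the follow-up stories, the `proxy_pass` target would presumably change to the endpoint chosen by the EPP, with this upstream kept as the fallback.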

Acceptance Criteria:

  • NGF configures server/location/upstream blocks in NGINX for HTTPRoutes that reference an InferencePool as a backend instead of a Service
  • This feature is enabled/disabled via a feature flag (disabled by default)

Developer Notes:

  • To reuse the existing Service/EndpointSlice logic for building upstreams, our control plane can probably create a headless "shadow" Service for each InferencePool that is referenced by an HTTPRoute. By using the InferencePool's label selectors, this Service will allow our control plane to find the endpoints for the InferencePool and process them as if they belonged to a normal Service.
  • If this approach works, we also need to clean up that Service when it's no longer needed.
  • The controller needs to watch InferencePool resources. If one is deleted or its selectors change, we need to delete or update the associated shadow Service.
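The shadow Service idea in the notes above can be sketched as follows. This is only an illustration under stated assumptions: the `InferencePool` dataclass is a simplified, hypothetical view (not the real API schema), the `-pool-svc` naming scheme is invented, and `plan_action` is a hypothetical helper showing the create/update/delete decisions. The one Kubernetes detail relied on is that a headless Service is declared with `clusterIP: None`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InferencePool:
    """Simplified, hypothetical view of an InferencePool (not the real schema)."""
    name: str
    namespace: str
    selector: dict[str, str]
    target_port: int


def shadow_service_name(pool_name: str) -> str:
    # Hypothetical naming scheme for the generated Service.
    return f"{pool_name}-pool-svc"


def build_shadow_service(pool: InferencePool) -> dict:
    """Build a headless Service manifest that selects the pool's Pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": shadow_service_name(pool.name),
            "namespace": pool.namespace,
        },
        "spec": {
            "clusterIP": "None",  # headless: no virtual IP, endpoints only
            "selector": dict(pool.selector),
            "ports": [{
                "port": pool.target_port,
                "targetPort": pool.target_port,
                "protocol": "TCP",
            }],
        },
    }


def plan_action(pool: Optional[InferencePool], referenced: bool,
                existing: Optional[dict]) -> str:
    """Return "create", "update", "delete", or "none" for the shadow Service."""
    # Pool deleted, or no HTTPRoute references it any more: clean up.
    if pool is None or not referenced:
        return "delete" if existing is not None else "none"
    if existing is None:
        return "create"
    # Selector or port changed on the pool: bring the Service back in sync.
    desired = build_shadow_service(pool)
    return "update" if existing["spec"] != desired["spec"] else "none"


if __name__ == "__main__":
    pool = InferencePool("vllm", "default", {"app": "vllm"}, 8000)
    print(plan_action(pool, referenced=True, existing=None))
```

Because the generated object is an ordinary Service, the existing EndpointSlice-based upstream logic could consume it unchanged, which is the point of the approach.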

Design doc: https://github.com/nginx/nginx-gateway-fabric/blob/main/docs/proposals/gateway-inference-extension.md

Metadata

Labels

  • area/inference-extension: Related to the Gateway API Inference Extension
  • enhancement: New feature or request
  • refined: Requirements are refined and the issue is ready to be implemented.
  • size/medium: Estimated to be completed within a week

Projects

Status

✅ Done
