Closed
Labels
- area/inference-extension: Related to the Gateway API Inference Extension
- enhancement: New feature or request
- refined: Requirements are refined and the issue is ready to be implemented.
- size/medium: Estimated to be completed within a week
Description
When a user creates an HTTPRoute that references an InferencePool as a backend, NGF needs to configure NGINX with the proper server/location/upstream blocks to be able to route client traffic to the Pods associated with that InferencePool.
Initially, NGINX will simply proxy_pass to the upstream (our existing behavior), and our default load balancing method will choose the endpoint that receives the request. In follow-up stories, NGINX will instead send traffic to the endpoint returned by the Endpoint Picker (EPP) component and use the upstream only as a fallback, if necessary and possible.
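As a rough illustration of the upstream/server/location shape described above, the following Go sketch renders a minimal NGINX config with `text/template`. The types `Upstream`, `Location`, and `ServerConfig`, the function `RenderConfig`, the upstream name, and the listen port are all hypothetical and are not NGF's actual data model or templates.

```go
package main

import (
	"strings"
	"text/template"
)

// Upstream is a hypothetical, simplified view of one NGINX upstream block.
type Upstream struct {
	Name      string
	Endpoints []string // "ip:port" of each Pod backing the InferencePool
}

// Location is a hypothetical view of one location that proxies to an upstream.
type Location struct {
	Path     string
	Upstream string
}

// ServerConfig groups everything needed to render a single server block.
type ServerConfig struct {
	Hostname  string
	Upstreams []Upstream
	Locations []Location
}

// nginxTmpl is an illustrative template, not the one NGF ships.
const nginxTmpl = `{{ range .Upstreams }}upstream {{ .Name }} {
{{- range .Endpoints }}
    server {{ . }};
{{- end }}
}
{{ end }}server {
    listen 80;
    server_name {{ .Hostname }};
{{- range .Locations }}
    location {{ .Path }} {
        proxy_pass http://{{ .Upstream }};
    }
{{- end }}
}
`

// RenderConfig executes the template and returns the generated config text.
func RenderConfig(cfg ServerConfig) (string, error) {
	tmpl, err := template.New("nginx").Parse(nginxTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, cfg); err != nil {
		return "", err
	}
	return sb.String(), nil
}
```

Because no load balancing directive is emitted in the upstream block, this sketch falls back to NGINX's own default method (round robin), which may differ from the default that NGF configures.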
Acceptance Criteria:
- NGF configures server/location/upstream blocks in NGINX for HTTPRoutes that reference an InferencePool as a backend instead of a Service
- This feature is enabled/disabled via a feature flag (disabled by default)
Developer Notes:
- To reuse the existing Service/EndpointSlice logic for building upstreams, our control plane can probably create a headless "shadow" Service for each InferencePool that is referenced by an HTTPRoute. By using the InferencePool's label selectors, this Service will allow our control plane to find the endpoints for the InferencePool and process them as if they belonged to a normal Service.
- If this approach works, we also need to clean up that Service when it's no longer needed.
- The controller needs to watch InferencePool resources. If one is deleted or its selectors change, we need to delete or update the associated shadow Service.
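The shadow Service lifecycle in the notes above can be sketched as a small reconcile plan. This is a minimal sketch using plain Go structs in place of the real Kubernetes and Inference Extension API types; `InferencePool`, `ShadowService`, `Plan`, `BuildShadowService`, `PlanShadowServices`, and the `-pool-svc` name suffix are all hypothetical names chosen for illustration.

```go
package main

import "sort"

// InferencePool is a hypothetical, trimmed-down stand-in for the real resource.
type InferencePool struct {
	Namespace, Name string
	Selector        map[string]string
	TargetPort      int32
}

// ShadowService is a hypothetical stand-in for a headless Kubernetes Service.
type ShadowService struct {
	Namespace, Name string
	ClusterIP       string // "None" marks the Service as headless
	Selector        map[string]string
	Port            int32
}

// Plan lists the changes needed to bring shadow Services in line with the pools.
type Plan struct {
	Create, Update []ShadowService
	Delete         []string // "namespace/name" keys
}

func key(ns, name string) string { return ns + "/" + name }

// BuildShadowService copies the pool's selector onto a headless Service.
func BuildShadowService(p InferencePool) ShadowService {
	sel := make(map[string]string, len(p.Selector))
	for k, v := range p.Selector {
		sel[k] = v
	}
	return ShadowService{
		Namespace: p.Namespace,
		Name:      p.Name + "-pool-svc", // assumed naming scheme
		ClusterIP: "None",
		Selector:  sel,
		Port:      p.TargetPort,
	}
}

// sameSpec reports whether two shadow Services select the same Pods and port.
func sameSpec(a, b ShadowService) bool {
	if a.Port != b.Port || len(a.Selector) != len(b.Selector) {
		return false
	}
	for k, v := range a.Selector {
		if bv, ok := b.Selector[k]; !ok || bv != v {
			return false
		}
	}
	return true
}

// PlanShadowServices compares the pools referenced by HTTPRoutes against the
// shadow Services that already exist and decides what to create, update, or delete.
func PlanShadowServices(referenced []InferencePool, existing []ShadowService) Plan {
	remaining := make(map[string]ShadowService, len(existing))
	for _, svc := range existing {
		remaining[key(svc.Namespace, svc.Name)] = svc
	}
	var plan Plan
	for _, pool := range referenced {
		want := BuildShadowService(pool)
		k := key(want.Namespace, want.Name)
		cur, ok := remaining[k]
		switch {
		case !ok:
			plan.Create = append(plan.Create, want)
		case !sameSpec(cur, want):
			plan.Update = append(plan.Update, want)
		}
		delete(remaining, k)
	}
	// Anything left belongs to a pool that was deleted or is no longer referenced.
	for k := range remaining {
		plan.Delete = append(plan.Delete, k)
	}
	sort.Strings(plan.Delete)
	return plan
}
```

In a real controller, cleanup could likely also be delegated to Kubernetes garbage collection by setting an owner reference from the shadow Service to its InferencePool; the explicit delete list here only covers pools that still exist but are no longer referenced, plus the case where owner references are not used.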
Design doc: https://github.com/nginx/nginx-gateway-fabric/blob/main/docs/proposals/gateway-inference-extension.md
Status
✅ Done