-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Prevent OOM from malformed snappy payloads by validating decoded length #1917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
c5b4a57 to
18310e4
Compare
kakkoyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the contribution.
We need some documentation.
exp/api/remote/remote_api.go
Outdated
| } | ||
| } | ||
|
|
||
| var maxDecodedSize = 32 * 1024 * 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we document how did we come up with this number?
Is there another source of truth we need to be in sync with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the max remote-write size that VictoriaMetrics as const value. Users can override it via a flag -maxInsertRequestSize if needed. Prometheus previously had no limit, so enforcing 32 MB might break some setups.
Need some guidance here, should we raise or lower the value, or make it explicitly configurable?
I can take a look. Can you please point out where I can find the docs' sources? |
bwplotka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
We need some documentation.
I think @kakkoyun meant just a clear commentary.
We also need a unit test please, but thank you very much for adding this up 💪🏽 important!
Sorry, if it is not clear. Yes, that was what I meant. |
|
Added const description and mw tests. Please review |
kakkoyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks for addressing the comments.
Could you add a draft entry to the CHANGELOG? We should mark this as a breaking change, I believe.
Also, you need to sign the DCO
A specially crafted remote-write request can declare an extremely large
decoded length while providing only a small encoded payload. Prometheus
allocates memory based on the declared decoded size, so a single request
can trigger an allocation of ~2.5 GB. A few such requests are enough to
crash the process with OOM.
Here's the script that can be used to reproduce the issue:
echo
"97eab4890a170a085f5f6e616d655f5f120b746573745f6d6574726963121009000000000000f03f10d48fc9b2a333"
\
| xxd -r -p \
| curl -X POST \
"http://127.0.0.1:9090/api/v1/write" \
-H "Content-Type: application/x-protobuf" \
-H "Content-Encoding: snappy" \
-H "X-Prometheus-Remote-Write-Version: 0.1.0" \
--data-binary @-
This change adds a hard limit: the requested decoded length must be less
than 32 MB. Requests exceeding the limit are rejected with HTTP 400
before any allocation occurs.
Signed-off-by: Max Kotliar <[email protected]>
caec660 to
84aaa71
Compare
|
Added changelog, rebased&squashed. |
|
@kakkoyun kind reminder |
A specially crafted remote-write request can declare an extremely large decoded length while providing only a small encoded payload. Prometheus allocates memory based on the declared decoded size, so a single request can trigger an allocation of ~2.5 GB. A few such requests are enough to crash the process with OOM.
Here's the script that can be used to reproduce the issue:
This change adds a hard limit: the requested decoded length must be less than 32 MB. Requests exceeding the limit are rejected with HTTP 400 before any allocation occurs.