BOOM API: some more guard rails #400
Conversation
Theodlz
commented
Feb 27, 2026
- require a limit (up to 100k) for boom api queries (cone_search, pipeline, find). There is no default here: a default could create lots of issues where folks get exactly the default number of results and don't know there is more data available. The best way to tackle this is to make sure they are aware of the pagination, by simply not letting a query through if a limit isn't specified.
- added a default 30s DB query timeout for all 3 endpoints (max_time_ms already existed, but no default was applied when it was not specified).
- added a limit of 10k coordinates for the cone search endpoint
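
For illustration, the three guardrails listed above can be sketched in isolation as below. This is a minimal, std-only sketch and not the code from this PR: the constant names and the helpers `validate_limit`, `effective_timeout`, and `validate_coordinate_count` are hypothetical, and only the numeric bounds come from the description.

```rust
use std::time::Duration;

// Bounds taken from the PR description; the names are hypothetical.
const MAX_LIMIT: u32 = 100_000;
const DEFAULT_MAX_TIME_MS: u64 = 30_000;
const MAX_COORDINATES: usize = 10_000;

/// Reject a limit of 0 or above the cap instead of silently clamping it.
fn validate_limit(limit: u32) -> Result<u32, String> {
    if limit == 0 || limit > MAX_LIMIT {
        return Err(format!(
            "Limit must be a positive integer less than or equal to {}",
            MAX_LIMIT
        ));
    }
    Ok(limit)
}

/// Fall back to the 30s default when the client sends no max_time_ms.
fn effective_timeout(max_time_ms: Option<u64>) -> Duration {
    Duration::from_millis(max_time_ms.unwrap_or(DEFAULT_MAX_TIME_MS))
}

/// Accept between 1 and 10,000 coordinate pairs.
fn validate_coordinate_count(n: usize) -> Result<(), String> {
    if n == 0 || n > MAX_COORDINATES {
        return Err(
            "Invalid number of coordinate pairs, must be between 1 and 10,000".to_string(),
        );
    }
    Ok(())
}
```

Returning an error rather than clamping keeps the behaviour explicit: a client that asks for too much is told so, instead of quietly receiving fewer results than requested.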
Pull request overview
This PR adds guardrails to the BOOM query endpoints to enforce explicit pagination and reduce long-running database queries.
Changes:
- Make `limit` required (and validate `1..=100_000`) for `find`, `pipeline`, and `cone_search` queries.
- Apply a default 30s `max_time_ms` (MongoDB `max_time`) when not provided.
- Add a hard cap of 10,000 coordinate pairs for `cone_search`.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/api/routes/queries/pipeline.rs | Adds required limit, optional skip, default query timeout, and injects $skip/$limit stages into the aggregation pipeline. |
| src/api/routes/queries/find.rs | Makes limit required, validates bounds, and applies a default DB query timeout. |
| src/api/routes/queries/cone_search.rs | Makes limit required, validates bounds, applies a default DB query timeout, and limits the number of coordinate pairs. |
Comments suppressed due to low confidence (1)
src/api/routes/queries/cone_search.rs:73
`limit` is now required, but the existing API integration tests for `/queries/cone_search` currently omit `limit` (see `tests/api/test_queries.rs::test_post_cone_search_query`). This will cause test failures and break existing clients unless they are updated to pass a valid `limit`.
#[derive(serde::Deserialize, Clone, ToSchema)]
struct ConeSearchQuery {
catalog_name: String,
filter: Option<serde_json::Value>,
projection: Option<serde_json::Value>,
radius: f64,
unit: Unit,
object_coordinates: HashMap<String, [f64; 2]>, // Map of catalog name to coordinates [RA, Dec]
limit: u32,
skip: Option<u64>,
sort: Option<serde_json::Value>,
max_time_ms: Option<u64>,
}
// add a skip stage to the pipeline if skip is set (must come before $limit)
if let Some(skip) = body.skip {
    let skip_stage = doc! { "$skip": skip };
    pipeline.push(skip_stage);
}
// add a limit stage to the pipeline after validating that the limit is a positive integer < 100_000
if body.limit == 0 || body.limit > 100_000 {
    return response::bad_request(
        "Limit must be a positive integer less than or equal to 100,000",
    );
}

let limit_stage = doc! { "$limit": body.limit };
pipeline.push(limit_stage);
Appending $skip/$limit stages unconditionally can lead to incorrect pagination or confusing semantics when the user-supplied pipeline already contains $skip and/or $limit stages (e.g., client $limit at the end will be overridden by the new $limit, and a client $limit before the injected $skip changes behavior). Consider either rejecting pipelines containing these stages, or inserting/replacing them deterministically (e.g., remove existing $skip/$limit and append the enforced ones).
Worst case scenario, the limit we inject is stricter than the user's and that's fine. Otherwise ours is looser and the one set by the user in their pipeline is the stricter one that applies. I agree we could be more clever about this but I see no "problem" with the current implementation.
What is true though, is that we may want to start banning some stages that can easily cascade into super expensive queries: $lookups aren't great, for instance.
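
To make the exchange above concrete, here is a toy model of how stacked $skip/$limit stages interact, plus a denylist check for stages such as $lookup. It is only a sketch under simplifying assumptions: real pipelines are BSON documents, whereas the `Stage` enum, `apply_paging`, and `first_banned_stage` are hypothetical names invented for this example.

```rust
/// Simplified stand-in for an aggregation stage; real pipelines are BSON documents.
#[derive(Debug, Clone, PartialEq)]
enum Stage {
    Skip(usize),
    Limit(usize),
    /// Any other stage, identified only by its operator name, e.g. "$lookup".
    Other(String),
}

/// Apply only the $skip/$limit stages, in order, to a list of results.
fn apply_paging(items: Vec<u32>, stages: &[Stage]) -> Vec<u32> {
    let mut out = items;
    for stage in stages {
        match stage {
            Stage::Skip(n) => out = out.into_iter().skip(*n).collect(),
            Stage::Limit(n) => out.truncate(*n),
            Stage::Other(_) => {} // ignored in this toy model
        }
    }
    out
}

/// Return the first stage operator that appears on a (hypothetical) denylist.
fn first_banned_stage<'a>(stages: &'a [Stage], banned: &[&str]) -> Option<&'a str> {
    stages.iter().find_map(|stage| match stage {
        Stage::Other(name) if banned.contains(&name.as_str()) => Some(name.as_str()),
        _ => None,
    })
}
```

In this model two successive limits behave like the smaller of the two, which matches the reply above. The case the review comment warns about also shows up: a client `Limit(10)` followed by an injected `Skip(10)` yields an empty page.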
if object_coordinates.is_empty() || object_coordinates.len() > 10_000 {
    return response::bad_request(
        "Invalid number of coordinate pairs, must be between 1 and 10,000",
    );
}
Even with the new caps (10k coordinate pairs and limit up to 100k), this endpoint can still attempt to materialize a very large response in memory (Vec<Document> per coordinate, collected before responding). Consider adding a tighter guardrail on the combined result size (e.g., object_coordinates.len() * limit), lowering the max limit for cone search, and/or changing the response strategy (streaming/pagination per object) to reduce OOM risk.
Yes, I've been thinking about this quite a bit... the cone searches in bulk are tricky! I think I'll open another PR for that one specifically. I still need to figure out an implementation that is safe for the server AND doesn't confuse the client.
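
As a rough idea of what the combined-size guardrail suggested in the review comment could look like, here is a std-only sketch. The budget `MAX_TOTAL_DOCS` is a made-up number, and `worst_case_docs` and `check_combined_size` are hypothetical helpers; nothing here is part of this PR.

```rust
/// Hypothetical budget on the total number of documents one request may return.
const MAX_TOTAL_DOCS: u64 = 1_000_000;

/// Worst-case number of documents: one `limit`-sized result set per coordinate pair.
/// Returns None if the multiplication would overflow.
fn worst_case_docs(n_coordinates: usize, limit: u32) -> Option<u64> {
    (n_coordinates as u64).checked_mul(u64::from(limit))
}

/// Accept the request only if its worst-case size fits within the budget.
fn check_combined_size(n_coordinates: usize, limit: u32) -> Result<u64, String> {
    match worst_case_docs(n_coordinates, limit) {
        Some(total) if total <= MAX_TOTAL_DOCS => Ok(total),
        _ => Err(format!(
            "Request could return more than {} documents; lower `limit` or send fewer coordinates",
            MAX_TOTAL_DOCS
        )),
    }
}
```

With the caps from this PR, the worst case is 10,000 coordinate pairs times a limit of 100,000, that is 1,000,000,000 documents, which is why a per-parameter cap alone does not bound the response size.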
petebachant
left a comment
Some ideas for more scalable patterns but could always be put into a new issue for the future.
// assert that limit is a positive integer < 100_000
if self.limit == 0 || self.limit > 100_000 {
    return Err(
        "Limit must be a positive integer less than or equal to 100,000".to_string(),
Why not set a default limit as 100,000?
Because when pagination uses defaults, the user may get up to the default number of results and think they got all the data, if they don't know there is pagination (and my experience showed me they never ever THINK about it).
It's too prone to error. This forces them to be aware of it, because they have to use it.
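
From the client side, being forced to pass a limit naturally leads to a paging loop like the one below. This is only an illustrative sketch: `fetch_all` is a hypothetical helper, and the `fetch_page(skip, limit)` closure stands in for a single API call.

```rust
/// Fetch every result by paging with an explicit `limit` and an increasing `skip`.
/// `fetch_page(skip, limit)` is a hypothetical stand-in for one API call.
fn fetch_all<F>(limit: usize, mut fetch_page: F) -> Vec<u32>
where
    F: FnMut(usize, usize) -> Vec<u32>,
{
    // A zero limit would never make progress, so refuse it up front.
    assert!(limit > 0, "limit must be positive");
    let mut all = Vec::new();
    let mut skip = 0;
    loop {
        let page = fetch_page(skip, limit);
        let n = page.len();
        all.extend(page);
        // A short page means the server has no more data to return.
        if n < limit {
            break;
        }
        skip += limit;
    }
    all
}
```

The stopping rule is the point of the argument above: a page that comes back exactly full is a signal that more data may exist, which a silent default would hide from a client that never set a limit.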
  unit: Unit,
  object_coordinates: HashMap<String, [f64; 2]>, // Map of catalog name to coordinates [RA, Dec]
- limit: Option<i64>,
+ limit: u32,
I think you can do something like this for the docs:
- limit: u32,
+ #[utoipa(schema(minimum = 1, maximum = 100_000))]
+ limit: u32,
I think you can also do something like #[validate(range(min = 1, max = 100_000))] and call body.validate() in the endpoint function to catch validation errors. Might be nicer to declare on the structs rather than putting the logic in the functions.
I used utoipa for that before. I like setting a min and max in the schema for documentation. But for validation, I had terrible results in another project: the user gets these standardized but opaque deserialization errors, and in most cases they don't understand what went wrong. I definitely want to use #[utoipa(schema(...))] everywhere in the API (it might deserve its own ticket, but let me give it a go now), but the validation isn't a good idea in my opinion, as elegant as it sounds from the developer's perspective.