Skip to content

BOOM API: some more guard rails#400

Open
Theodlz wants to merge 4 commits into
mainfrom
boom-api-guardrails
Open

BOOM API: some more guard rails#400
Theodlz wants to merge 4 commits into
mainfrom
boom-api-guardrails

Conversation

@Theodlz

@Theodlz Theodlz commented Feb 27, 2026

Copy link
Copy Markdown
Collaborator
  • require a limit (up to 100k) for boom api queries (cone_search, pipeline, find). No default here (could create lots of issues where folks get a number of results which is the default and don't know there is more data available, best way to tackle this issue is making sure they are aware of the pagination, by simply not letting a query through if a limit isn't specified).
  • added a default 30s DB query timeout for all 3 endpoints (max_time_ms, already existed but if not specified we used no default).
  • added a limit of 10k coordinates for the cone search endpoint

…ine, find); added a default 30s query timeout for all 3 endpoints; added a limit of 10k coordinates for the cone search endpoint
Copilot AI review requested due to automatic review settings February 27, 2026 11:30

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds guardrails to the BOOM query endpoints to enforce explicit pagination and reduce long-running database queries.

Changes:

  • Make limit required (and validate 1..=100_000) for find, pipeline, and cone_search queries.
  • Apply a default 30s max_time_ms (MongoDB max_time) when not provided.
  • Add a hard cap of 10,000 coordinate pairs for cone_search.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
src/api/routes/queries/pipeline.rs Adds required limit, optional skip, default query timeout, and injects $skip/$limit stages into the aggregation pipeline.
src/api/routes/queries/find.rs Makes limit required, validates bounds, and applies a default DB query timeout.
src/api/routes/queries/cone_search.rs Makes limit required, validates bounds, applies a default DB query timeout, and limits the number of coordinate pairs.
Comments suppressed due to low confidence (1)

src/api/routes/queries/cone_search.rs:73

  • limit is now required, but the existing API integration tests for /queries/cone_search currently omit limit (see tests/api/test_queries.rs::test_post_cone_search_query). This will cause test failures and break existing clients unless they are updated to pass a valid limit.
#[derive(serde::Deserialize, Clone, ToSchema)]
struct ConeSearchQuery {
    catalog_name: String,
    filter: Option<serde_json::Value>,
    projection: Option<serde_json::Value>,
    radius: f64,
    unit: Unit,
    object_coordinates: HashMap<String, [f64; 2]>, // Map of catalog name to coordinates [RA, Dec]
    limit: u32,
    skip: Option<u64>,
    sort: Option<serde_json::Value>,
    max_time_ms: Option<u64>,
}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/api/routes/queries/pipeline.rs Outdated
Comment on lines +63 to +76
// add a skip stage to the pipeline if skip is set (must come before $limit)
if let Some(skip) = body.skip {
let skip_stage = doc! { "$skip": skip };
pipeline.push(skip_stage);
}
// add a limit stage to the pipeline after validating that the limit is a positive integer < 100_000
if body.limit == 0 || body.limit > 100_000 {
return response::bad_request(
"Limit must be a positive integer less than or equal to 100,000",
);
}

let limit_stage = doc! { "$limit": body.limit };
pipeline.push(limit_stage);

Copilot AI Feb 27, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Appending $skip/$limit stages unconditionally can lead to incorrect pagination or confusing semantics when the user-supplied pipeline already contains $skip and/or $limit stages (e.g., client $limit at the end will be overridden by the new $limit, and a client $limit before the injected $skip changes behavior). Consider either rejecting pipelines containing these stages, or inserting/replacing them deterministically (e.g., remove existing $skip/$limit and append the enforced ones).

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worst case scenario, we apply a stricter limit and that's fine. Otherwise we have a looser limit and the one enforced by the user is stricter. I agree we could be more clever about this but I see no "problem" with the current implementation.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is true though, is we may want to start banning some stages that can easily cascade is super expensive queries: $lookups aren't great for instance.

Comment thread src/api/routes/queries/pipeline.rs
Comment thread src/api/routes/queries/find.rs
Comment on lines +155 to +159
if object_coordinates.is_empty() || object_coordinates.len() > 10_000 {
return response::bad_request(
"Invalid number of coordinate pairs, must be between 1 and 10,000",
);
}

Copilot AI Feb 27, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even with the new caps (10k coordinate pairs and limit up to 100k), this endpoint can still attempt to materialize a very large response in memory (Vec<Document> per coordinate, collected before responding). Consider adding a tighter guardrail on the combined result size (e.g., object_coordinates.len() * limit), lowering the max limit for cone search, and/or changing the response strategy (streaming/pagination per object) to reduce OOM risk.

Copilot uses AI. Check for mistakes.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I'm been thinking about this quite a bit... the conesearches in bulk are tricky! I think I'll open another PR for that one specifically, I still need to figure out what's an implementation that is safe for the server AND doesn't confuse the client

Comment thread src/api/routes/queries/pipeline.rs
Comment thread src/api/routes/queries/pipeline.rs
Comment thread src/api/routes/queries/find.rs
@github-actions

Copy link
Copy Markdown

Throughput results (882e3457165efb53428a8a2f6b21d552d096fb8f):

New wall time Baseline wall time Difference
183.7 184.8 0%

@Theodlz Theodlz changed the title BOOM API: some more guard railes BOOM API: some more guard rails Feb 27, 2026
@github-actions

Copy link
Copy Markdown

Throughput results (271e471daac68afd73e81a67452bc5ed5bd7d1ea):

New wall time Baseline wall time Difference
177.9 180.2 -1.00%

@github-actions

Copy link
Copy Markdown

Throughput results (bff328f80e3d37d7554bf92a3448e09f6aeeda18):

New wall time Baseline wall time Difference
189.4 180.2 5.00%

@petebachant petebachant left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some ideas for more scalable patterns but could always be put into a new issue for the future.

// assert that limit is a positive integer < 100_000
if self.limit == 0 || self.limit > 100_000 {
return Err(
"Limit must be a positive integer less than or equal to 100,000".to_string(),

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not set a default limit as 100,000?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because when pagination uses defaults, it means the user may get up to default and think we got all the data, if they don't know their is pagination (and my experience showed me they never ever THINK about it).

It's too prone to error. This forces them to be aware of it, because they have to use it.

unit: Unit,
object_coordinates: HashMap<String, [f64; 2]>, // Map of catalog name to coordinates [RA, Dec]
limit: Option<i64>,
limit: u32,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can do something like this for the docs:

Suggested change
limit: u32,
#[utoipa(schema(minimum = 1, maximum = 100_000))]
limit: u32,

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can also do something like #[validate(range(min = 1, max = 100_000))] and call body.validate() in the endpoint function to catch validation errors. Might be nicer to declare on the structs rather than putting the logic in the functions.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used utoipa for that before. I like setting a min max in the schema for documentation. But for validation, I had terrible results in another project. The user gets these standardized but opaque deserialization error, and they don't understand what went wrong in most cases. I definitely want to use #[utoipa(schema(...))] everywhere in the API (might deserve its own ticket but let me give a go at it now), but the validation isn't a good idea in my opinion, as elegant as it sounds like from the developer's perspective.

@github-actions

Copy link
Copy Markdown

Throughput results (e5c574d28dba4a05418074854f97353075caafbd):

New wall time Baseline wall time Difference
183.9 177.9 3.00%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants