@shawnwangnih
No description provided.

Copilot AI left a comment

Pull Request Overview

This PR increases Elasticsearch/OpenSearch query/result size limits and introduces a scroll-based pagination path for large result sets. It also updates YAML loading to use LoaderOptions and adjusts an OpenSearch import for newer versions.

  • Added scrolling logic (collectPageWithScroll, rollToPage, collectScrollPage) with new constants for thresholds and sizes
  • Raised multiple hard limits (MAX_ES_SIZE, MAX_SIZE, AGGS_SIZE, maxValues) to 200000
  • Added LoaderOptions to SnakeYAML constructors and updated an OpenSearch Text import

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 13 comments.

| File | Description |
| --- | --- |
| service/ESService.java | Introduces scroll pagination, new constants, and helper methods for large result retrieval; adjusts scroll timeout. |
| model/search/yaml/type/SingleTypeYaml.java | Adds LoaderOptions to SnakeYAML constructor. |
| model/search/yaml/type/GroupTypeYaml.java | Adds LoaderOptions to SnakeYAML constructor. |
| model/search/yaml/type/GlobalTypeYaml.java | Adds LoaderOptions to SnakeYAML constructor. |
| model/search/mapper/TypeMapperImpl.java | Updates Text import to new OpenSearch package path. |
| controller/GraphQLController.java | Increases maximum accepted variable list size. |
| constants/Const.java | Raises MAX_SIZE and AGGS_SIZE limits. |
| ServletInitializer.java | Removes redundant servlet initializer class. |
| BentoApplication.java | Adds (unused) memory management imports. |

Comment on lines +34 to +36
public static final int MAX_ES_SIZE = 200000; // Do not return more than this number of records
public static final int SCROLL_THRESHOLD = 10000; // Use scroll when trying to retrieve past this number of records
public static final int SCROLL_SIZE = 10000; // How big each scroll should be
Copilot AI Oct 15, 2025

Raising MAX_ES_SIZE to 200000 significantly increases worst-case memory and transfer costs; consider enforcing a tighter upper bound (e.g. configured via application properties) and streaming/processing chunks instead of materializing large result sets in memory.
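
One way to read this suggestion, as a minimal sketch: clamp every requested size against a configured maximum, and hand records to a consumer in chunks rather than collecting them into one list. The class `ResultLimits`, the `HARD_CAP` value, and both method names are hypothetical and do not appear in the PR; in the application the configured maximum would presumably come from a property.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of the PR.
final class ResultLimits {
    // Assumed absolute ceiling; mirrors the MAX_ES_SIZE value from the diff.
    static final int HARD_CAP = 200_000;

    /** Clamps a requested size to the range [0, min(configuredMax, HARD_CAP)]. */
    static int effectiveSize(int requested, int configuredMax) {
        int cap = Math.max(0, Math.min(configuredMax, HARD_CAP));
        return Math.max(0, Math.min(requested, cap));
    }

    /**
     * Passes records to the sink in lists of at most chunkSize elements,
     * so only one chunk is held at a time. Returns the number of chunks emitted.
     */
    static <T> int forEachChunk(Iterable<T> records, int chunkSize, Consumer<List<T>> sink) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<T> chunk = new ArrayList<>();
        int chunks = 0;
        for (T record : records) {
            chunk.add(record);
            if (chunk.size() == chunkSize) {
                sink.accept(chunk);
                chunk = new ArrayList<>();
                chunks++;
            }
        }
        if (!chunk.isEmpty()) {
            sink.accept(chunk); // final, possibly shorter chunk
            chunks++;
        }
        return chunks;
    }
}
```

With a configured maximum of 50000, a request for 500000 records would be reduced to 50000, while a request for 10 passes through unchanged.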

Copilot uses AI. Check for mistakes.
variables = gson.fromJson(rawVar, Map.class);
// Verify that all parameter inputs are less than 1000 values
- int maxValues = 1000;
+ int maxValues = 200000;
Copilot AI Oct 15, 2025

Increasing maxValues to 200000 can allow extremely large user-supplied lists, risking excessive query expansion and memory pressure; consider a lower configurable limit or pagination of input sets.
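
A configurable limit could be enforced with a check along these lines. This is only a sketch: `VariableLimits` and `firstOversized` are hypothetical names, and it assumes list-valued variables arrive as `java.util.Collection` instances after `gson.fromJson(rawVar, Map.class)`.

```java
import java.util.Collection;
import java.util.Map;

// Hypothetical helper, not part of the PR.
final class VariableLimits {
    /**
     * Returns the name of the first variable whose list holds more than
     * maxValues entries, or null when every variable is within the limit.
     * Non-collection values (strings, numbers) are ignored.
     */
    static String firstOversized(Map<String, Object> variables, int maxValues) {
        for (Map.Entry<String, Object> entry : variables.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof Collection && ((Collection<?>) value).size() > maxValues) {
                return entry.getKey();
            }
        }
        return null;
    }
}
```

The controller could reject the request when the result is non-null and name the offending variable in the error message, with `maxValues` read from configuration instead of hard-coded.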

public static class ES_UNITS {
public static final int DEFAULT_SIZE = 10;
- public static final int MAX_SIZE = 10000;
+ public static final int MAX_SIZE = 200000;
Copilot AI Oct 15, 2025

Setting MAX_SIZE and AGGS_SIZE to 200000 may trigger very large from/size requests or aggregation bucket responses that are expensive for both OpenSearch and the application; consider capping these values or requiring scrolling or composite aggregations for high-cardinality fields.

public static final String SORT_DIRECTION = "sort_direction";
public static final String CASE_IDS = "case_ids";
- public static final int AGGS_SIZE = 1000;
+ public static final int AGGS_SIZE = 200000;
Copilot AI Oct 15, 2025

Setting MAX_SIZE and AGGS_SIZE to 200000 may trigger very large from/size requests or aggregation bucket responses that are expensive for both OpenSearch and the application; consider capping these values or requiring scrolling or composite aggregations for high-cardinality fields.
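
For the composite aggregation option, the request body could be assembled as nested maps and serialized the same way the existing queries are. The sketch below assumes a single `terms` source; the class name `CompositeAggQuery`, the aggregation name, and the field passed in are illustrative and not taken from the PR.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the PR.
final class CompositeAggQuery {
    /**
     * Builds a search body with one composite aggregation over a terms source.
     * Pass afterKey = null for the first page; for later pages pass the
     * "after_key" object returned with the previous page of buckets.
     */
    static Map<String, Object> build(String aggName, String field, int pageSize,
                                     Map<String, Object> afterKey) {
        Map<String, Object> composite = new LinkedHashMap<>();
        composite.put("size", pageSize); // buckets per page, not total buckets
        composite.put("sources",
                List.of(Map.of(aggName, Map.of("terms", Map.of("field", field)))));
        if (afterKey != null) {
            composite.put("after", afterKey);
        }

        Map<String, Object> query = new LinkedHashMap<>();
        query.put("size", 0); // no document hits, only aggregation buckets
        query.put("aggs", Map.of(aggName, Map.of("composite", composite)));
        return query;
    }
}
```

Paging through buckets this way keeps each response bounded by `pageSize`, so AGGS_SIZE would no longer need to cover the full cardinality of the field.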

Author

What are the suggestions?

Comment on lines +386 to +394
query.put("size", SCROLL_SIZE);
String jsonizedQuery = gson.toJson(query);
request.setJsonEntity(jsonizedQuery);
request.addParameter("scroll", "1m");
// JsonObject page = rollToPage(request, pageSize, offset);
// return collectPage(page, properties, pageSize, offset % SCROLL_SIZE);
JsonArray page = rollToPage(request, pageSize, offset);
// checkMemoryInit();
return collectScrollPage(page, properties, pageSize, offset % SCROLL_SIZE);
Copilot AI Oct 15, 2025

collectPageWithScroll always requests SCROLL_SIZE regardless of (pageSize, offset) and materializes all intermediate hits, which can overshoot and allocate far more data than needed; compute an adaptive first batch size (e.g. remainingNeeded = offsetRemainder + pageSize) and slice only required hits instead of filling an oversized JsonArray.

Suggested change
- query.put("size", SCROLL_SIZE);
- String jsonizedQuery = gson.toJson(query);
- request.setJsonEntity(jsonizedQuery);
- request.addParameter("scroll", "1m");
- // JsonObject page = rollToPage(request, pageSize, offset);
- // return collectPage(page, properties, pageSize, offset % SCROLL_SIZE);
- JsonArray page = rollToPage(request, pageSize, offset);
- // checkMemoryInit();
- return collectScrollPage(page, properties, pageSize, offset % SCROLL_SIZE);
+ // Compute the minimum number of hits needed for the first batch
+ int offsetRemainder = offset % SCROLL_SIZE;
+ int remainingNeeded = offsetRemainder + pageSize;
+ int firstBatchSize = Math.min(remainingNeeded, SCROLL_SIZE);
+ query.put("size", firstBatchSize);
+ String jsonizedQuery = gson.toJson(query);
+ request.setJsonEntity(jsonizedQuery);
+ request.addParameter("scroll", "1m");
+ JsonArray page = rollToPage(request, pageSize, offset);
+ // Slice only the required hits from the page
+ int startIdx = offsetRemainder;
+ int endIdx = Math.min(startIdx + pageSize, page.size());
+ JsonArray slicedPage = new JsonArray();
+ for (int i = startIdx; i < endIdx; i++) {
+     slicedPage.add(page.get(i));
+ }
+ return collectScrollPage(slicedPage, properties, pageSize, 0);
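
The offset arithmetic behind this suggestion can be checked in isolation. The sketch below assumes every scroll batch holds exactly `scrollSize` hits, which is what makes `offset % SCROLL_SIZE` meaningful as an index into a batch; if the first request used a smaller `size`, later batches would be that size too and the remainder would no longer line up. `ScrollWindow` is a hypothetical helper, not code from the PR, and how `rollToPage` actually walks the scroll is not shown in this conversation.

```java
// Hypothetical helper, not part of the PR.
final class ScrollWindow {
    final int batchesToSkip; // full batches that lie entirely before the offset
    final int startInBatch;  // index of the first wanted hit in the first kept batch
    final int batchesToKeep; // batches overlapping [offset, offset + pageSize)

    private ScrollWindow(int batchesToSkip, int startInBatch, int batchesToKeep) {
        this.batchesToSkip = batchesToSkip;
        this.startInBatch = startInBatch;
        this.batchesToKeep = batchesToKeep;
    }

    static ScrollWindow of(int offset, int pageSize, int scrollSize) {
        if (offset < 0 || pageSize < 0 || scrollSize <= 0) {
            throw new IllegalArgumentException("offset and pageSize must be >= 0, scrollSize > 0");
        }
        int skip = offset / scrollSize;
        int start = offset % scrollSize;
        // Ceiling division: batches needed to cover start + pageSize hits.
        int keep = pageSize == 0 ? 0 : (start + pageSize + scrollSize - 1) / scrollSize;
        return new ScrollWindow(skip, start, keep);
    }
}
```

For example, with a batch size of 10000, an offset of 25000 and a page size of 100 skip two batches and read hits 5000 to 5099 of the third, while an offset of 9950 with the same page size spans two batches.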

String scrollId = jsonObject.get("_scroll_id").getAsString();
Map<String, Object> scrollQuery = Map.of(
- "scroll", "10S",
+ "scroll", "1m",
Copilot AI Oct 15, 2025

The scroll keep-alive value '1m' is duplicated; extract a constant (e.g. SCROLL_KEEPALIVE) to avoid divergence and ease future tuning.
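
A minimal sketch of the extraction: one constant, referenced both where the initial search adds the `scroll` parameter and where follow-up scroll bodies are built. `ScrollSettings` and `scrollBody` are hypothetical names; the `scroll` and `scroll_id` keys are the ones the scroll API expects in the request body.

```java
import java.util.Map;

// Hypothetical holder class, not part of the PR.
final class ScrollSettings {
    // Single source of truth for the scroll keep-alive.
    static final String SCROLL_KEEPALIVE = "1m";

    /** Body for a follow-up scroll request that fetches the next batch. */
    static Map<String, Object> scrollBody(String scrollId) {
        return Map.of(
                "scroll", SCROLL_KEEPALIVE,
                "scroll_id", scrollId);
    }
}
```

The initial request would then call `request.addParameter("scroll", ScrollSettings.SCROLL_KEEPALIVE)` instead of repeating the literal.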

query.put("size", SCROLL_SIZE);
String jsonizedQuery = gson.toJson(query);
request.setJsonEntity(jsonizedQuery);
request.addParameter("scroll", "1m");
Copilot AI Oct 15, 2025

The scroll keep-alive value '1m' is duplicated; extract a constant (e.g. SCROLL_KEEPALIVE) to avoid divergence and ease future tuning.

scrollRequest = new Request("POST", SCROLL_ENDPOINT);
Map<String, Object> scrollQuery = Map.of(
- "scroll", "10S",
+ "scroll", "1m",
Copilot AI Oct 15, 2025

The scroll keep-alive value '1m' is duplicated; extract a constant (e.g. SCROLL_KEEPALIVE) to avoid divergence and ease future tuning.

Comment on lines +68 to +92
// public void checkMemoryInit() {
// // Get the Java Runtime object
// Runtime runtime = Runtime.getRuntime();

// // Get the maximum heap size (in bytes)
// long maxMemory = runtime.maxMemory();
// // Get the initial heap size (in bytes)
// long initialMemory = runtime.totalMemory();
// // Get the current available memory (in bytes)
// long freeMemory = runtime.freeMemory();

// // Convert to MB for better readability
// System.out.println("Initial Heap Size: " + (initialMemory / (1024 * 1024)) + " MB");
// System.out.println("Maximum Heap Size: " + (maxMemory / (1024 * 1024)) + " MB");
// System.out.println("Free Memory: " + (freeMemory / (1024 * 1024)) + " MB");
// }

// public void checkMemoryNow() {
// // Optionally log the memory usage using MemoryMXBean
// MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean();
// MemoryUsage heapMemoryUsage = memoryMXBean.getHeapMemoryUsage();

// System.out.println("Used Heap Memory: " + (heapMemoryUsage.getUsed() / (1024 * 1024)) + " MB");
// System.out.println("Committed Heap Memory: " + (heapMemoryUsage.getCommitted() / (1024 * 1024)) + " MB");
// }
Copilot AI Oct 15, 2025

Large blocks of commented-out diagnostic code add noise; remove them, or convert them to conditional debug logging (logger.debug guarded by a level check or feature flag) to keep the class concise.
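
If the heap diagnostics are worth keeping, they could look roughly like this. The sketch uses `java.util.logging` only so that it is self-contained; the project's own logger (for example `logger.debug` behind an `isDebugEnabled()` check) would take its place. `HeapDiagnostics` and its methods are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical replacement for the commented-out checkMemory* methods.
final class HeapDiagnostics {
    private static final Logger LOG = Logger.getLogger(HeapDiagnostics.class.getName());
    private static final long MB = 1024L * 1024L;

    /** Formats current heap usage in megabytes. */
    static String describeHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long maxMb = Runtime.getRuntime().maxMemory() / MB;
        return "heap used=" + heap.getUsed() / MB + " MB"
                + ", committed=" + heap.getCommitted() / MB + " MB"
                + ", max=" + maxMb + " MB";
    }

    /** Logs heap usage only when FINE (debug-level) logging is enabled. */
    static void logHeapIfEnabled() {
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describeHeap());
        }
    }
}
```

The level check means the MXBean lookup and string building are skipped entirely in normal operation, and nothing is written to `System.out`.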

Comment on lines +8 to +10
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
Copilot AI Oct 15, 2025

These memory management imports are unused in this class; remove them to avoid clutter.

Suggested change
- import java.lang.management.ManagementFactory;
- import java.lang.management.MemoryMXBean;
- import java.lang.management.MemoryUsage;

@AustinSMueller changed the base branch from master to 4.11.0 on October 22, 2025 17:43
Copilot AI left a comment

Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.


* @param properties The Opensearch properties to retrieve
* @param pageSize The desired number of results to obtain
* @param offset The desired offset of the results
* @return
Copilot AI Oct 22, 2025

The @return tag is missing a description. Please add a description of what this method returns (e.g., 'A list of maps containing the requested page of results').

* @param request The Opensearch request
* @param pageSize How many records to obtain
* @param offset How many records to skip
* @return
Copilot AI Oct 22, 2025

The @return tag is missing a description. Please add a description of what this method returns (e.g., 'A JsonArray containing all hits from the scroll requests').

Suggested change
- * @return
+ * @return A JsonArray containing all hits from the scroll requests, up to the requested page size and offset.

request.setJsonEntity(jsonizedQuery);
request.addParameter("scroll", "1m");
// JsonObject page = rollToPage(request, pageSize, offset);
// return collectPage(page, properties, pageSize, offset % SCROLL_SIZE);
Copilot AI Oct 22, 2025

Remove commented-out code. These lines appear to be old implementation that has been replaced with the new JsonArray-based approach.

Suggested change
- // return collectPage(page, properties, pageSize, offset % SCROLL_SIZE);

// JsonObject page = rollToPage(request, pageSize, offset);
// return collectPage(page, properties, pageSize, offset % SCROLL_SIZE);
JsonArray page = rollToPage(request, pageSize, offset);
// checkMemoryInit();
Copilot AI Oct 22, 2025

Remove commented-out debugging code. If memory monitoring is needed in production, it should be implemented properly with appropriate logging levels.

Suggested change
- // checkMemoryInit();

Comment on lines +408 to +409
// JsonObject outerHits = new JsonObject(); // Helper JSON object for the results
// JsonObject results = new JsonObject(); // The results to return
Copilot AI Oct 22, 2025

Remove commented-out code. These variables are no longer needed since the method now returns a JsonArray directly.

Comment on lines +457 to +458
// outerHits.add("hits", allHits);
// results.add("hits", outerHits);
Copilot AI Oct 22, 2025

Remove commented-out code. The old JsonObject wrapping logic has been replaced by returning the JsonArray directly.

public List<Map<String, Object>> collectScrollPage(JsonArray searchHits, String[][] properties, int pageSize, int offset) throws IOException {
List<Map<String, Object>> data = new ArrayList<>();

//JsonArray searchHits = jsonObject.getAsJsonObject("hits").getAsJsonArray("hits");
Copilot AI Oct 22, 2025

Remove commented-out code. This old line extracts searchHits from a JsonObject, but the method now receives searchHits directly as a parameter.

Suggested change
- //JsonArray searchHits = jsonObject.getAsJsonObject("hits").getAsJsonArray("hits");

4 participants