-
Notifications
You must be signed in to change notification settings - Fork 2
Ccdihub 4.10.0 #59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: 4.11.0
Are you sure you want to change the base?
Ccdihub 4.10.0 #59
Conversation
to update opensearch to 2.x
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR increases Elasticsearch/OpenSearch query/result size limits and introduces a scroll-based pagination path for large result sets. It also updates YAML loading to use LoaderOptions and adjusts an OpenSearch import for newer versions.
- Added scrolling logic (collectPageWithScroll, rollToPage, collectScrollPage) with new constants for thresholds and sizes
- Raised multiple hard limits (MAX_ES_SIZE, MAX_SIZE, AGGS_SIZE, maxValues) to 200000
- Added LoaderOptions to SnakeYAML constructors and updated an OpenSearch Text import
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| service/ESService.java | Introduces scroll pagination, new constants, and helper methods for large result retrieval; adjusts scroll timeout. |
| model/search/yaml/type/SingleTypeYaml.java | Adds LoaderOptions to SnakeYAML constructor. |
| model/search/yaml/type/GroupTypeYaml.java | Adds LoaderOptions to SnakeYAML constructor. |
| model/search/yaml/type/GlobalTypeYaml.java | Adds LoaderOptions to SnakeYAML constructor. |
| model/search/mapper/TypeMapperImpl.java | Updates Text import to new OpenSearch package path. |
| controller/GraphQLController.java | Increases maximum accepted variable list size. |
| constants/Const.java | Raises MAX_SIZE and AGGS_SIZE limits. |
| ServletInitializer.java | Removes redundant servlet initializer class. |
| BentoApplication.java | Adds (unused) memory management imports. |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| public static final int MAX_ES_SIZE = 200000; // Do not return more than this number of records | ||
| public static final int SCROLL_THRESHOLD = 10000; // Use scroll when trying to retrieve past this number of records | ||
| public static final int SCROLL_SIZE = 10000; // How big each scroll should be |
Copilot
AI
Oct 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Raising MAX_ES_SIZE to 200000 significantly increases worst-case memory and transfer costs; consider enforcing a tighter upper bound (e.g. configured via application properties) and streaming/processing chunks instead of materializing large result sets in memory.
| variables = gson.fromJson(rawVar, Map.class); | ||
| // Verify that all parameter inputs are less than 1000 values | ||
| int maxValues = 1000; | ||
| int maxValues = 200000; |
Copilot
AI
Oct 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Increasing maxValues to 200000 can allow extremely large user-supplied lists, risking excessive query expansion and memory pressure; consider a lower configurable limit or pagination of input sets.
| public static class ES_UNITS { | ||
| public static final int DEFAULT_SIZE = 10; | ||
| public static final int MAX_SIZE = 10000; | ||
| public static final int MAX_SIZE = 200000; |
Copilot
AI
Oct 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting MAX_SIZE and AGGS_SIZE to 200000 may trigger very large from/size or aggregation buckets responses that are expensive for OpenSearch and the application; consider capping these or requiring scrolling/ composite aggs for large cardinalities.
| public static final String SORT_DIRECTION = "sort_direction"; | ||
| public static final String CASE_IDS = "case_ids"; | ||
| public static final int AGGS_SIZE = 1000; | ||
| public static final int AGGS_SIZE = 200000; |
Copilot
AI
Oct 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting MAX_SIZE and AGGS_SIZE to 200000 may trigger very large from/size or aggregation buckets responses that are expensive for OpenSearch and the application; consider capping these or requiring scrolling/ composite aggs for large cardinalities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are the suggestions
| query.put("size", SCROLL_SIZE); | ||
| String jsonizedQuery = gson.toJson(query); | ||
| request.setJsonEntity(jsonizedQuery); | ||
| request.addParameter("scroll", "1m"); | ||
| // JsonObject page = rollToPage(request, pageSize, offset); | ||
| // return collectPage(page, properties, pageSize, offset % SCROLL_SIZE); | ||
| JsonArray page = rollToPage(request, pageSize, offset); | ||
| // checkMemoryInit(); | ||
| return collectScrollPage(page, properties, pageSize, offset % SCROLL_SIZE); |
Copilot
AI
Oct 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
collectPageWithScroll always requests SCROLL_SIZE regardless of (pageSize, offset) and materializes all intermediate hits, which can overshoot and allocate far more data than needed; compute an adaptive first batch size (e.g. remainingNeeded = offsetRemainder + pageSize) and slice only required hits instead of filling an oversized JsonArray.
| query.put("size", SCROLL_SIZE); | |
| String jsonizedQuery = gson.toJson(query); | |
| request.setJsonEntity(jsonizedQuery); | |
| request.addParameter("scroll", "1m"); | |
| // JsonObject page = rollToPage(request, pageSize, offset); | |
| // return collectPage(page, properties, pageSize, offset % SCROLL_SIZE); | |
| JsonArray page = rollToPage(request, pageSize, offset); | |
| // checkMemoryInit(); | |
| return collectScrollPage(page, properties, pageSize, offset % SCROLL_SIZE); | |
| // Compute the minimum number of hits needed for the first batch | |
| int offsetRemainder = offset % SCROLL_SIZE; | |
| int remainingNeeded = offsetRemainder + pageSize; | |
| int firstBatchSize = Math.min(remainingNeeded, SCROLL_SIZE); | |
| query.put("size", firstBatchSize); | |
| String jsonizedQuery = gson.toJson(query); | |
| request.setJsonEntity(jsonizedQuery); | |
| request.addParameter("scroll", "1m"); | |
| JsonArray page = rollToPage(request, pageSize, offset); | |
| // Slice only the required hits from the page | |
| int startIdx = offsetRemainder; | |
| int endIdx = Math.min(startIdx + pageSize, page.size()); | |
| JsonArray slicedPage = new JsonArray(); | |
| for (int i = startIdx; i < endIdx; i++) { | |
| slicedPage.add(page.get(i)); | |
| } | |
| return collectScrollPage(slicedPage, properties, pageSize, 0); |
| String scrollId = jsonObject.get("_scroll_id").getAsString(); | ||
| Map<String, Object> scrollQuery = Map.of( | ||
| "scroll", "10S", | ||
| "scroll", "1m", |
Copilot
AI
Oct 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scroll keep-alive value '1m' is duplicated; extract a constant (e.g. SCROLL_KEEPALIVE) to avoid divergence and ease future tuning.
| query.put("size", SCROLL_SIZE); | ||
| String jsonizedQuery = gson.toJson(query); | ||
| request.setJsonEntity(jsonizedQuery); | ||
| request.addParameter("scroll", "1m"); |
Copilot
AI
Oct 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scroll keep-alive value '1m' is duplicated; extract a constant (e.g. SCROLL_KEEPALIVE) to avoid divergence and ease future tuning.
| scrollRequest = new Request("POST", SCROLL_ENDPOINT); | ||
| Map<String, Object> scrollQuery = Map.of( | ||
| "scroll", "10S", | ||
| "scroll", "1m", |
Copilot
AI
Oct 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scroll keep-alive value '1m' is duplicated; extract a constant (e.g. SCROLL_KEEPALIVE) to avoid divergence and ease future tuning.
| // public void checkMemoryInit() { | ||
| // // Get the Java Runtime object | ||
| // Runtime runtime = Runtime.getRuntime(); | ||
|
|
||
| // // Get the maximum heap size (in bytes) | ||
| // long maxMemory = runtime.maxMemory(); | ||
| // // Get the initial heap size (in bytes) | ||
| // long initialMemory = runtime.totalMemory(); | ||
| // // Get the current available memory (in bytes) | ||
| // long freeMemory = runtime.freeMemory(); | ||
|
|
||
| // // Convert to MB for better readability | ||
| // System.out.println("Initial Heap Size: " + (initialMemory / (1024 * 1024)) + " MB"); | ||
| // System.out.println("Maximum Heap Size: " + (maxMemory / (1024 * 1024)) + " MB"); | ||
| // System.out.println("Free Memory: " + (freeMemory / (1024 * 1024)) + " MB"); | ||
| // } | ||
|
|
||
| // public void checkMemoryNow() { | ||
| // // Optionally log the memory usage using MemoryMXBean | ||
| // MemoryMXBean memoryMXBean = ManagementFactory.getMemoryMXBean(); | ||
| // MemoryUsage heapMemoryUsage = memoryMXBean.getHeapMemoryUsage(); | ||
|
|
||
| // System.out.println("Used Heap Memory: " + (heapMemoryUsage.getUsed() / (1024 * 1024)) + " MB"); | ||
| // System.out.println("Committed Heap Memory: " + (heapMemoryUsage.getCommitted() / (1024 * 1024)) + " MB"); | ||
| // } |
Copilot
AI
Oct 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Large blocks of commented-out diagnostic code add noise; remove them or convert to conditional debug logging (logger.debug with guarded feature flag) to keep the class concise.
| import java.lang.management.ManagementFactory; | ||
| import java.lang.management.MemoryMXBean; | ||
| import java.lang.management.MemoryUsage; |
Copilot
AI
Oct 15, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These memory management imports are unused in this class; remove them to avoid clutter.
| import java.lang.management.ManagementFactory; | |
| import java.lang.management.MemoryMXBean; | |
| import java.lang.management.MemoryUsage; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| * @param properties The Opensearch properties to retrieve | ||
| * @param pageSize The desired number of results to obtain | ||
| * @param offset The desired offset of the results | ||
| * @return |
Copilot
AI
Oct 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @return tag is missing a description. Please add a description of what this method returns (e.g., 'A list of maps containing the requested page of results').
| * @param request The Opensearch request | ||
| * @param pageSize How many records to obtain | ||
| * @param offset How many records to skip | ||
| * @return |
Copilot
AI
Oct 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The @return tag is missing a description. Please add a description of what this method returns (e.g., 'A JsonArray containing all hits from the scroll requests').
| * @return | |
| * @return A JsonArray containing all hits from the scroll requests, up to the requested page size and offset. |
| request.setJsonEntity(jsonizedQuery); | ||
| request.addParameter("scroll", "1m"); | ||
| // JsonObject page = rollToPage(request, pageSize, offset); | ||
| // return collectPage(page, properties, pageSize, offset % SCROLL_SIZE); |
Copilot
AI
Oct 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented-out code. These lines appear to be old implementation that has been replaced with the new JsonArray-based approach.
| // return collectPage(page, properties, pageSize, offset % SCROLL_SIZE); |
| // JsonObject page = rollToPage(request, pageSize, offset); | ||
| // return collectPage(page, properties, pageSize, offset % SCROLL_SIZE); | ||
| JsonArray page = rollToPage(request, pageSize, offset); | ||
| // checkMemoryInit(); |
Copilot
AI
Oct 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented-out debugging code. If memory monitoring is needed in production, it should be implemented properly with appropriate logging levels.
| // checkMemoryInit(); |
| // JsonObject outerHits = new JsonObject(); // Helper JSON object for the results | ||
| // JsonObject results = new JsonObject(); // The results to return |
Copilot
AI
Oct 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented-out code. These variables are no longer needed since the method now returns a JsonArray directly.
| // outerHits.add("hits", allHits); | ||
| // results.add("hits", outerHits); |
Copilot
AI
Oct 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented-out code. The old JsonObject wrapping logic has been replaced by returning the JsonArray directly.
| public List<Map<String, Object>> collectScrollPage(JsonArray searchHits, String[][] properties, int pageSize, int offset) throws IOException { | ||
| List<Map<String, Object>> data = new ArrayList<>(); | ||
|
|
||
| //JsonArray searchHits = jsonObject.getAsJsonObject("hits").getAsJsonArray("hits"); |
Copilot
AI
Oct 22, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove commented-out code. This old line extracts searchHits from a JsonObject, but the method now receives searchHits directly as a parameter.
| //JsonArray searchHits = jsonObject.getAsJsonObject("hits").getAsJsonArray("hits"); |
No description provided.