Skip to content

Conversation

dnhatn
Copy link
Member

@dnhatn dnhatn commented Oct 11, 2025

Currently, when filtering OrdinalBytesRefBlock and OrdinalBytesRefVector, we reuse the existing dictionary and filter only the ordinals. Although this is semantically correct, it can break HashAggregationOperator when ordinal optimizations are enabled in BlockHash.

Example:
Given an incoming OrdinalBytesRefBlock after filtering:

dict: [apple, banana, orange]
ordinals: [0, 0, 0, 0, 1, 1]

Here, orange is not referenced. During grouping and hashing, all dictionary entries [apple, banana, orange] are hashed with group IDs [1, 2, 3] (0 reserved for null), and the block produces groupings [1, 1, 1, 1, 2, 2]. The output is correct, but when evaluating partial/final results, we cannot exclude orange (groupId=3) unless we enter slow mode and track every seen group. This can cause an IndexOutOfBoundsException if orange is the largest group and the over-allocation is not enough to cover it, or orange may be included in the result even though it was excluded previously.

This change enforces the creation of a new Block/Vector when filtering OrdinalBytesRefBlock and OrdinalBytesRefVector.

Closes #136423

@dnhatn dnhatn added :Analytics/ES|QL AKA ESQL >bug v9.2.1 v9.1.6 v8.19.6 auto-backport Automatically create backport pull requests when merged labels Oct 11, 2025
@dnhatn dnhatn requested a review from nik9000 October 11, 2025 05:34
@dnhatn dnhatn marked this pull request as ready for review October 11, 2025 05:34
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Oct 11, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @dnhatn, I've created a changelog YAML for you.

15 | Senior Team Lead
11 | Support Engineer
15 | Tech Lead
;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without the fix, Purchase Manager would re-appear in the result.

employees:long | job_positions:keyword
18             | Accountant
13             | Architect
11             | Business Analyst
13             | Data Scientist
10             | Head Human Resources
15             | Internship
14             | Junior Developer
20             | Principal Support Engineer
0              | Purchase Manager
13             | Python Developer
15             | Reporting Analyst
20             | Senior Python Developer
15             | Senior Team Lead
11             | Support Engineer
15             | Tech Lead

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL auto-backport Automatically create backport pull requests when merged >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.6 v9.1.6 v9.2.1 v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ESQL: ArrayIndexOutOfBoundsException in aggregation

2 participants