Speed up aggregation pushdown for single group-by expression #3550
Signed-off-by: Lantao Jin <[email protected]>
@@ -74,13 +80,20 @@ public AggregationQueryBuilder(ExpressionSerializer serializer) {
return Pair.of(
    ImmutableList.copyOf(metrics.getLeft().getAggregatorFactories()),
    new NoBucketAggregationParser(metrics.getRight()));
} else if (groupByList.size() == 1) {
  // one bucket, use values source bucket builder for getting better performance
Typo here: this should say "one group-by key", not "one bucket".
@@ -126,6 +128,12 @@ public static Pair<List<AggregationBuilder>, OpenSearchAggregationResponseParser
return Pair.of(
    ImmutableList.copyOf(metricBuilder.getAggregatorFactories()),
    new NoBucketAggregationParser(metricParserList));
} else if (aggregate.getGroupSet().length() == 1) {
Is CompositeAgg slow only in the single group-by case?
No, composite aggregation is slow in all cases. If there are multiple group-by keys, we have to use composite aggregation with multiple sources, don't we?
Shall we also report this to core, so that CompositeAgg with a single group expression can be optimized?
Since CompositeAgg with a single group is logically equivalent to BucketAgg, and customers are able to write such DSL themselves, this would be a good opportunity to improve the execution efficiency of those existing queries.
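To make the comparison in this thread concrete, here is a minimal sketch of the two request bodies being discussed: a composite aggregation with one terms source, and a plain terms bucket aggregation on the same field. The class name, method names, the aggregation name composite_buckets, and the field artist.country are illustrative assumptions, not code from this PR. The bodies are built as plain strings so the sketch runs without the OpenSearch client. Note that the two are only roughly equivalent: a terms aggregation returns the top buckets by document count up to its size, while a composite aggregation pages through all buckets in key order.

```java
// Sketch only: hypothetical helper, not part of the SQL plugin.
final class SingleKeyAggDsl {

    // Composite aggregation with exactly one terms source.
    static String compositeBody(String name, String field, int size) {
        return "{\"size\":0,\"aggs\":{\"composite_buckets\":{\"composite\":{\"size\":" + size
            + ",\"sources\":[{\"" + name + "\":{\"terms\":{\"field\":\"" + field + "\"}}}]}}}}";
    }

    // Plain terms bucket aggregation on the same field.
    static String termsBody(String name, String field, int size) {
        return "{\"size\":0,\"aggs\":{\"" + name + "\":{\"terms\":{\"field\":\"" + field
            + "\",\"size\":" + size + "}}}}";
    }

    public static void main(String[] args) {
        // Print both bodies so they can be pasted after "GET /_search" for comparison.
        System.out.println(compositeBody("country", "artist.country", 1000));
        System.out.println(termsBody("country", "artist.country", 1000));
    }
}
```

Names and fields are passed through without JSON escaping, which is acceptable for a sketch but not for real input.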
"If there are multiple group-by keys, we have to use composite aggregation with multiple sources, don't we?"
The DSL supports multi-level terms aggregation, for instance:
GET /_search
{
  "aggs": {
    "countries": {
      "terms": {
        "field": "artist.country"
      },
      "aggs": {
        "rock": {
          "filter": { "term": { "genre": "rock" } },
          "aggs": {
            "playback_stats": { "stats": { "field": "play_count" } }
          }
        }
      }
    }
  }
}
The reason V2 uses composite aggregation is to fully support the pagination use case.
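As a rough illustration of the pagination point, a composite aggregation request can carry an "after" object holding the last bucket key of the previous page (the response reports it as after_key), which a plain terms aggregation does not offer. The sketch below builds such request bodies as strings. The class and method names are hypothetical, and key values are assumed to be plain strings that need no escaping.

```java
import java.util.Map;
import java.util.StringJoiner;

// Sketch only: hypothetical helper showing how a composite request is paged.
final class CompositePagingSketch {

    // Builds the body for one page; pass null or an empty map for the first page.
    static String pageBody(String name, String field, int size, Map<String, String> afterKey) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"size\":0,\"aggs\":{\"composite_buckets\":{\"composite\":{\"size\":").append(size);
        sb.append(",\"sources\":[{\"").append(name)
          .append("\":{\"terms\":{\"field\":\"").append(field).append("\"}}}]");
        if (afterKey != null && !afterKey.isEmpty()) {
            // "after" resumes the scan right after the last key of the previous page.
            StringJoiner after = new StringJoiner(",", ",\"after\":{", "}");
            afterKey.forEach((k, v) -> after.add("\"" + k + "\":\"" + v + "\""));
            sb.append(after.toString());
        }
        return sb.append("}}}}").toString();
    }
}
```

A caller would send the first page with no after key, read after_key from the response, and pass it back in until a page returns no buckets.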
opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java
The IT failures are caused by opensearch-project/OpenSearch#17959. It blocks our optimization work.
This PR is stalled because it has been open for 30 days with no activity.
Description
Currently, the aggregation pushdown implementation always builds a composite aggregation builder, which is expensive.
This PR refactors the implementation of single group-by aggregation pushdown, which could be up to 100x faster than the current implementation. See the use case in #3528.
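The branching visible in the diff can be summarized by the following sketch, which picks a pushdown strategy from the number of group-by keys. The enum, its constant names, and the choose method are hypothetical and exist only for illustration; the actual change lives in AggregationQueryBuilder and AggregateAnalyzer.

```java
import java.util.List;

// Sketch only: hypothetical names, mirroring the if/else chain in the diff.
final class PushdownStrategySketch {

    enum Strategy { NO_BUCKET, SINGLE_BUCKET, COMPOSITE }

    static Strategy choose(List<String> groupByKeys) {
        if (groupByKeys.isEmpty()) {
            // Metrics only: no bucket aggregation is needed.
            return Strategy.NO_BUCKET;
        } else if (groupByKeys.size() == 1) {
            // One group-by key: use a single values-source bucket builder.
            return Strategy.SINGLE_BUCKET;
        }
        // Several group-by keys: fall back to a composite aggregation.
        return Strategy.COMPOSITE;
    }
}
```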
Related Issues
Resolves #3528
Check List
--signoff
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.