Speed up aggregation pushdown for single group-by expression #3550

LantaoJin · 2025-04-15T10:36:32Z

Description

Currently, the aggregation pushdown implementation builds a composite aggregation builder, which is expensive.

This PR refactors the implementation of single group-by aggregation pushdown, which could be 100x faster than the current implementation. See the use case in #3528.
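
To make the difference concrete, here is a minimal sketch of the two request shapes for a single group-by field. It only renders JSON strings in plain Java. The class name `SingleGroupByDsl`, the aggregation name `composite_buckets`, and the choice of a `terms` aggregation for the single-key path are assumptions for illustration, not the code in this PR.

```java
import java.util.Locale;

/** Hypothetical helper (not part of this PR) that renders both DSL shapes as JSON. */
final class SingleGroupByDsl {
    private SingleGroupByDsl() {}

    /** Composite aggregation with exactly one terms source (the current pushdown shape). */
    static String composite(String field, int size) {
        // Assumes the field name needs no JSON escaping.
        return String.format(
            Locale.ROOT,
            "{\"size\":0,\"aggs\":{\"composite_buckets\":{\"composite\":{\"size\":%d,"
                + "\"sources\":[{\"%s\":{\"terms\":{\"field\":\"%s\"}}}]}}}}",
            size, field, field);
    }

    /** Plain terms bucket aggregation over the same field (assumed single-key shape). */
    static String terms(String field, int size) {
        return String.format(
            Locale.ROOT,
            "{\"size\":0,\"aggs\":{\"%s\":{\"terms\":{\"field\":\"%s\",\"size\":%d}}}}",
            field, field, size);
    }

    public static void main(String[] args) {
        System.out.println(composite("gender", 1000));
        System.out.println(terms("gender", 1000));
    }
}
```

Both requests group by one field, but a terms aggregation returns only the top `size` buckets in a single response, while a composite aggregation can be paged through all buckets. That trade-off is worth keeping in mind when comparing the two.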

Related Issues

Resolves #3528

Check List

New functionality includes testing.
New functionality has been documented.
New functionality has javadoc added.
New functionality has a user manual doc added.
API changes companion pull request created.
Commits are signed per the DCO using --signoff.
Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Lantao Jin <[email protected]>

LantaoJin · 2025-04-15T10:39:50Z

...n/java/org/opensearch/sql/opensearch/storage/script/aggregation/AggregationQueryBuilder.java

@@ -74,13 +80,20 @@ public AggregationQueryBuilder(ExpressionSerializer serializer) {
      return Pair.of(
          ImmutableList.copyOf(metrics.getLeft().getAggregatorFactories()),
          new NoBucketAggregationParser(metrics.getRight()));
+    } else if (groupByList.size() == 1) {
+      // one bucket, use values source bucket builder for getting better performance


Typo here: the comment should say "one group-by key", not "one bucket".
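
The branch added in this diff can be sketched as a three-way dispatch on the number of group-by keys. The enum and method names below are hypothetical, and the fall-through to a composite aggregation for two or more keys is an assumption based on the PR description rather than on the lines shown here.

```java
import java.util.List;

/** Hypothetical strategy names; the real logic lives in AggregationQueryBuilder. */
enum PushdownStrategy {
    NO_BUCKET,      // metrics only, no group-by key
    SINGLE_BUCKET,  // exactly one group-by key: use a values source bucket builder
    COMPOSITE;      // two or more keys: composite aggregation (assumed fallback)

    /** Picks a strategy from the group-by list, mirroring the if / else-if chain. */
    static PushdownStrategy forGroupBy(List<String> groupByKeys) {
        if (groupByKeys.isEmpty()) {
            return NO_BUCKET;
        } else if (groupByKeys.size() == 1) {
            return SINGLE_BUCKET;
        }
        return COMPOSITE;
    }
}
```

For example, `PushdownStrategy.forGroupBy(java.util.Arrays.asList("gender"))` selects `SINGLE_BUCKET`, which is the only case this PR changes.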

penghuo · 2025-04-15T20:11:50Z

opensearch/src/main/java/org/opensearch/sql/opensearch/request/AggregateAnalyzer.java

@@ -126,6 +128,12 @@ public static Pair<List<AggregationBuilder>, OpenSearchAggregationResponseParser
        return Pair.of(
            ImmutableList.copyOf(metricBuilder.getAggregatorFactories()),
            new NoBucketAggregationParser(metricParserList));
+      } else if (aggregate.getGroupSet().length() == 1) {


Is CompositeAgg slow only in the single group-by case?

No, composite aggregation is slow in all cases. But if there are multiple group-by keys, we have to use a composite aggregation with multiple sources, don't we?

Shall we also report this issue to core so that CompositeAgg can be optimized for a single group expression?

I think that, since CompositeAgg with a single group is logically equivalent to BucketAgg and customers can write such DSL themselves, it would be a good chance to improve the execution efficiency of those existing queries.

If there are multiple group-by keys, we have to use a composite aggregation with multiple sources, don't we?

The DSL supports multi-level terms aggregations. For instance:

GET /_search
{
  "aggs": {
    "countries": {
      "terms": { "field": "artist.country" },
      "aggs": {
        "rock": {
          "filter": { "term": { "genre": "rock" } },
          "aggs": {
            "playback_stats": { "stats": { "field": "play_count" } }
          }
        }
      }
    }
  }
}

The reason V2 uses composite aggregation is to fully support the pagination use case.
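
The pagination point can be illustrated with an in-memory sketch of cursor-style paging: each request returns at most `size` bucket keys in sorted order, and the last key of a page is passed back as the "after" key for the next request. This is a plain Java simulation under that assumption. `AfterKeyPager` and its methods are hypothetical names, not OpenSearch client calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical in-memory model of paging through sorted bucket keys. */
final class AfterKeyPager {
    private AfterKeyPager() {}

    /** Returns up to {@code size} keys strictly greater than {@code afterKey} (null = from start). */
    static List<String> page(NavigableSet<String> keys, String afterKey, int size) {
        NavigableSet<String> rest = afterKey == null ? keys : keys.tailSet(afterKey, false);
        List<String> result = new ArrayList<>();
        for (String key : rest) {
            if (result.size() >= size) {
                break;
            }
            result.add(key);
        }
        return result;
    }

    /** Fetches pages until an empty one comes back, feeding each page's last key forward. */
    static List<List<String>> allPages(NavigableSet<String> keys, int size) {
        List<List<String>> pages = new ArrayList<>();
        String after = null;
        while (true) {
            List<String> current = page(keys, after, size);
            if (current.isEmpty()) {
                break;
            }
            pages.add(current);
            after = current.get(current.size() - 1);
        }
        return pages;
    }
}
```

A nested terms aggregation has no equivalent cursor, since each level returns only its top buckets in one response. That is the gap a composite aggregation fills when every group has to be visited.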

LantaoJin · 2025-04-16T03:15:47Z

The IT failures are caused by opensearch-project/OpenSearch#17959. It blocks our optimization work.

opensearch-trigger-bot · 2025-05-16T15:21:22Z

This PR is stalled because it has been open for 30 days with no activity.

Speed up aggregation pushdown for single group-by expression

5192730

Signed-off-by: Lantao Jin <[email protected]>

LantaoJin added the aggregation and performance labels Apr 15, 2025

LantaoJin marked this pull request as ready for review April 15, 2025 10:41

LantaoJin requested review from ps48, kavithacm, derek-ho, joshuali925, dai-chen, YANG-DB, mengweieric, Swiddis, penghuo, seankao-az, MaxKsyunz, Yury-Fridlyand, anirudha, forestmvey, acarbonetto, GumpacG, ykmr1224, noCharger and qianheng-aws as code owners April 15, 2025 10:41

opensearch-trigger-bot bot added the stalled label May 16, 2025

LantaoJin Apr 15, 2025

penghuo Apr 15, 2025

LantaoJin Apr 16, 2025

qianheng-aws Apr 16, 2025

penghuo Apr 16, 2025
