Speed up aggregation pushdown for single group-by expression #3550

Open: wants to merge 1 commit into base: main

Conversation

LantaoJin
Member

@LantaoJin LantaoJin commented Apr 15, 2025

Description

Currently, the aggregation pushdown implementation builds a composite aggregation builder, which is expensive.

This PR refactors the implementation of single group-by aggregation pushdown, which could be up to 100x faster than the current implementation. See the use case in #3528.
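
To make the difference concrete, here is a minimal sketch of the two request shapes for one group-by key, built as plain Java maps rather than with the OpenSearch builders this PR touches. The class name `SingleGroupByShapes`, the aggregation name `composite_buckets`, and the field `artist.country` are hypothetical and chosen only for illustration; a terms aggregation is assumed as the example of a plain bucket aggregation.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not the classes changed in this PR.
final class SingleGroupByShapes {

  // Composite aggregation with exactly one terms source
  // (the shape the description calls expensive).
  public static Map<String, Object> compositeBody(String name, String field) {
    Map<String, Object> source = Map.of(name, Map.of("terms", Map.of("field", field)));
    Map<String, Object> composite = new LinkedHashMap<>();
    composite.put("sources", List.of(source));
    // "composite_buckets" is an assumed aggregation name.
    return Map.of("composite_buckets", Map.of("composite", composite));
  }

  // Plain terms bucket aggregation on the same field: one level, no composite wrapper.
  public static Map<String, Object> termsBody(String name, String field) {
    return Map.of(name, Map.of("terms", Map.of("field", field)));
  }

  public static void main(String[] args) {
    System.out.println(compositeBody("country", "artist.country"));
    System.out.println(termsBody("country", "artist.country"));
  }
}
```

Both bodies group documents by the same field; the second simply skips the composite wrapper and its list of sources.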

Related Issues

Resolves #3528

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@@ -74,13 +80,20 @@ public AggregationQueryBuilder(ExpressionSerializer serializer) {
return Pair.of(
ImmutableList.copyOf(metrics.getLeft().getAggregatorFactories()),
new NoBucketAggregationParser(metrics.getRight()));
} else if (groupByList.size() == 1) {
// one bucket, use values source bucket builder for getting better performance
Member Author

typo here, should be one group-by key, not one bucket

@@ -126,6 +128,12 @@ public static Pair<List<AggregationBuilder>, OpenSearchAggregationResponseParser
return Pair.of(
ImmutableList.copyOf(metricBuilder.getAggregatorFactories()),
new NoBucketAggregationParser(metricParserList));
} else if (aggregate.getGroupSet().length() == 1) {
Collaborator

Is CompositeAgg slow only in the single group-by case?

Member Author

No, composite aggregation is slow in all cases. But if there are multiple group-by keys, we have to use composite aggregation with multiple sources, don't we?
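
The branching in the two diff hunks above can be sketched as a three-way choice on the number of group-by keys. This is only an illustration: `PushdownStrategy`, `Kind`, and `choose` are hypothetical names, and it assumes the first branch in the diff handles the empty group-by list.

```java
import java.util.List;

// Hypothetical sketch of the dispatch; not the code in this PR.
final class PushdownStrategy {

  enum Kind { NO_BUCKET, SINGLE_BUCKET, COMPOSITE }

  // Picks an aggregation strategy from the number of group-by keys.
  public static Kind choose(List<String> groupByKeys) {
    if (groupByKeys.isEmpty()) {
      return Kind.NO_BUCKET;      // metrics only, no buckets (assumed first branch)
    } else if (groupByKeys.size() == 1) {
      return Kind.SINGLE_BUCKET;  // one group-by key: plain bucket aggregation
    }
    return Kind.COMPOSITE;        // several keys: composite aggregation with one source per key
  }

  public static void main(String[] args) {
    System.out.println(choose(List.of("artist.country")));
  }
}
```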

Collaborator

@qianheng-aws qianheng-aws Apr 16, 2025

Should we also report this issue to core, so that CompositeAgg can be optimized for a single group expression?

CompositeAgg with a single group is logically equivalent to BucketAgg, and customers can write such DSL themselves, so this would be a good opportunity to improve the execution efficiency of existing queries as well.

Collaborator

But if there are multiple group-by keys, we have to use composite aggregation with multiple sources, don't we?

The DSL supports multi-level aggregations nested under a terms aggregation. For instance:

GET /_search
{
  "aggs": {
    "countries": {
      "terms": {
        "field": "artist.country"
      },
      "aggs": {
        "rock": {
          "filter": { "term": { "genre": "rock" } },
          "aggs": {
            "playback_stats": { "stats": { "field": "play_count" } }
          }
        }
      }
    }
  }
}

The reason V2 uses composite aggregation is to fully support the pagination use case.
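
As a rough sketch of that pagination use case: a composite aggregation request takes a `size` and an optional `after` key, and each response returns an `after_key` that is passed back as `after` to fetch the next page. The class `CompositePaging`, the method `page`, and the field name below are hypothetical, and the request body is modeled as plain maps rather than with the OpenSearch client.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of composite paging; not the V2 engine code.
final class CompositePaging {

  // Builds the "composite" body for one page.
  // afterKey is the after_key from the previous response, or null for the first page.
  public static Map<String, Object> page(
      List<Map<String, Object>> sources, int size, Map<String, Object> afterKey) {
    Map<String, Object> composite = new LinkedHashMap<>();
    composite.put("size", size);
    composite.put("sources", sources);
    if (afterKey != null) {
      composite.put("after", afterKey); // resume after the last bucket already seen
    }
    return Map.of("composite", composite);
  }

  public static void main(String[] args) {
    List<Map<String, Object>> sources =
        List.of(Map.of("country", Map.of("terms", Map.of("field", "artist.country"))));
    System.out.println(page(sources, 100, null));
    System.out.println(page(sources, 100, Map.of("country", "FR")));
  }
}
```

A plain terms aggregation has no equivalent of `after`, which is the trade-off behind using it only where paging is not required.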

@LantaoJin
Member Author

The IT failures are caused by opensearch-project/OpenSearch#17959. It blocks our optimization work.

@opensearch-trigger-bot
Contributor

This PR is stalled because it has been open for 30 days with no activity.

Development

Successfully merging this pull request may close these issues.

[BUG]Span query in PPL is slower than date histogram aggregation in query DSL
3 participants