Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Surface metric metadata on aggregation #1223

Open
shangyian opened this issue Nov 6, 2024 · 0 comments
Open

Surface metric metadata on aggregation #1223

shangyian opened this issue Nov 6, 2024 · 0 comments
Assignees

Comments

@shangyian
Copy link
Contributor

shangyian commented Nov 6, 2024

Introduction

For any set of metrics and dimensions, we can generate measures SQL, which is used to materialize a dataset that can serve those metrics and dimensions in a more efficient manner.

Today, the generated measures SQL is non-optimized - the measures are computed at the level of the metrics' upstream transforms, without considering any downstream aggregation or grain optimizations. The current measures SQL endpoint is equivalent to the SQL generated for the selected metrics' upstream transforms:

image

Future

It is possible to make this significantly more efficient by breaking down metrics into pre-aggregated (but still further aggregatable) measures:

image

  • Metrics are combinations of one or more measures with specific aggregation or formulas applied (e.g., SUM(sales_amount), AVG(revenue), SUM(clicks) / SUM(impressions)).
  • Measures are components used to build metrics (e.g., sales_amount, revenue, user_count).

We'll aim to keep the same flexibility that users have today with the metric definition (e.g., arbitrary SQL). However, we'll parse out this SQL into a list of measures, along with the necessary aggregation function.

Steps:

  • Find the aggregation functions in the metric SQL.
  • The expressions inside the aggregation funcs can be turned into measures
  • The derived SQL can be generated by replacing the interior of the agg function with the measure

Metric Types

Simple Sum

name: total_sales
sql: SUM(sales_amount)
measures:
  - name: sales_amount
    sql: sales_amount
    aggregation: SUM

Complex Sum

name: log_total_sales
sql: LN(SUM(COALESCE(sales_amount, 0)) + 1)
derived_sql: LN(SUM(sales_amount_1) + 1)
measures:
  - name: sales_amount_1
    sql: COALESCE(sales_amount, 0)
    aggregation: SUM

Average

DJ will need to automatically detect this type and generate one measure for the numerator and one for the denominator (count(*)).

name: avg_sales
sql: AVG(sales_amount)
measures:
  - name: sales_amount
    sql: sales_amount
    aggregation: SUM
  - name: count_1
    sql: 1
    aggregation: COUNT

Rates

Any type of metric with a numerator and a denominator.

   - name: click_through_rate
     sql: SUM(clicks) / SUM(impressions)
     derived_sql: clicks / impressions
     measures:
        - name: clicks
          sql: clicks
          aggregation: SUM
        - name: impressions
          sql: impressions
          aggregation: SUM

Has Ever

   - name: has_ever_condition
     sql: MAX(IF(condition, 1, 0))
     derived_sql: MAX(condition_measure)
     measures:
        - name: condition_measure
          sql: IF(condition, 1, 0)
          aggregation: MAX

Measures SQL

The measures SQL endpoint can be used to produce pre-aggregated dataset that can be used to serve the metrics + dimensions in question.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant