Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use -Array variants of aggregates in schema_array_transformer #1152

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

avelanarius
Copy link
Member

@avelanarius avelanarius commented Jan 3, 2025

schema_array_transformer transforms the SQL query for Array columns. Before this change, if an aggregation was performed on a Array column, e.g. sum(myArrayColumn), the transformer would change it into sum(arrayJoin(myArrayColumn)).

However using arrayJoin function has problems - arrayJoin modifies the result set of SQL query introducing additional rows. If there are many arrayJoins, a Cartesian product many rows will be performed: this causes query slowdown and makes the result invalid (we don't actually want to do a Cartesian product!).

Solve the problem by using -Array variants of aggregates (e.g. sumArray instead of sum(arrayJoin())), which does not inflate the number of result rows.

Note that this PR does NOT get rid of arrayJoin() fully in all cases. There are panels that actually need it, such as "Top products this week" in eCommerce dashboard, where we GROUP BY an array column.

Screenshot 2025-01-07 at 11 20 42

This remaining case should use the ARRAY JOIN operator, but this is out-of-scope of this PR.

schema_array_transformer transforms the SQL query for Array columns.
Before this change, if an aggregation was performed on a Array column,
e.g. sum(myArrayColumn), the transformer would change it into
sum(arrayJoin(myArrayColumn)).

However using arrayJoin function has problems - arrayJoin modifies
the result set of SQL query introducing additional rows. If there
are many arrayJoins, a Cartesian product many rows will be introduced:
this causes query slowdown and makes the result invalid (we don't
actually want to do a Cartesian product!).

Solve the problem by using "Array" variants of aggregates (e.g. sumArray
instead of sum(arrayJoin())), which does not inflate the number of
result rows.

Note that this PR does NOT get rid of arrayJoin() fully. There are
panels that actually need it, such as "Top products this week" in
eCommerce dashboard.
@avelanarius avelanarius changed the title schema_array_transformer fixes (part 1) Use "Array" variants of aggregates in schema_array_transformer Jan 7, 2025
@avelanarius avelanarius changed the title Use "Array" variants of aggregates in schema_array_transformer Use -Array variants of aggregates in schema_array_transformer Jan 7, 2025
@avelanarius avelanarius marked this pull request as ready for review January 7, 2025 10:29
@avelanarius avelanarius requested a review from a team as a code owner January 7, 2025 10:29
@@ -81,13 +205,28 @@ func NewArrayTypeVisitor(resolver arrayTypeResolver) model.ExprVisitor {
if ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IF suffix ends with merge, do special logic.

@@ -81,13 +205,28 @@ func NewArrayTypeVisitor(resolver arrayTypeResolver) model.ExprVisitor {
if ok {
dbType := resolver.dbColumnType(column.ColumnName)
if strings.HasPrefix(dbType, "Array") {
if strings.HasPrefix(e.Name, "sum") {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should resurrect that idea

https://github.com/QuesmaOrg/quesma/pull/666/files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants