DENG-9727: Added mode last struct retain nulls udf #8130
Branch updated from 59085b7 to 6edc7cc, then from 6edc7cc to c604122.
Is there a difference in output compared with stats.mode_last_retain_nulls? I remember that BigQuery recently(?) added support for grouping by structs, so it's possible that stats.mode_last_retain_nulls works as-is. Can you try putting these test cases in that UDF?
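A minimal Python model of the "mode last, retain nulls" semantics discussed in this PR may help compare the two UDFs: the most frequent value wins, NULL counts as a value, and ties go to the value whose most recent occurrence is latest. This is a sketch, not the BigQuery implementation: Python tuples stand in for STRUCTs, None stands in for NULL, and the Geo field names are hypothetical. Because the function only needs to group values by equality, the same code handles scalars and structs alike, which is why the reviewer suspects stats.mode_last_retain_nulls may already work once structs can be grouped.

```python
from typing import Dict, Hashable, NamedTuple, Optional, Sequence


def mode_last_retain_nulls(values: Sequence[Optional[Hashable]]) -> Optional[Hashable]:
    """Return the most frequent element, counting None (NULL) as a value.

    Ties are broken in favor of the value whose most recent occurrence
    comes latest in the sequence. An empty sequence returns None.
    """
    counts: Dict[Optional[Hashable], int] = {}
    last_index: Dict[Optional[Hashable], int] = {}
    for i, value in enumerate(values):
        counts[value] = counts.get(value, 0) + 1
        last_index[value] = i  # overwritten each time, so it ends as the last offset
    if not counts:
        return None
    # Highest count wins; among equal counts, the largest last offset wins.
    return max(counts, key=lambda v: (counts[v], last_index[v]))


class Geo(NamedTuple):
    """Hypothetical stand-in for a STRUCT of related geo fields."""
    city: Optional[str]
    subdivision1: Optional[str]
    country: Optional[str]


berlin = Geo("Berlin", "BE", "DE")
munich = Geo("Munich", "BY", "DE")

print(mode_last_retain_nulls([berlin, munich, berlin, munich]))  # tie, munich seen last
print(mode_last_retain_nulls([berlin, None, None]))  # NULL struct most frequent -> None
```

Treating the whole struct as one value is what keeps city, subdivision and country self-consistent: taking the mode of each field separately could pair one row's city with another row's country. Whether BigQuery groups structs containing NULL fields the same way Python compares tuples is an assumption worth checking with the test cases.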
-- 6) NULL struct occurs most frequently -> expect NULL
This makes sense for this UDF, but for #7974, wouldn't we want to consider Berlin the first/last seen city in this case?
This UDF would first be applied to stable tables to produce one row per client per day (mirroring baseline_clients_daily). We should retain NULLs at this stage: clients in cities with populations under 15k, or in locations MaxMind can't map, should remain NULL. If we drop NULLs here, we'd misrepresent a client's true location and only capture them when they travel to a resolvable city. Downstream, the city_seen table can then keep only the non-NULL city values. I hope that makes sense.
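The two-stage flow described above can be sketched in Python, assuming the daily step has already produced one (possibly NULL) city per client per day. The function name first_last_seen_city and the sample dates are hypothetical; this only illustrates keeping NULL days at the daily level and skipping them when computing first/last seen cities downstream, not the actual city_seen query.

```python
from datetime import date
from typing import Dict, Optional, Tuple


def first_last_seen_city(
    daily_city: Dict[date, Optional[str]],
) -> Tuple[Optional[str], Optional[str]]:
    """Given one (possibly NULL) city per day for a single client, return the
    first and last non-NULL city, or (None, None) if no day resolved."""
    # Sort by day, then drop days whose daily mode was NULL.
    resolved = [(day, city) for day, city in sorted(daily_city.items()) if city is not None]
    if not resolved:
        return None, None
    return resolved[0][1], resolved[-1][1]


# Hypothetical history: NULL days (e.g. a city below the population threshold
# or a location MaxMind can't map) stay NULL in the daily rows.
history = {
    date(2024, 1, 1): None,
    date(2024, 1, 2): "Berlin",
    date(2024, 1, 3): None,
}
print(first_last_seen_city(history))  # ('Berlin', 'Berlin')
```

In this sketch the NULL days remain in the daily input, so the decision to ignore them is made only at the city_seen step.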
I'm not sure that matches what I was thinking, so I might be misunderstanding where this is going to be used. I'll see if it makes sense when I look at where the UDF is used.
I tried putting these test cases in
Integration report for "Use struct equals"
Yeah, this one can be closed if the other UDF does what you need.
Description
Added a mode_last_struct_retain_nulls UDF that returns the most frequent STRUCT in an array; if there's a tie, it selects the latest occurrence. Use this to pick a single, self-consistent set of related fields (e.g., city, subdivision(s), country) together, rather than aggregating each field separately. NULLs are retained, so a NULL struct is returned when it is the most frequent value.
Related Tickets & Documents
Reviewer, please follow this checklist