DO-2075 Added fenix and desktop baseline city seen tables #7974
base: main
sql/moz-fx-data-shared-prod/fenix_derived/clients_city_seen_v1/schema.yaml (4 resolved threads, outdated)
sql/moz-fx-data-shared-prod/fenix_derived/clients_city_seen_v1/metadata.yaml (2 resolved threads, outdated)
sql/moz-fx-data-shared-prod/firefox_desktop_derived/clients_city_seen_v1/query.sql (2 resolved threads, outdated)
64349bd to dc75f81
ce4fe26 to ff85557
You should remove all the generated files in sql/ before merging because generated SQL doesn't need to be checked in.
The is_init query with a 100% sample might be too expensive to run in a single query and could time out. This is fine for the POC, but it is something to prepare for later.
sql_generators/baseline_clients_city_seen_v1/templates/query.sql (resolved thread, outdated)
sql_generators/baseline_clients_city_seen_v1/templates/metadata.yaml (resolved thread, outdated)
24fd5b0 to 33d0b9f
{% endraw %}
WHERE
  client_info.client_id IS NOT NULL
  AND sample_id = 0
You should pay attention to how long this takes to init with a 1% sample since it blocks artifact deployment during the init. It should be fine for now but we might need to do the full initial backfill another way.
I just did an init with a 2% sample and it took:
- firefox_desktop: 319s
- fenix: 111s
However, if I were to add @sample_id, then the table would be initialized in parallel per sample_id:
bigquery-etl/bigquery_etl/cli/query.py, line 1460 in 59139fd:
To run in parallel per sample_id, include a @sample_id parameter in the query.
Would that be sufficient or is there another way to do the full initial backfill that you would recommend? Thanks!
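For illustration, the per-sample_id fan-out described above might look roughly like the sketch below. This is not the bigquery-etl implementation: `run_init_for_sample` and `init_in_parallel` are hypothetical names, the worker function is a stand-in that does not call BigQuery, and the 0..99 range for sample_id is an assumption based on one sample_id being about 1% of clients.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: sample_id takes values 0..99, so each id covers about 1% of clients.
NUM_SAMPLE_IDS = 100


def run_init_for_sample(sample_id: int) -> int:
    """Hypothetical stand-in for running the init query with @sample_id bound.

    A real version would submit the query with a `sample_id` query parameter;
    this placeholder only returns the id it was given.
    """
    return sample_id


def init_in_parallel(run_one, sample_ids, parallelism: int = 8) -> list:
    """Run `run_one` once per sample_id on a bounded thread pool."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # Executor.map preserves input order, so results line up with sample_ids.
        return list(pool.map(run_one, sample_ids))


completed = init_in_parallel(run_init_for_sample, range(NUM_SAMPLE_IDS))
```

Each unit of work then scans only one slice of the data, which is the reason a parameterized init can be cheaper per query than a single 100% pass.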
sql_generators/baseline_clients_city_seen_v1/templates/metadata.yaml (resolved thread, outdated)
Co-authored-by: Ben Wu <[email protected]>
Integration report for "Merge branch 'main' into do-2075-initialize-client-city-seen"
@BenWu @soGaussian could you do a final review/approval on this PR? I’d like to merge early next week. @BenWu is there a time of day you recommend for the merge so initialization doesn’t affect artifact deployments? Thanks!
Description
Initialize the *baseline_city_seen tables by deriving each client’s first-seen and last-seen city, subdivision and country fields from the stable tables.
Note: This one-time initialization logic will no longer apply once city/subdivision/country fields are nulled in the stable tables.
Ongoing updates: After initialization, the tables will be updated daily via ETL using live tables (appending new clients and advancing last-seen values).
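As a rough illustration of the first-seen/last-seen logic described above, here is a minimal Python sketch. It is not the query in this PR: the `Ping` type, its field names, and `update_city_seen` are hypothetical, and ordering pings by submission date is an assumption about how "first" and "last" are determined.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Optional, Tuple

Geo = Tuple[Optional[str], Optional[str], Optional[str]]  # (city, subdivision, country)


@dataclass(frozen=True)
class Ping:
    # Hypothetical, simplified stand-in for one row of a ping table.
    client_id: str
    submission_date: date
    city: Optional[str]
    subdivision: Optional[str]
    country: Optional[str]


def update_city_seen(table: Dict[str, Dict[str, Geo]], pings: Iterable[Ping]) -> Dict[str, Dict[str, Geo]]:
    """Append new clients and advance last-seen values for known clients.

    Calling this on an empty table corresponds to the one-time initialization;
    calling it again with a later day's pings corresponds to the daily update.
    """
    for ping in sorted(pings, key=lambda p: p.submission_date):
        geo = (ping.city, ping.subdivision, ping.country)
        row = table.get(ping.client_id)
        if row is None:
            # New client: first-seen and last-seen start out identical.
            table[ping.client_id] = {"first_seen": geo, "last_seen": geo}
        else:
            # Known client: first-seen is never overwritten.
            row["last_seen"] = geo
    return table


# Initialization from historical pings (made-up sample data).
history = [
    Ping("a", date(2025, 1, 2), "Lyon", "ARA", "FR"),
    Ping("a", date(2025, 1, 1), "Paris", "IDF", "FR"),
    Ping("b", date(2025, 1, 1), "Berlin", "BE", "DE"),
]
city_seen = update_city_seen({}, history)

# Daily update: client "a" moves, client "c" is new.
city_seen = update_city_seen(city_seen, [
    Ping("a", date(2025, 1, 3), "Nice", "PAC", "FR"),
    Ping("c", date(2025, 1, 3), "Toronto", "ON", "CA"),
])
```

The same function serves both phases because initialization is just the update applied to an empty table; only the input source differs (stable tables for the init, live tables afterwards).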
Related Tickets & Documents
Reviewer, please follow this checklist