-
Notifications
You must be signed in to change notification settings - Fork 117
feat(DENG-9250): update firefox app store territory report to include new connector data #7971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
75fd8cf
to
6e89135
Compare
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think I see what's happening between :
- sql/moz-fx-data-shared-prod/app_store/firefox_app_store_territory_source_type_report/view.sql
- sql/moz-fx-data-shared-prod/firefox_ios_derived/app_store_funnel_v1/query.sql
So before, firefox_app_store_territory_source_type_report used to only report impressions, but firefox_downloads_territory_source_type_report would separately include downloads.
Now, the new format is, both download and impression metrics are reported in firefox_app_store_territory_source_type_report.
We used to combine them separately under a query in app_store_funnel_v1, but things have changed.
I think it's good to be thoughtful how we encode the logic for easy interpretation in the future. I think there's two, equally good options:
- update the
firefox_app_store_territory_source_type_report
view logic to include pulling download numbers from firefox_downloads_territory_source_type_report before the cutoff date, and then keepapp_store_funnel_v1
logic simple (just pull directly from the view, no before/after logic) - keep the before/after logic in the
app_store_funnel_v1
, but version thefirefox_app_store_territory_source_type_report
views, so they're separate tables pre-cutoff and post cutoff.
I just don't want to logic to be split across multiple places.
Let me know what you think.
page_views, | ||
page_views_unique_device, | ||
-- These fields did not exist in the previous version. | ||
CAST(NULL AS INTEGER) AS first_time_downloads, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @kik-kik , so these metrics exist in the last format, they just were split out in a different table:
moz-fx-data-shared-prod.app_store.firefox_downloads_territory_source_type_report
Let's source it from here, instead of casting as null so we don't lose data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't review the data in moz-fx-data-bq-fivetran.firefox_app_store_v2_apple_store.apple_store__territory_report
because I don't have access. But is that data available and complete pre-"2024-01-01"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @SuYoungHong, firefox_app_store_v2_apple_store
only contains data as far back as 2024-01-01.
Regarding this:
Let's source it from here, instead of casting as null so we don't lose data
We are not losing any data here with the casting, these fields simply do not exist in the source referenced here.
But also seeing that this only gets referenced inside the app_store_funnel_v1
query I'd probably just delete this view entirely keep the view unmodified.
@@ -1,75 +1,128 @@ | |||
-- TODO: should we run this job with 7 day delay to make sure all data landed (a wider window to be on the safe side). | |||
WITH views_data AS ( | |||
WITH historical_store_data AS ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey, I'm not sure why the historical data subquery is required here... because the split logic for how we define things (pull downloads and impressions separately from separate tables under old format, and from one table from new format) should be encoded in the firefox_app_store_territory_source_type_report
view above, no?
It's a bit confusing seeing that logic twice, maybe better to just keep that logic in one place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we require the historical in case we want to run the query for past data prior to 2024-01-01. historical_store_data
basically should contain the old logic which required us to combine two different data sources to get impression and downloads information. The new connector based dataset already has all of this data in a single dataset so we do not need to do any additional joins.
WHERE | ||
DATE(`date`) = DATE_SUB(@submission_date, INTERVAL 7 DAY) | ||
AND source_type <> 'Institutional Purchase' | ||
AND app_id = 989804926 -- Filter to only include the Firefox app |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thikn we only filter for institutional purchase in the downloads section in the old query... I'm totally okay with changing it (might prefer it in fact), but I'd like to check if it affects the numbers at all (I'll need access to moz-fx-data-shared-prod.app_store.firefox_downloads_territory_source_type_report
to check though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto for the app_id filter (I'm assuming it won't affect numbers and is good defensive coding, but want to validate directly)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could create a temp table for you which include app_id dimension
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, this looks good to me.
Just 2 things I'd like noted (for documentation purposes)
- we're explicitly filtering by
app_id = 989804926
, which we weren't before - we're explicitly filtering out
source_type <> 'Institutional Purchase'
for both views and installs (previously we only filtered for installs).
Both of these changes are probably appropriate, but good to note it will be different from the past
Also, we're defining new profiles via moz-fx-data-shared-prod.firefox_ios.new_profiles
.
Before we were moz-fx-data-shared-prod.firefox_ios.firefox_ios_clients
.
Just want to confirm that firefox_ios.new_profiles is the preferred / recommended method for counting new profiles instead of firefox_ios.firefox_ios_clients. Would the numbers change due to this, or would they be reporting essentially the same thing?
Other then that, looks good. I know we discussed getting access to the moz-fx-data-bq-fivetran.firefox_app_store_v2_apple_store.apple_store__territory_report
to do some validation, which I think we should still do, but I don't think that should block approval.
Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requesting one addition due to an issue i discovered in the data.
SELECT | ||
DATE(`date`) AS `date`, | ||
territory AS country_name, | ||
SUM(impressions_unique_device) AS impressions, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
So I reviewed this, and this looks good to me.
However, I found a data issue in the new data that Apple is sending back, that I think we should address within this query / PR.
So in the v1 of the data Apple was sending back, visits from source_type:
- App Referrer
- Unavailable
- Web Referrer
Were generating an impression as well as a pageview.
In the new data Apple is sending back (v2), it seems to only generate a pageview. I think this probably makes sense, and is an improvement on their part.
Can we add the following logic to the query?
SUM(case when source_type in ('App Referrer', 'Unavailable', 'Web Referrer') THEN 0 ELSE impressions_unique_device END)
-- add comment: apple used to count pageviews from these sources as impressions as well but no longer do in the new data they're sending back
This way, the logic for what is in impression is consistent for data from v1 data and v2 data in this aggregation.
Thanks!
Other then that, I think we're good to go with this query.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the impression logic to the query to the historical data.
…nclude new connector data
2ba973c
to
fd99637
Compare
This comment has been minimized.
This comment has been minimized.
…rsion of the data to match the new report data
Integration report for "feat: limit which source types count toward impressions in the old version of the data to match the new report data"
|
@kik-kik , one addition I'd like to request like I mentioned in Slack. Can we add a Same logic for v1 and v2 sources. Just: Thanks! |
feat(DENG-9250): update firefox app store territory report to include new connector data