kwindau
Contributor

@kwindau kwindau commented Sep 28, 2025

Description

This PR creates a new table and its associated view:

  • moz-fx-data-shared-prod.mozilla_org_derived.blog_performance_v1 (table)
  • moz-fx-data-shared-prod.mozilla_org.blog_performance (view)

Related Tickets & Documents

  • DENG-9765

Reviewer, please follow this checklist

@kwindau kwindau force-pushed the DENG-9765-blog-performance branch from d52ba7f to e2bce3e Compare September 28, 2025 23:32
@kwindau kwindau changed the title feat(DENG-9765): Initial commit, work in progress feat(DENG-9765): Create mozilla_org_derived.blog_performance_v1 Sep 28, 2025
@kwindau kwindau force-pushed the DENG-9765-blog-performance branch from a5524d2 to 5eba54d Compare September 29, 2025 14:58
@kwindau kwindau force-pushed the DENG-9765-blog-performance branch from 9b97c86 to 64c0bfc Compare September 29, 2025 16:40
@kwindau kwindau marked this pull request as ready for review September 30, 2025 22:02
@kwindau kwindau added this pull request to the merge queue Oct 1, 2025
github-merge-queue bot pushed a commit that referenced this pull request Oct 1, 2025
* feat(DENG-9765): Initial commit, work in progress

* Remove views per session since we want to calc that in Looker as a measure

* Add aggregate table label

* feat(DENG-9765): Update key event list based on what blog.mozilla.org currently considers key events

* Add engaged sessions count

* Work in progress

* Final fixed
@kwindau kwindau removed this pull request from the merge queue due to a manual request Oct 1, 2025
@kwindau kwindau added this pull request to the merge queue Oct 1, 2025
@kwindau kwindau removed this pull request from the merge queue due to a manual request Oct 1, 2025
@dataops-ci-bot

Integration report for "Final fixed"

sql.diff

diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/dags/bqetl_google_analytics_derived_ga4.py /tmp/workspace/generated-sql/dags/bqetl_google_analytics_derived_ga4.py
--- /tmp/workspace/main-generated-sql/dags/bqetl_google_analytics_derived_ga4.py	2025-10-01 01:59:58.000000000 +0000
+++ /tmp/workspace/generated-sql/dags/bqetl_google_analytics_derived_ga4.py	2025-10-01 02:02:13.000000000 +0000
@@ -174,6 +174,19 @@
         pool="DATA_ENG_EXTERNALTASKSENSOR",
     )
 
+    checks__fail_mozilla_org_derived__blog_performance__v1 = bigquery_dq_check(
+        task_id="checks__fail_mozilla_org_derived__blog_performance__v1",
+        source_table="blog_performance_v1",
+        dataset_id="mozilla_org_derived",
+        project_id="moz-fx-data-shared-prod",
+        is_dq_check_fail=True,
+        owner="[email protected]",
+        email=["[email protected]", "[email protected]"],
+        depends_on_past=False,
+        parameters=["submission_date:DATE:{{ds}}"],
+        retries=0,
+    )
+
     checks__fail_mozilla_org_derived__ga_clients__v2 = bigquery_dq_check(
         task_id="checks__fail_mozilla_org_derived__ga_clients__v2",
         source_table="ga_clients_v2",
@@ -321,6 +334,17 @@
         retries=0,
     )
 
+    mozilla_org_derived__blog_performance__v1 = bigquery_etl_query(
+        task_id="mozilla_org_derived__blog_performance__v1",
+        destination_table="blog_performance_v1",
+        dataset_id="mozilla_org_derived",
+        project_id="moz-fx-data-shared-prod",
+        owner="[email protected]",
+        email=["[email protected]", "[email protected]"],
+        date_partition_parameter="submission_date",
+        depends_on_past=False,
+    )
+
     mozilla_org_derived__blogs_daily_summary__v2 = bigquery_etl_query(
         task_id="mozilla_org_derived__blogs_daily_summary__v2",
         destination_table="blogs_daily_summary_v2",
@@ -566,6 +590,10 @@
         task_concurrency=1,
     )
 
+    checks__fail_mozilla_org_derived__blog_performance__v1.set_upstream(
+        mozilla_org_derived__blog_performance__v1
+    )
+
     checks__fail_mozilla_org_derived__ga_clients__v2.set_upstream(
         mozilla_org_derived__ga_clients__v2
     )
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org: blog_performance
Only in /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived: blog_performance_v1
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org/blog_performance/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org/blog_performance/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org/blog_performance/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org/blog_performance/metadata.yaml	2025-10-01 01:55:19.000000000 +0000
@@ -0,0 +1,18 @@
+friendly_name: Blog Performance
+description: |-
+  Please provide a description for the query
+owners: []
+labels: {}
+bigquery: null
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
+- role: roles/bigquery.metadataViewer
+  members:
+  - workgroup:dataops-managed/external-fides
+  - workgroup:google-managed/external-ads-dataproc
+references:
+  view.sql:
+  - moz-fx-data-shared-prod.mozilla_org_derived.blog_performance_v1
+require_column_descriptions: false
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org/blog_performance/view.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org/blog_performance/view.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org/blog_performance/view.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org/blog_performance/view.sql	2025-10-01 01:52:26.000000000 +0000
@@ -0,0 +1,7 @@
+CREATE OR REPLACE VIEW
+  `moz-fx-data-shared-prod.mozilla_org.blog_performance`
+AS
+SELECT
+  *
+FROM
+  `moz-fx-data-shared-prod.mozilla_org_derived.blog_performance_v1`
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/checks.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/checks.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/checks.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/checks.sql	2025-10-01 01:52:26.000000000 +0000
@@ -0,0 +1,2 @@
+#fail
+{{ is_unique(["page_title"], "event_date = @submission_date") }}
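
Note on the check above: it is marked `#fail` and is scoped to `event_date = @submission_date`, so it appears intended to stop the DAG when a `page_title` shows up more than once in the partition being written. The condition can be sketched in plain Python as follows. This is only an illustration under assumptions: rows are modeled as dicts, and `find_duplicate_keys` is a hypothetical helper, not the bigquery-etl `is_unique` macro or its generated SQL.

```python
from collections import Counter


def find_duplicate_keys(rows, key_columns, event_date):
    """Return key tuples that occur more than once among rows for event_date.

    Hypothetical helper: mirrors the idea of a uniqueness check on
    key_columns, restricted to a single date partition.
    """
    counts = Counter(
        tuple(row[col] for col in key_columns)
        for row in rows
        if row["event_date"] == event_date
    )
    # Counter keeps insertion order, so duplicates come back in first-seen order.
    return [key for key, n in counts.items() if n > 1]


# Made-up sample rows for illustration only.
rows = [
    {"event_date": "2025-09-30", "page_title": "Post A"},
    {"event_date": "2025-09-30", "page_title": "Post B"},
    {"event_date": "2025-09-30", "page_title": "Post A"},
    {"event_date": "2025-09-29", "page_title": "Post B"},
]

duplicates = find_duplicate_keys(rows, ["page_title"], "2025-09-30")
if duplicates:
    print("uniqueness check would fail for:", duplicates)
```

An empty result corresponds to the check passing; any returned key corresponds to a failing check for that date.
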
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/metadata.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/metadata.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/metadata.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/metadata.yaml	2025-10-01 01:55:21.000000000 +0000
@@ -0,0 +1,32 @@
+friendly_name: Blog Performance
+description: |-
+  Blog page engagement metrics by page title and date for blog.mozilla.org pages.
+owners:
+- [email protected]
+labels:
+  incremental: true
+  owner1: kwindau
+  table_type: aggregate
+  dag: bqetl_google_analytics_derived_ga4
+scheduling:
+  dag_name: bqetl_google_analytics_derived_ga4
+bigquery:
+  time_partitioning:
+    type: day
+    field: event_date
+    require_partition_filter: false
+    expiration_days: null
+  range_partitioning: null
+  clustering:
+    fields:
+    - page_title
+workgroup_access:
+- role: roles/bigquery.dataViewer
+  members:
+  - workgroup:mozilla-confidential
+- role: roles/bigquery.metadataViewer
+  members:
+  - workgroup:dataops-managed/external-fides
+  - workgroup:google-managed/external-ads-dataproc
+references: {}
+require_column_descriptions: true
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/query.sql /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/query.sql
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/query.sql	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/query.sql	2025-10-01 01:52:26.000000000 +0000
@@ -0,0 +1,187 @@
+WITH all_submission_date_events AS (
+  SELECT
+    PARSE_DATE('%Y%m%d', event_date) AS event_date,
+    event_timestamp,
+    event_name,
+    CAST(
+      (
+        SELECT
+          `value`
+        FROM
+          UNNEST(event_params)
+        WHERE
+          key = 'ga_session_id'
+        LIMIT
+          1
+      ).int_value AS STRING
+    ) AS ga_session_id,
+    user_pseudo_id,
+    (
+      SELECT
+        `value`
+      FROM
+        UNNEST(event_params)
+      WHERE
+        key = 'page_title'
+      LIMIT
+        1
+    ).string_value AS page_title,
+    (
+      SELECT
+        `value`
+      FROM
+        UNNEST(event_params)
+      WHERE
+        key = 'page_location'
+      LIMIT
+        1
+    ).string_value AS page_location,
+    COALESCE(
+      (
+        SELECT
+          ep.value.int_value
+        FROM
+          UNNEST(event_params) ep
+        WHERE
+          ep.key = 'session_engaged'
+        LIMIT
+          1
+      ),
+      SAFE_CAST(
+        (
+          SELECT
+            ep.value.string_value
+          FROM
+            UNNEST(event_params) ep
+          WHERE
+            ep.key = 'session_engaged'
+          LIMIT
+            1
+        ) AS INT64
+      )
+    ) AS session_engaged_indicator,
+    CASE
+      WHEN event_name IN (
+          'click',
+          'cta_click',
+          'download_click',
+          'newsletter_subscribe',
+          'purchase',
+          'scroll',
+          'social_share'
+        )
+        THEN 1
+      ELSE 0
+    END AS key_event
+  FROM
+    `moz-fx-data-marketing-prod.analytics_314399816.events_*`
+  WHERE
+    _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', @submission_date)
+),
+-- look only at pageviews on blog.mozilla.org
+blog_page_views_by_date_page_title_and_visit_id AS (
+  SELECT
+    event_date,
+    ga_session_id || ' - ' || user_pseudo_id AS visit_identifier,
+    page_title,
+    COUNT(1) AS nbr_page_views
+  FROM
+    all_submission_date_events
+  WHERE
+    event_name = 'page_view'
+    AND LOWER(page_location) LIKE '%blog.mozilla.org%'
+  GROUP BY
+    1,
+    2,
+    3
+),
+-- sessions that had 1 or more blog page view on the submission date
+sessions_with_1_or_more_blog_page_view AS (
+  SELECT DISTINCT
+    visit_identifier
+  FROM
+    blog_page_views_by_date_page_title_and_visit_id
+),
+-- session-level engagement flag
+session_engagement AS (
+  SELECT
+    ga_session_id || ' - ' || user_pseudo_id AS visit_identifier,
+    MAX(session_engaged_indicator) AS session_engaged_flag
+  FROM
+    all_submission_date_events
+  GROUP BY
+    1
+),
+--get the last non-null page title before each event
+key_events_staging AS (
+  SELECT
+    event_date,
+    ga_session_id,
+    user_pseudo_id,
+    ga_session_id || ' - ' || user_pseudo_id AS visit_identifier,
+    event_timestamp,
+    event_name,
+    key_event,
+    LAST_VALUE(page_title IGNORE NULLS) OVER (
+      PARTITION BY
+        ga_session_id,
+        user_pseudo_id
+      ORDER BY
+        event_timestamp
+      ROWS BETWEEN
+        UNBOUNDED PRECEDING
+        AND CURRENT row
+    ) AS last_seen_page_title
+  FROM
+    all_submission_date_events
+),
+--get the # of key events by date, visit ID, and page title on or before the key event
+key_events AS (
+  SELECT
+    event_date,
+    last_seen_page_title,
+    COUNT(
+      DISTINCT(visit_identifier || ' - ' || event_name || ' - ' || CAST(event_timestamp AS string))
+    ) AS nbr_key_events
+  FROM
+    key_events_staging
+  WHERE
+    key_event = 1
+  GROUP BY
+    event_date,
+    last_seen_page_title
+),
+stats_by_page_title_and_date AS (
+  SELECT
+    pv.event_date,
+    pv.page_title,
+    SUM(pv.nbr_page_views) AS nbr_page_views,
+    COUNT(DISTINCT(pv.visit_identifier)) AS nbr_sessions,
+    COUNT(
+      DISTINCT(CASE WHEN se.session_engaged_flag = 1 THEN pv.visit_identifier ELSE NULL END)
+    ) AS nbr_engaged_sessions
+  FROM
+    sessions_with_1_or_more_blog_page_view s
+  JOIN
+    blog_page_views_by_date_page_title_and_visit_id pv
+    ON s.visit_identifier = pv.visit_identifier
+  LEFT JOIN
+    session_engagement se
+    ON s.visit_identifier = se.visit_identifier
+  GROUP BY
+    pv.event_date,
+    pv.page_title
+)
+SELECT
+  s.event_date,
+  s.page_title,
+  s.nbr_page_views,
+  s.nbr_sessions,
+  s.nbr_engaged_sessions,
+  ke.nbr_key_events
+FROM
+  stats_by_page_title_and_date s
+LEFT JOIN
+  key_events ke
+  ON s.page_title = ke.last_seen_page_title
+  AND s.event_date = ke.event_date
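
Note on the engagement flag in the query above: `session_engaged` is read from `int_value` first, falls back to a safe cast of `string_value`, and is then reduced with `MAX` per visit identifier. A rough Python rendering of that logic, assuming events are dicts shaped loosely like GA4 export rows, might look like this. The function names are hypothetical, and the sketch assumes both `ga_session_id` and `user_pseudo_id` are non-null.

```python
def session_engaged_indicator(event_params):
    """Approximate COALESCE(int_value, SAFE_CAST(string_value AS INT64))
    for the 'session_engaged' parameter; returns None when absent or unparseable."""
    for param in event_params:
        if param["key"] == "session_engaged":
            value = param["value"]
            if value.get("int_value") is not None:
                return value["int_value"]
            try:
                return int(value.get("string_value"))
            except (TypeError, ValueError):
                # Stands in for SAFE_CAST returning NULL on bad input.
                return None
    return None


def session_engaged_flags(events):
    """Approximate MAX(session_engaged_indicator) grouped by visit identifier."""
    flags = {}
    for event in events:
        visit = f"{event['ga_session_id']} - {event['user_pseudo_id']}"
        flags.setdefault(visit, None)
        indicator = session_engaged_indicator(event["event_params"])
        # MAX ignores NULLs, so only non-None indicators can raise the flag.
        if indicator is not None and (flags[visit] is None or indicator > flags[visit]):
            flags[visit] = indicator
    return flags


# Made-up sample events for illustration only.
events = [
    {"ga_session_id": "111", "user_pseudo_id": "u1",
     "event_params": [{"key": "session_engaged", "value": {"string_value": "1"}}]},
    {"ga_session_id": "111", "user_pseudo_id": "u1",
     "event_params": [{"key": "page_title", "value": {"string_value": "Post A"}}]},
    {"ga_session_id": "222", "user_pseudo_id": "u2",
     "event_params": [{"key": "session_engaged", "value": {"int_value": 0}}]},
]

flags = session_engaged_flags(events)
print(flags)
```

A visit counts toward `nbr_engaged_sessions` only when its flag equals 1, so a visit whose events never carry a parseable `session_engaged` value is treated as not engaged.
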
diff -bur --no-dereference --new-file /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/schema.yaml /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/schema.yaml
--- /tmp/workspace/main-generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/schema.yaml	1970-01-01 00:00:00.000000000 +0000
+++ /tmp/workspace/generated-sql/sql/moz-fx-data-shared-prod/mozilla_org_derived/blog_performance_v1/schema.yaml	2025-10-01 01:52:26.000000000 +0000
@@ -0,0 +1,27 @@
+fields:
+- mode: NULLABLE
+  name: event_date
+  type: DATE
+  description: Event Date
+- name: page_title
+  type: STRING
+  mode: NULLABLE
+  description: Page Title
+- name: nbr_page_views
+  type: INTEGER
+  mode: NULLABLE
+  description: Number of page views on the event_date for this page_title
+- name: nbr_sessions
+  type: INTEGER
+  mode: NULLABLE
+  description: Number of unique sessions viewing this page title on this event date
+- name: nbr_engaged_sessions
+  type: INTEGER
+  mode: NULLABLE
+  description: Number of unique engaged sessions viewing this page title on this event date
+- name: nbr_key_events
+  type: INTEGER
+  mode: NULLABLE
+  description: Number of key events associated with sessions that viewed this page title on this event date.
+    To prevent double counting, if a single session viewed 2 or more page titles on the event date, each key event for that session
+    gets associated to the last page title viewed before the key event occurred
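
Note on the `nbr_key_events` attribution described above: in query.sql this is done with `LAST_VALUE(page_title IGNORE NULLS)` over each session's events ordered by `event_timestamp`, so a key event is credited to the most recent non-null `page_title` on any event up to and including the key event itself. The following Python sketch shows the same idea under assumptions: events are plain dicts for a single date, and `count_key_events_by_page_title` is a hypothetical name. The key event list is copied from the query.

```python
from collections import defaultdict

# Key event names as listed in query.sql.
KEY_EVENTS = {
    "click", "cta_click", "download_click", "newsletter_subscribe",
    "purchase", "scroll", "social_share",
}


def count_key_events_by_page_title(events):
    """Count distinct key events per last-seen page title within each visit."""
    by_visit = defaultdict(list)
    for event in events:
        by_visit[(event["ga_session_id"], event["user_pseudo_id"])].append(event)

    distinct_key_events = set()
    for visit, visit_events in by_visit.items():
        last_title = None
        for event in sorted(visit_events, key=lambda e: e["event_timestamp"]):
            # The window includes the current row, so update before attributing.
            if event.get("page_title") is not None:
                last_title = event["page_title"]
            if event["event_name"] in KEY_EVENTS:
                # Mirrors COUNT(DISTINCT visit || event_name || timestamp).
                distinct_key_events.add(
                    (last_title, visit, event["event_name"], event["event_timestamp"])
                )

    counts = defaultdict(int)
    for title, *_ in distinct_key_events:
        counts[title] += 1
    return dict(counts)


# Made-up sample events for illustration only.
events = [
    {"ga_session_id": "s1", "user_pseudo_id": "u1", "event_timestamp": 1,
     "event_name": "page_view", "page_title": "Post A"},
    {"ga_session_id": "s1", "user_pseudo_id": "u1", "event_timestamp": 2,
     "event_name": "scroll", "page_title": None},
    {"ga_session_id": "s1", "user_pseudo_id": "u1", "event_timestamp": 3,
     "event_name": "page_view", "page_title": "Post B"},
    {"ga_session_id": "s1", "user_pseudo_id": "u1", "event_timestamp": 4,
     "event_name": "cta_click", "page_title": None},
    {"ga_session_id": "s1", "user_pseudo_id": "u1", "event_timestamp": 4,
     "event_name": "cta_click", "page_title": None},
    {"ga_session_id": "s2", "user_pseudo_id": "u2", "event_timestamp": 5,
     "event_name": "scroll", "page_title": "Post A"},
]

key_event_counts = count_key_events_by_page_title(events)
print(key_event_counts)
```

The sketch stops at the per-title counts. In the query, these counts are then left-joined onto page titles that had blog page views, so titles with no key events end up with a NULL `nbr_key_events`.
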

3 participants