feat(optimizer): Support predicate stitching in MaterializedViewRewrite #26728
base: master
Sorry @tdcmeehan, your pull request is larger than the review limit of 150000 diff characters
steveburnett left a comment
Nice work on the documentation! A few nits and suggestions, nothing major.
In this example:

* ``orders.order_date`` and ``customers.reg_date`` are equivalent due to the equality join condition
* Even though ``reg_date`` is not in the MV's SELECT list, staleness can be tracked through the equivalence to ``order_date``
Suggested change:
- * Even though ``reg_date`` is not in the MV's SELECT list, staleness can be tracked through the equivalence to ``order_date``
+ * Even though ``reg_date`` is not in the SELECT list, staleness can be tracked through the equivalence to ``order_date``
The all-caps "MV" was jarring. I suggest removing it, since the context already supports the meaning, or spelling it out as "the materialized view's SELECT list".
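The passthrough idea in the quoted lines (staleness on ``reg_date`` tracked through its equivalence to ``order_date``) can be illustrated with a small Java sketch. The class and method names (PassthroughMapping, mapStalePredicate) are hypothetical and not Presto's API; the sketch assumes equivalence classes were already extracted from INNER JOIN equalities and that partition values are plain strings.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch, not Presto code: rewrite a stale-partition predicate on a
// base-table column onto an equivalent column that the materialized view outputs.
final class PassthroughMapping {
    private PassthroughMapping() {}

    static Optional<String> mapStalePredicate(
            List<Set<String>> equivalenceClasses,
            Set<String> outputColumns,
            String staleColumn,
            List<String> staleValues) {
        Optional<String> target = outputColumns.contains(staleColumn)
                ? Optional.of(staleColumn)
                : equivalenceClasses.stream()
                        .filter(cls -> cls.contains(staleColumn))
                        .findFirst()
                        // Any output column in the same class holds identical values after an INNER equi-join.
                        .flatMap(cls -> cls.stream().filter(outputColumns::contains).sorted().findFirst());
        // Naive quoting: assumes partition values are plain strings without quotes.
        String inList = staleValues.stream().map(v -> "'" + v + "'").collect(Collectors.joining(", "));
        return target.map(column -> column + " IN (" + inList + ")");
    }

    public static void main(String[] args) {
        List<Set<String>> classes = List.of(Set.of("orders.order_date", "customers.reg_date"));
        Set<String> outputs = Set.of("orders.order_date", "orders.total");
        // Prints Optional[orders.order_date IN ('2024-01-02')]
        System.out.println(mapStalePredicate(classes, outputs, "customers.reg_date", List.of("2024-01-02")));
    }
}
```

If no member of the class is exposed by the view, the result is empty, which mirrors the "at least one column in the equivalence class must be in the output" rule quoted further down.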
**How Passthrough Mapping Works**

1. **Equivalence Extraction**: During MV creation, Presto analyzes JOIN conditions to identify
Same comment as above about MV. Suggest deleting, or spelling it out.
* Join must be an INNER JOIN (not LEFT, RIGHT, or FULL OUTER)
* Equality must be direct (``col1 = col2``), not through expressions like ``col1 = col2 + 1``
* Both columns must be partition columns in their respective tables
* At least one column in the equivalence class must be in the MV's output
Same comment as above about MV. Suggest deleting, or spelling it out.
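The eligibility rules in the quoted list can be sketched as equivalence-class extraction with a union-find over qualifying join equalities. This is a hypothetical illustration, not the code in this PR: EquivalenceExtraction, EqualityCondition, and the `direct` flag are invented for the example, and a real implementation would inspect join nodes and row expressions rather than pre-classified records.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch, not Presto code: build column equivalence classes from join
// equalities that satisfy the passthrough rules, using a small union-find.
final class EquivalenceExtraction {
    private EquivalenceExtraction() {}

    enum JoinType { INNER, LEFT, RIGHT, FULL }

    // `direct` is false when either side is an expression, e.g. col1 = col2 + 1.
    record EqualityCondition(JoinType joinType, String left, String right, boolean direct) {}

    static List<Set<String>> extract(
            List<EqualityCondition> conditions, Set<String> partitionColumns, Set<String> outputColumns) {
        Map<String, String> parent = new HashMap<>();
        for (EqualityCondition c : conditions) {
            boolean eligible = c.joinType() == JoinType.INNER   // no outer joins
                    && c.direct()                                // plain col1 = col2
                    && partitionColumns.contains(c.left())      // both sides are partition columns
                    && partitionColumns.contains(c.right());
            if (eligible) {
                union(parent, c.left(), c.right());
            }
        }
        // Group columns by union-find root; TreeMap/TreeSet give a deterministic order.
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String column : new ArrayList<>(parent.keySet())) {
            groups.computeIfAbsent(find(parent, column), root -> new TreeSet<>()).add(column);
        }
        List<Set<String>> result = new ArrayList<>();
        for (Set<String> group : groups.values()) {
            // A class is only useful if at least one member is exposed by the view.
            if (group.stream().anyMatch(outputColumns::contains)) {
                result.add(group);
            }
        }
        return result;
    }

    private static String find(Map<String, String> parent, String column) {
        parent.putIfAbsent(column, column);
        String root = column;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        // Path compression: point every node on the path directly at the root.
        while (!column.equals(root)) {
            String next = parent.get(column);
            parent.put(column, root);
            column = next;
        }
        return root;
    }

    private static void union(Map<String, String> parent, String a, String b) {
        String rootA = find(parent, a);
        String rootB = find(parent, b);
        if (!rootA.equals(rootB)) {
            parent.put(rootA, rootB);
        }
    }

    public static void main(String[] args) {
        List<EqualityCondition> conditions = List.of(
                new EqualityCondition(JoinType.INNER, "orders.order_date", "customers.reg_date", true),
                new EqualityCondition(JoinType.LEFT, "orders.ds", "returns.ds", true));
        Set<String> partitions = Set.of("orders.order_date", "customers.reg_date", "orders.ds", "returns.ds");
        // Prints [[customers.reg_date, orders.order_date]]; the LEFT JOIN equality is ignored.
        System.out.println(extract(conditions, partitions, Set.of("orders.order_date")));
    }
}
```

Union-find makes the equivalence transitive for free: ``a = b`` and ``b = c`` land in one class even when ``a`` and ``c`` never appear in the same condition.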
- All refreshes recompute the entire result set
- REFRESH does not provide snapshot isolation across multiple base tables
- All refreshes recompute the entire result set (incremental refresh not yet supported)
Suggested change:
- - All refreshes recompute the entire result set (incremental refresh not yet supported)
+ - All refreshes recompute the entire result set (incremental refresh not supported)
"yet" is an implied promise that should be avoided in documentation.
I considered suggesting deleting the entire parenthetical "(incremental refresh not yet supported)", since it is arguably implied by "All refreshes recompute the entire result set". But I also see value in explicitly stating "incremental refresh not supported", so I'm fine either way, including leaving it in.
3. Partition constraints are built that identify exactly which data is stale

See the connector-specific documentation for details on how staleness is tracked.
For Iceberg tables, see :doc:`/connector/iceberg` (Materialized Views section).
Suggested change:
- For Iceberg tables, see :doc:`/connector/iceberg` (Materialized Views section).
+ For Iceberg tables, see :ref:`connector/iceberg:materialized views`.
Tested improved link in local doc build.
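Step 3 in the quoted docs (building partition constraints that identify exactly which data is stale) might look roughly like the following. This assumes the connector can report a per-partition version, such as a snapshot ID, both as recorded at the last refresh and as of now; that version model and the StalePartitions class are illustrative assumptions, and the Iceberg connector's actual tracking may differ.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical sketch: assumes a per-partition version (for example a snapshot ID)
// recorded at the last refresh can be compared with the current one.
final class StalePartitions {
    private StalePartitions() {}

    static SortedSet<String> stale(Map<String, Long> versionsAtRefresh, Map<String, Long> currentVersions) {
        SortedSet<String> result = new TreeSet<>();
        for (Map.Entry<String, Long> entry : currentVersions.entrySet()) {
            // New partitions (no recorded version) and modified partitions are stale.
            if (!entry.getValue().equals(versionsAtRefresh.get(entry.getKey()))) {
                result.add(entry.getKey());
            }
        }
        for (String partition : versionsAtRefresh.keySet()) {
            // Partitions dropped since the refresh are stale too: their stored rows must not be served.
            if (!currentVersions.containsKey(partition)) {
                result.add(partition);
            }
        }
        return result;
    }

    static String constraint(String partitionColumn, SortedSet<String> stalePartitions) {
        if (stalePartitions.isEmpty()) {
            return "FALSE"; // nothing is stale
        }
        String inList = stalePartitions.stream().map(p -> "'" + p + "'").collect(Collectors.joining(", "));
        return partitionColumn + " IN (" + inList + ")";
    }

    public static void main(String[] args) {
        Map<String, Long> atRefresh = Map.of("2024-01-01", 10L, "2024-01-02", 11L);
        Map<String, Long> now = Map.of("2024-01-01", 10L, "2024-01-02", 15L, "2024-01-03", 16L);
        // Prints ds IN ('2024-01-02', '2024-01-03')
        System.out.println(constraint("ds", stale(atRefresh, now)));
    }
}
```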
fae20f5 to 188b139
188b139 to dc5813d
dc5813d to b8e71ad
Note to reviewers:
MaterializedViewRewrite, PlanClonerWithVariableMapping and MaterializedViewStitchingUtils, which is more manageable (~1500 LOCs).

Description
Implement predicate stitching for materialized views in MaterializedViewRewrite. When a materialized view is partially stale, the optimizer can now generate a UNION query that reads fresh data from storage and recomputes only the stale portions from base tables.
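To make the shape of a stitched plan concrete, here is a hedged sketch that renders it as SQL text. The actual rewrite in MaterializedViewRewrite operates on plan nodes; stitch(), the storage table name, and the single string-valued partition column (assumed to be in the view's output) are assumptions for illustration.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.stream.Collectors;

// Hypothetical sketch: renders the shape of a stitched plan as SQL text.
// The real rewrite builds plan nodes; all names here are illustrative only.
final class StitchingSketch {
    private StitchingSketch() {}

    static String stitch(String storageTable, String viewQuery, String partitionColumn, SortedSet<String> stalePartitions) {
        if (stalePartitions.isEmpty()) {
            return "SELECT * FROM " + storageTable; // fully fresh: read storage only
        }
        String inList = stalePartitions.stream().map(p -> "'" + p + "'").collect(Collectors.joining(", "));
        String staleFilter = partitionColumn + " IN (" + inList + ")";
        // Fresh rows come from storage; stale partitions are recomputed from the base tables.
        return "SELECT * FROM " + storageTable + " WHERE NOT (" + staleFilter + ")\n"
                + "UNION ALL\n"
                + "SELECT * FROM (" + viewQuery + ") recomputed WHERE " + staleFilter;
    }

    public static void main(String[] args) {
        SortedSet<String> stale = new TreeSet<>(List.of("2024-01-02"));
        System.out.println(stitch("mv_storage", "SELECT ds, sum(total) AS total FROM orders GROUP BY ds", "ds", stale));
    }
}
```

The sketch uses UNION ALL because the two branches are disjoint by construction, so the deduplication a plain UNION performs would only add cost. Rows whose partition value is NULL match neither filter here, so a real implementation would need to handle them explicitly.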
Notable changes:
- USE_STITCHING mode to MaterializedViewStaleReadBehavior
- MaterializedViewRewrite to generate UNION plans combining storage reads with recompute branches
- PlanClonerWithVariableMapping utility for cloning plan trees with variable remapping
- MaterializedViewStitchingUtils for predicate propagation through join equivalences
- materialized_view_staleness_window, materialized_view_force_stale (hidden, for testing)
- iceberg.materialized-view-max-changed-partitions

Depends on: #26764
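The role of a plan cloner with variable remapping can be illustrated with a toy plan tree. PlanNode and cloneWithFreshVariables are hypothetical and far simpler than Presto's plan nodes and PlanClonerWithVariableMapping; the sketch only shows why a consistent old-to-new variable mapping is useful when the same subplan must appear twice, for example once more as the recompute branch of a UNION.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, far simpler than Presto's plan nodes and PlanClonerWithVariableMapping.
final class PlanCloneSketch {
    private PlanCloneSketch() {}

    // Toy plan node: an operator label, the variables it references, and its input nodes.
    record PlanNode(String operator, List<String> variables, List<PlanNode> sources) {}

    // Deep-copies the tree, renaming every variable consistently; `mapping` records old -> new names.
    static PlanNode cloneWithFreshVariables(PlanNode node, String suffix, Map<String, String> mapping) {
        List<PlanNode> newSources = new ArrayList<>();
        for (PlanNode source : node.sources()) {
            newSources.add(cloneWithFreshVariables(source, suffix, mapping)); // clone inputs first
        }
        List<String> newVariables = new ArrayList<>();
        for (String variable : node.variables()) {
            // Reuse the mapping so a variable referenced by several nodes gets the same new name.
            newVariables.add(mapping.computeIfAbsent(variable, original -> original + suffix));
        }
        return new PlanNode(node.operator(), List.copyOf(newVariables), List.copyOf(newSources));
    }

    public static void main(String[] args) {
        PlanNode scan = new PlanNode("TableScan orders", List.of("order_date", "total"), List.of());
        PlanNode filter = new PlanNode("Filter", List.of("order_date"), List.of(scan));
        Map<String, String> mapping = new HashMap<>();
        PlanNode copy = cloneWithFreshVariables(filter, "_0", mapping);
        System.out.println(copy.sources().get(0).variables()); // [order_date_0, total_0]
        System.out.println(mapping.get("order_date"));         // order_date_0
    }
}
```

A suffix keeps the example short but can collide with names already in the plan; a real planner would allocate guaranteed-unique variables instead.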
Motivation and Context
Fixes #26756
Large materialized views are expensive to fully recompute. When only some base table partitions have changed since the last refresh, this change enables the optimizer to selectively recompute only the stale data rather than either serving stale results or reprocessing terabytes of unchanged data.
Impact
- USE_STITCHING value for materialized_view_stale_read_behavior session property (for default behavior when no table property value is present)
- materialized_view_staleness_window, materialized_view_force_stale
- iceberg.materialized-view-max-changed-partitions (default: 100)

Test Plan
- TestMaterializedViewStitchingUtils for predicate propagation
- TestPlanClonerWithVariableMapping for plan cloning
- TestMaterializedViewRewrite with stitching scenarios
- TestIcebergMaterializedViews with end-to-end stitching tests
- TestIcebergMaterializedViewOptimizer with many partition tracking tests

Contributor checklist
Release Notes
Please follow release notes guidelines and fill in the release notes below.