@tdcmeehan tdcmeehan commented Dec 2, 2025

Note to reviewers:

Description

Implement predicate stitching for materialized views in MaterializedViewRewrite. When a materialized view is partially stale, the optimizer can now generate a UNION query that reads fresh data from storage and recomputes only the stale portions from base tables.
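
To make the shape of a stitched query concrete, here is a minimal Python sketch that assembles the SQL by hand. It is not the optimizer code in this PR. The `StitchPlan` type, the `stitch` helper, and the table and column names are hypothetical, and it assumes staleness is expressed as a list of changed values for a single partition column. It uses `UNION ALL` on the assumption that the two branches are disjoint, and it ignores NULL partition values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StitchPlan:
    """Hypothetical inputs for one stitched read (not a Presto type)."""
    storage_table: str      # table holding the materialized results
    view_query: str         # the view's defining SELECT over base tables
    partition_column: str   # column used to tell fresh rows from stale rows
    stale_values: tuple     # partition values changed since the last refresh


def sql_literal(value):
    """Render a value as a SQL string literal, doubling embedded quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def stitch(plan):
    """Return SQL that reads fresh rows from storage and recomputes stale ones."""
    if not plan.stale_values:
        # Nothing changed, so the stored results are fully fresh.
        return f"SELECT * FROM {plan.storage_table}"
    in_list = ", ".join(sql_literal(v) for v in plan.stale_values)
    stale = f"{plan.partition_column} IN ({in_list})"
    # Branch 1: rows in storage that are NOT covered by the stale predicate.
    fresh_branch = f"SELECT * FROM {plan.storage_table} WHERE NOT ({stale})"
    # Branch 2: the view definition, restricted to the stale predicate.
    recompute_branch = (
        f"SELECT * FROM ({plan.view_query}) AS recomputed WHERE {stale}"
    )
    return f"{fresh_branch}\nUNION ALL\n{recompute_branch}"


if __name__ == "__main__":
    plan = StitchPlan(
        storage_table="mv_storage",
        view_query="SELECT ds, count(*) AS c FROM orders GROUP BY ds",
        partition_column="ds",
        stale_values=("2025-12-01",),
    )
    print(stitch(plan))
```

The same stale predicate appears negated in the storage branch and as-is in the recompute branch, which is what keeps the two branches from overlapping.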

Notable changes:

  • Add USE_STITCHING mode to MaterializedViewStaleReadBehavior
  • Extend MaterializedViewRewrite to generate UNION plans combining storage reads with recompute branches
  • Add PlanClonerWithVariableMapping utility for cloning plan trees with variable remapping
  • Add MaterializedViewStitchingUtils for predicate propagation through join equivalences
  • Add Iceberg support for tracking changed partitions and generating stale predicates
  • Add session properties: materialized_view_staleness_window, materialized_view_force_stale (hidden, for testing)
  • Add Iceberg config: iceberg.materialized-view-max-changed-partitions

Depends on: #26764

Motivation and Context

Fixes #26756

Large materialized views are expensive to fully recompute. When only some base table partitions have changed since the last refresh, this change enables the optimizer to selectively recompute only the stale data rather than either serving stale results or reprocessing terabytes of unchanged data.

Impact

  • New USE_STITCHING value for the materialized_view_stale_read_behavior session property (the session property sets the default behavior when no table property value is present)
  • New session properties: materialized_view_staleness_window, materialized_view_force_stale
  • New Iceberg config: iceberg.materialized-view-max-changed-partitions (default: 100)
  • Iceberg connector now tracks changed partitions for staleness detection

Test Plan

  • Added unit tests in TestMaterializedViewStitchingUtils for predicate propagation
  • Added unit tests in TestPlanClonerWithVariableMapping for plan cloning
  • Extended TestMaterializedViewRewrite with stitching scenarios
  • Extended TestIcebergMaterializedViews with end-to-end stitching tests
  • Extended TestIcebergMaterializedViewOptimizer with many partition tracking tests

Contributor checklist

  • Please make sure your submission complies with our contributing guide, in particular code style and commit standards.
  • PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced.
  • Documented new properties (with their default values), SQL syntax, functions, or other functionality.
  • If release notes are required, they follow the release notes guidelines.
  • Adequate tests were added if applicable.
  • CI passed.
  • If adding new dependencies, verified they have an OpenSSF Scorecard score of 5.0 or higher (or obtained explicit TSC approval for lower scores).

Release Notes

Please follow release notes guidelines and fill in the release notes below.

== RELEASE NOTES ==

General Changes
* Add ``USE_STITCHING`` mode for ``materialized_view_stale_read_behavior`` session property to selectively recompute stale data instead of full recomputation.
* Add ``materialized_view_staleness_window`` session property to configure acceptable staleness duration.
* Add ``materialized_view_force_stale`` session property for testing stale read behavior.

Iceberg Connector Changes
* Add ``iceberg.materialized-view-max-changed-partitions`` config property (default: 100) to limit partition tracking for predicate stitching.
* Add support for tracking changed partitions in materialized views to enable predicate stitching optimization.

@prestodb-ci prestodb-ci added the from:IBM PR from IBM label Dec 2, 2025

@sourcery-ai sourcery-ai bot left a comment

Sorry @tdcmeehan, your pull request is larger than the review limit of 150000 diff characters

@steveburnett steveburnett left a comment

Nice work on the documentation! A few nits and suggestions, nothing major.

In this example:

* ``orders.order_date`` and ``customers.reg_date`` are equivalent due to the equality join condition
* Even though ``reg_date`` is not in the MV's SELECT list, staleness can be tracked through the equivalence to ``order_date``
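
The equivalence described in these bullets can be sketched with a small union-find over the equality join conditions. This is an illustrative Python sketch, not the logic in `MaterializedViewStitchingUtils`. The function names `equivalence_classes` and `trackable_columns` are hypothetical, and columns are represented as plain strings.

```python
def equivalence_classes(equalities):
    """Group columns linked by equality join conditions (simple union-find)."""
    parent = {}

    def find(col):
        parent.setdefault(col, col)
        while parent[col] != col:
            parent[col] = parent[parent[col]]  # path halving
            col = parent[col]
        return col

    for left, right in equalities:
        parent[find(left)] = find(right)

    classes = {}
    for col in list(parent):
        classes.setdefault(find(col), set()).add(col)
    return list(classes.values())


def trackable_columns(equalities, output_columns):
    """Map each column to an equivalent column that the view outputs."""
    outputs = set(output_columns)
    result = {}
    for cls in equivalence_classes(equalities):
        visible = sorted(cls & outputs)
        if visible:
            # Every member of the class can be tracked through the
            # first output column it is equivalent to.
            for col in cls:
                result[col] = visible[0]
    return result


if __name__ == "__main__":
    joins = [("orders.order_date", "customers.reg_date")]
    print(trackable_columns(joins, ["orders.order_date"]))
```

With the join from the example, `customers.reg_date` maps to `orders.order_date`, so a change to a `reg_date` partition can be expressed as a predicate on the output column.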

Suggested change
* Even though ``reg_date`` is not in the MV's SELECT list, staleness can be tracked through the equivalence to ``order_date``
* Even though ``reg_date`` is not in the SELECT list, staleness can be tracked through the equivalence to ``order_date``

The all-caps MV was jarring. Suggest removing it, since the context already makes the meaning clear, or maybe spelling it out as "the materialized view's SELECT list".


**How Passthrough Mapping Works**

1. **Equivalence Extraction**: During MV creation, Presto analyzes JOIN conditions to identify

Same comment as above about MV. Suggest deleting, or spelling it out.

* Join must be an INNER JOIN (not LEFT, RIGHT, or FULL OUTER)
* Equality must be direct (``col1 = col2``), not through expressions like ``col1 = col2 + 1``
* Both columns must be partition columns in their respective tables
* At least one column in the equivalence class must be in the MV's output
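
The four requirements above can be read as a checklist applied to each join equality. The Python sketch below is an assumption-laden illustration rather than the check implemented in this PR. `JoinEquality` and `passthrough_rejection` are hypothetical names, and the last rule is simplified to a single pair of columns instead of a full equivalence class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinEquality:
    """Hypothetical description of one equality in a join condition."""
    join_type: str          # e.g. "INNER", "LEFT", "RIGHT", "FULL"
    left: str               # left side of the equality, as written
    right: str              # right side of the equality, as written
    left_is_column: bool    # False if the side is an expression such as col2 + 1
    right_is_column: bool


def passthrough_rejection(eq, partition_columns, output_columns):
    """Return the first rule the equality violates, or None if it is eligible."""
    if eq.join_type != "INNER":
        return "join must be an INNER JOIN"
    if not (eq.left_is_column and eq.right_is_column):
        return "equality must be direct"
    if eq.left not in partition_columns or eq.right not in partition_columns:
        return "both columns must be partition columns"
    if eq.left not in output_columns and eq.right not in output_columns:
        return "at least one column must be in the output"
    return None


if __name__ == "__main__":
    eq = JoinEquality("INNER", "orders.order_date", "customers.reg_date", True, True)
    partitions = {"orders.order_date", "customers.reg_date"}
    print(passthrough_rejection(eq, partitions, {"orders.order_date"}))
```

Returning the violated rule rather than a bare boolean makes it easier to see why a given join does not qualify.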

Same comment as above about MV. Suggest deleting, or spelling it out.


- All refreshes recompute the entire result set
- REFRESH does not provide snapshot isolation across multiple base tables
- All refreshes recompute the entire result set (incremental refresh not yet supported)

Suggested change
- All refreshes recompute the entire result set (incremental refresh not yet supported)
- All refreshes recompute the entire result set (incremental refresh not supported)

"yet" is an implied promise that should be avoided in documentation.

I thought about suggesting deleting the entire parenthetical "(incremental refresh not yet supported)" as it's arguably implied in "All refreshes recompute the entire result set", but I also find value in the explicit declaration of "incremental refresh not supported" so I could go either way on it and be fine with it staying.

3. Partition constraints are built that identify exactly which data is stale

See the connector-specific documentation for details on how staleness is tracked.
For Iceberg tables, see :doc:`/connector/iceberg` (Materialized Views section).

Suggested change
For Iceberg tables, see :doc:`/connector/iceberg` (Materialized Views section).
For Iceberg tables, see :ref:`connector/iceberg:materialized views`.

Tested improved link in local doc build.

@tdcmeehan tdcmeehan force-pushed the mv-iceberg-disjuncts branch 9 times, most recently from fae20f5 to 188b139 Compare December 10, 2025 21:12
@tdcmeehan tdcmeehan force-pushed the mv-iceberg-disjuncts branch from 188b139 to dc5813d Compare December 11, 2025 18:54
@tdcmeehan tdcmeehan force-pushed the mv-iceberg-disjuncts branch from dc5813d to b8e71ad Compare December 12, 2025 18:19
@tdcmeehan tdcmeehan requested a review from aaneja December 12, 2025 18:28
@tdcmeehan tdcmeehan marked this pull request as ready for review December 12, 2025 18:28
@prestodb-ci prestodb-ci requested a review from a team December 12, 2025 18:28
@prestodb-ci prestodb-ci requested review from NivinCS and removed request for a team December 12, 2025 18:28

@sourcery-ai sourcery-ai bot left a comment

Sorry @tdcmeehan, your pull request is larger than the review limit of 150000 diff characters

@tdcmeehan tdcmeehan marked this pull request as draft December 13, 2025 04:01

Development

Successfully merging this pull request may close these issues.

Disjunctive Predicate Stitching for Materialized Views and Partition Stitching for Iceberg
