
persist/storage: communicate projection pushdown via shard_source #30764

Open · wants to merge 8 commits into main
Conversation

@ParkMyCar (Member) commented Dec 8, 2024

Continuation of #29539

Implements a limited form of projection pushdown by applying the demands of a MapFilterProject onto the RelationDesc of a collection and using the resulting RelationDesc in shard_source. All of the columns will still be fetched from S3, but when decoding structured data we'll drop the unneeded columns, and when decoding ProtoRow data we'll skip the unneeded Datums.

Motivation

Progress towards: https://github.com/MaterializeInc/database-issues/issues/8402

Fixes https://github.com/MaterializeInc/database-issues/issues/8825

Tips for reviewers

The implementation for the feature is in the first commit, and can be reviewed on its own. The later commits add testing and metrics and theoretically could be their own PRs, so it might be nice to review one commit at a time.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ParkMyCar ParkMyCar force-pushed the persist/projection-pushdown branch 2 times, most recently from eebb11a to 3182591 Compare December 9, 2024 14:51
@ParkMyCar ParkMyCar marked this pull request as ready for review December 9, 2024 14:53
@ParkMyCar ParkMyCar requested review from a team as code owners December 9, 2024 14:53
shepherdlybot (bot) commented Dec 9, 2024

Risk Score: 81/100 · Bug Hotspots: 2 · Resilience Coverage: 33%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test
  • (Required) Observability
  • (Required) QA Review 🔍 Detected
  • (Required) Run Nightly Tests
  • Unit Test 🔍 Detected
Risk Summary:

This pull request has a high risk score of 81, driven by predictors such as "Sum Bug Reports Of Files" and "Delta of Executable Lines." Historically, PRs with these predictors are 114% more likely to cause a bug than the repository baseline. Although the repository's observed and predicted bug trends are decreasing, the presence of 2 file hotspots adds to the concern.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:

File Percentile
../src/schema.rs 97
../sink/materialized_view_v2.rs 98

@ParkMyCar (Member Author):

Btw I ran the all_source_data_roundtrips proptest for ~48 hours after adding the new changes for projection pushdown and checking stats.

@ParkMyCar ParkMyCar force-pushed the persist/projection-pushdown branch from 3182591 to a883b7d Compare December 9, 2024 21:13
@ParkMyCar ParkMyCar requested a review from a team as a code owner December 9, 2024 21:13
@ParkMyCar ParkMyCar force-pushed the persist/projection-pushdown branch from a883b7d to b69b477 Compare December 9, 2024 21:34
decode::<V>("val", &*both.val, &structured.val),
),
PartMigration::SameSchema { both } => {
let key_size_before = structured.key.get_array_memory_size();
Contributor:

Personally I'd be tempted to use the goodbytes accounting of memory for this, since that's how we count memory elsewhere in the structured-data code. (Including on the encoder!)

You could use the ArrayOrd implementation here. I think we'd probably still need new code for the Decoder impl, though it could follow the encoder's impl pretty closely.

Member Author:

Good call! Refactored to use ArrayOrd and goodbytes everywhere

@@ -98,6 +98,8 @@ pub(crate) enum ArrayMigration {
NoOp,
Struct(Vec<StructArrayMigration>),
List(FieldRef, Box<ArrayMigration>),
/// Replace the array with a [`NullArray`].
ReplaceWithNull,
Contributor:

This seems a bit sketchy in general... when does it come up?

@bkirwi (Contributor) commented Dec 9, 2024:

Specifically since not all struct arrays are nullable, it seems like it could be a backwards-incompatible change unless we're protected from that in some other way...

Member Author:

Chatted offline but this came up when projecting away all of the fields in a StructArray. Removed this and added logic to use the StructArray::new_empty_fields constructor instead!

@@ -367,6 +367,21 @@ pub struct ColumnIndex(usize);

static_assertions::assert_not_impl_all!(ColumnIndex: Arbitrary);

impl ColumnIndex {
Contributor:

Semi-related, but I'd love if there was some documentation on the RelationDesc level that said what ColumnMetadata was about, and how it relates to the RelationType.

For example: RelationDesc::iter zips together all the names from the metadata with all the types from the typ, but my understanding is that they're not meant to be 1:1. Seems like it could be a latent bug? But in either case some docs would make me more confident...

Member Author:

Good callout, I added some docs that describe the relationship between RelationDesc, ColumnMetadata, and RelationType; let me know if it makes sense or needs more detail!

I'm pretty sure RelationDesc::iter is still correct since the order of columns in the RelationType and metadata will be the same, but that's just by chance. I updated the impl to read from metadata to determine the index we should read from in the RelationType.

Contributor:

Mmm... and we do check that the cardinality is identical already. Does look safer the way you have it now in any case!

.collect();
ops.permute(remap, new_arity);
read_schema = Some(new_desc);
}
Contributor:

FWIW, some things I think might make this a little cleaner:

  • The demand + permute could be a new helper on the MFP itself... pre_project or something? May help clarify intent.
  • This logic could live inside the persist_source itself... though I guess you'd need to move the MfpPlan::create_from in there too. My feeling is that (rollout aside) you'd never not want to apply this projection, so better to not make it the caller's responsibility.

Neither suggestion blocking!

Member Author:

I played around with adding a demand + permute helper on the MFP itself, but couldn't get something that I really liked. Mainly because applying the demands on a RelationDesc returns a new RelationDesc instead of modifying in-place, which I think requires the helper to be able to return the new RelationDesc. It could also be a case of morning brain, if you have something specific you're thinking of here let me know, happy to circle back on it!

@def- (Contributor) commented Dec 10, 2024

I haven't seen this nightly timeout before: https://buildkite.com/materialize/nightly/builds/10655#0193ad59-5750-4d7f-8b2b-028341c71a11
Got stuck for an hour in platform-checks, not sure if related to this PR or not.

There was also a timeout in product limits: https://buildkite.com/materialize/nightly/builds/10655#0193ad59-5701-4859-a8de-d7f68d4e4bd6

Could this maybe decrease performance slightly?

@ParkMyCar (Member Author):

@bkirwi I followed up on what appear to be the blocking issues, let's see what CI has to say! Planning to follow up on the remaining two comments but given my possibly limited time this week I wanted to get the blocking ones out of the way first

@ParkMyCar ParkMyCar force-pushed the persist/projection-pushdown branch 4 times, most recently from 163d2ac to c004d3d Compare December 18, 2024 16:51
* use existing goodbytes method instead of byte_size
* re-work backwards compatible migration to check if Field is nullable before returning a NullArray
* add tests
* small fix after rebase and an API change
* add more docs to RelationDesc describing role of ColumnMetadata
@ParkMyCar ParkMyCar force-pushed the persist/projection-pushdown branch from c004d3d to 16c0a63 Compare January 3, 2025 15:38
@bkirwi (Contributor) left a comment:

Broadly LGTM, but we should switch the goodbytes impl to line up with the existing impls. (Generally I think you can just copy what the encoder does!)

mz_ore::soft_assert_eq_or_log!(
col_idx,
Contributor:

A little more expensive in the new version, since it involves some decoding... unsure if it crosses the caring threshold.

(This strikes me as a less interesting assert post-refactoring... we're iterating over the desc instead of the input now, so it would be very hard to end up with a different number of cols.)

vals,
nulls,
} => {
dim_offsets.inner().inner().len()
Contributor:

We don't include the offsets size in goodbytes calculations... see ArrayOrd or DatumEncoder for examples.


decoders_size
+ self
.nullability
Contributor:

Similarly, the nullability array isn't counted in goodbytes.

}

// Read back all of our data a second time with a projection applied, and make sure the
// stats are valid.
Contributor:

Wise to test the stats!

@@ -94,6 +95,10 @@ impl ColumnDecoder<()> for UnitColumnar {
}
}

fn goodbytes(&self) -> usize {
std::mem::size_of::<UnitColumnar>()
Contributor:

This should be 0... see the ColumnEncoder impl.
