
persist/storage: communicate projection pushdown via shard_source #30764

Open · wants to merge 8 commits into main
Conversation

@ParkMyCar (Member) commented Dec 8, 2024

Continuation of #29539

Implements a limited form of projection pushdown by applying the demands of a MapFilterProject onto the RelationDesc of a collection and using the resulting RelationDesc in shard_source. All of the columns will still be fetched from S3, but when decoding structured data we'll drop the unneeded columns, and when decoding ProtoRow data we'll skip the unneeded Datums.

Motivation

Progress towards: https://github.com/MaterializeInc/database-issues/issues/8402

Fixes https://github.com/MaterializeInc/database-issues/issues/8825

Tips for reviewers

The implementation for the feature is in the first commit, and can be reviewed on its own. The later commits add testing and metrics and theoretically could be their own PRs, so it might be nice to review one commit at a time.

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ParkMyCar ParkMyCar force-pushed the persist/projection-pushdown branch 2 times, most recently from eebb11a to 3182591 Compare December 9, 2024 14:51
@ParkMyCar ParkMyCar marked this pull request as ready for review December 9, 2024 14:53
@ParkMyCar ParkMyCar requested review from a team as code owners December 9, 2024 14:53
shepherdlybot (bot) commented Dec 9, 2024

Risk Score: 81/100 · Bug Hotspots: 2 · Resilience Coverage: 33%

Mitigations

Completing required mitigations increases Resilience Coverage.

  • (Required) Code Review 🔍 Detected
  • (Required) Feature Flag
  • (Required) Integration Test
  • (Required) Observability
  • (Required) QA Review 🔍 Detected
  • (Required) Run Nightly Tests
  • Unit Test 🔍 Detected
Risk Summary:

This pull request has a high risk score of 81, driven by predictors such as "Sum Bug Reports Of Files" and "Delta of Executable Lines." Historically, PRs with these predictors are 114% more likely to cause a bug than the repository baseline. Although the repository's observed and predicted bug trends are decreasing, the presence of 2 file hotspots adds to the concern.

Note: The risk score is not based on semantic analysis but on historical predictors of bug occurrence in the repository. The attributes above were deemed the strongest predictors based on that history. Predictors and the score may change as the PR evolves in code, time, and review activity.

Bug Hotspots:

File Percentile
../src/schema.rs 97
../sink/materialized_view_v2.rs 98

@ParkMyCar (Member Author):

Btw I ran the all_source_data_roundtrips proptest for ~48 hours after adding the new changes for projection pushdown and checking stats.

@ParkMyCar ParkMyCar force-pushed the persist/projection-pushdown branch from 3182591 to a883b7d Compare December 9, 2024 21:13
@ParkMyCar ParkMyCar requested a review from a team as a code owner December 9, 2024 21:13
@ParkMyCar ParkMyCar force-pushed the persist/projection-pushdown branch from a883b7d to b69b477 Compare December 9, 2024 21:34
decode::<V>("val", &*both.val, &structured.val),
),
PartMigration::SameSchema { both } => {
let key_size_before = structured.key.get_array_memory_size();
Contributor:

Personally I'd be tempted to use the goodbytes accounting of memory for this, since that's how we count memory elsewhere in the structured-data code. (Including on the encoder!)

You could use the ArrayOrd implementation here. I think we'd probably still need new code for the Decoder impl, though it could follow the encoder's impl pretty closely.

Member Author:

Good call! Refactored to use ArrayOrd and goodbytes everywhere

@@ -98,6 +98,8 @@ pub(crate) enum ArrayMigration {
NoOp,
Struct(Vec<StructArrayMigration>),
List(FieldRef, Box<ArrayMigration>),
/// Replace the array with a [`NullArray`].
ReplaceWithNull,
Contributor:

This seems a bit sketchy in general... when does it come up?

@bkirwi (Contributor) commented Dec 9, 2024:

Specifically since not all struct arrays are nullable, it seems like it could be a backwards-incompatible change unless we're protected from that in some other way...

Member Author:

Chatted offline but this came up when projecting away all of the fields in a StructArray. Removed this and added logic to use the StructArray::new_empty_fields constructor instead!

@@ -367,6 +367,21 @@ pub struct ColumnIndex(usize);

static_assertions::assert_not_impl_all!(ColumnIndex: Arbitrary);

impl ColumnIndex {
Contributor:

Semi-related, but I'd love if there was some documentation on the RelationDesc level that said what ColumnMetadata was about, and how it relates to the RelationType.

For example: RelationDesc::iter zips together all the names from the metadata with all the types from the typ, but my understanding is that they're not meant to be 1:1. Seems like it could be a latent bug? But in either case some docs would make me more confident...

Member Author:

Good callout, I added some docs that describe the relationship between RelationDesc, ColumnMetadata, and RelationType; let me know if it makes sense or needs more detail!

I'm pretty sure RelationDesc::iter is still correct since the order of columns in the RelationType and metadata will be the same, but that's just by chance. I updated the impl to read from metadata to determine the index we should read from in the RelationType.

Contributor:

Mmm... and we do check that the cardinality is identical already. Does look safer the way you have it now in any case!

.collect();
ops.permute(remap, new_arity);
read_schema = Some(new_desc);
}
Contributor:

FWIW, some things I think might make this a little cleaner:

  • The demand + permute could be a new helper on the MFP itself... pre_project or something? May help clarify intent.
  • This logic could live inside the persist_source itself... though I guess you'd need to move the MfpPlan::create_from in there too. My feeling is that (rollout aside) you'd never not want to apply this projection, so better to not make it the caller's responsibility.

Neither suggestion blocking!

Member Author:

I played around with adding a demand + permute helper on the MFP itself, but couldn't get something that I really liked. Mainly because applying the demands on a RelationDesc returns a new RelationDesc instead of modifying in-place, which I think requires the helper to be able to return the new RelationDesc. It could also be a case of morning brain, if you have something specific you're thinking of here let me know, happy to circle back on it!

@def- (Contributor) commented Dec 10, 2024

I haven't seen this nightly timeout before: https://buildkite.com/materialize/nightly/builds/10655#0193ad59-5750-4d7f-8b2b-028341c71a11
Got stuck for an hour in platform-checks, not sure if related to this PR or not.

There was also a timeout in product limits: https://buildkite.com/materialize/nightly/builds/10655#0193ad59-5701-4859-a8de-d7f68d4e4bd6

Could this maybe decrease performance slightly?

@ParkMyCar (Member Author):

@bkirwi I followed up on what appear to be the blocking issues, let's see what CI has to say! Planning to follow up on the remaining two comments but given my possibly limited time this week I wanted to get the blocking ones out of the way first

@ParkMyCar ParkMyCar force-pushed the persist/projection-pushdown branch 4 times, most recently from 163d2ac to c004d3d Compare December 18, 2024 16:51
* use existing goodbytes method instead of byte_size
* re-work backwards compatible migration to check if Field is nullable before returning a NullArray
* add tests
* small fix after rebase and an API change
* add more docs to RelationDesc describing role of ColumnMetadata
@ParkMyCar ParkMyCar force-pushed the persist/projection-pushdown branch from c004d3d to 16c0a63 Compare January 3, 2025 15:38
@bkirwi (Contributor) left a comment:

Broadly LGTM, but we should switch the goodbytes impl to line up with the existing impls. (Generally I think you can just copy what the encoder does!)

mz_ore::soft_assert_eq_or_log!(
col_idx,
Contributor:

A little more expensive in the new version, since it involves some decoding... unsure if it crosses the caring threshold.

(This strikes me as a less interesting assert post-refactoring... we're iterating over the desc instead of the input now, so it would be very hard to end up with a different number of cols.)

vals,
nulls,
} => {
dim_offsets.inner().inner().len()
Contributor:

We don't include the offsets size in goodbytes calculations... see ArrayOrd or DatumEncoder for examples.


decoders_size
+ self
.nullability
Contributor:

Similarly, the nullability array isn't counted in goodbytes.

}

// Read back all of our data a second time with a projection applied, and make sure the
// stats are valid.
Contributor:

Wise to test the stats!

@@ -94,6 +95,10 @@ impl ColumnDecoder<()> for UnitColumnar {
}
}

fn goodbytes(&self) -> usize {
std::mem::size_of::<UnitColumnar>()
Contributor:

This should be 0... see the ColumnEncoder impl.
