persist/storage: communicate projection pushdown via shard_source
#30764
base: main
Conversation
(force-pushed from eebb11a to 3182591)
Mitigations: Completing required mitigations increases Resilience Coverage.
Risk Summary: This pull request has a high risk score of 81, driven by predictors such as "Sum Bug Reports Of Files" and "Delta of Executable Lines." Historically, PRs with these predictors are 114% more likely to cause a bug than the repository baseline. Although the repository's observed and predicted bug trends are decreasing, the presence of 2 file hotspots adds to the concern. Note: the risk score is not based on semantic analysis but on historical predictors of bug occurrence in this repository; predictors and the score may change as the PR evolves in code, time, and review activity.
Btw I ran the
(force-pushed from 3182591 to a883b7d)
(force-pushed from a883b7d to b69b477)
src/persist-client/src/fetch.rs (Outdated)

        decode::<V>("val", &*both.val, &structured.val),
    ),
    PartMigration::SameSchema { both } => {
        let key_size_before = structured.key.get_array_memory_size();
Personally I'd be tempted to use the goodbytes accounting of memory for this, since that's how we count memory elsewhere in the structured-data code. (Including on the encoder!) You could use the ArrayOrd implementation here. I think we'd probably still need new code for the Decoder impl, though it could follow the encoder's impl pretty closely.
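(To make the distinction concrete: goodbytes-style accounting counts only an array's logical payload bytes, while arrow's get_array_memory_size also includes offsets, validity bitmaps, and allocation overhead. A minimal arrow-rs illustration of the convention, not the actual mz_persist_types code:)

use arrow::array::{Array, StringArray};

fn main() {
    let array = StringArray::from(vec!["projection", "pushdown"]);

    // Everything allocated: payload + offsets + validity + struct overhead.
    let total = array.get_array_memory_size();

    // goodbytes-style: just the payload ("projectionpushdown" = 18 bytes),
    // ignoring the offsets buffer and null bitmap.
    let goodbytes = array.values().len();

    assert_eq!(goodbytes, 18);
    assert!(goodbytes <= total);
}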
Good call! Refactored to use ArrayOrd and goodbytes everywhere.
src/persist-types/src/schema.rs (Outdated)
@@ -98,6 +98,8 @@ pub(crate) enum ArrayMigration {
    NoOp,
    Struct(Vec<StructArrayMigration>),
    List(FieldRef, Box<ArrayMigration>),
    /// Replace the array with a [`NullArray`].
    ReplaceWithNull,
This seems a bit sketchy in general... when does it come up?
Specifically since not all struct arrays are nullable, it seems like it could be a backwards-incompatible change unless we're protected from that in some other way...
Chatted offline, but this came up when projecting away all of the fields in a StructArray. Removed this and added logic to use the StructArray::new_empty_fields constructor instead!
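(For reference, arrow-rs has a constructor for exactly this zero-field case: a struct array with no columns but a concrete row count. A minimal sketch:)

use arrow::array::{Array, StructArray};

fn main() {
    // What remains after projecting away every field: zero columns,
    // but still four rows.
    let empty = StructArray::new_empty_fields(4, None);
    assert_eq!(empty.len(), 4);
    assert_eq!(empty.num_columns(), 0);
}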
@@ -367,6 +367,21 @@ pub struct ColumnIndex(usize);

static_assertions::assert_not_impl_all!(ColumnIndex: Arbitrary);

impl ColumnIndex {
Semi-related, but I'd love if there was some documentation on the RelationDesc level that said what ColumnMetadata was about, and how it relates to the RelationType.
For example: RelationDesc::iter zips together all the names from the metadata with all the types from the typ, but my understanding is that they're not meant to be 1:1. Seems like it could be a latent bug? But in either case some docs would make me more confident...
Good callout! I added some docs that describe the relationship between RelationDesc, ColumnMetadata, and RelationType; let me know if it makes sense or requires more detail!
I'm pretty sure RelationDesc::iter is still correct, since the order of columns in the RelationType and metadata will be the same, but that's just by chance. I updated the impl to read from metadata to determine the index we should read from in the RelationType.
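(A sketch of the shape being described, using simplified hypothetical stand-ins rather than the actual mz_repr definitions: each column's metadata records which index of the RelationType it corresponds to, so iteration no longer relies on positional zipping.)

// Hypothetical, simplified stand-ins; not the actual mz_repr types.
struct ColumnMetadata {
    name: String,
    typ_idx: usize, // index of this column's type in the RelationType
}

struct RelationType {
    column_types: Vec<String>, // stand-in for the real ColumnType
}

struct RelationDesc {
    metadata: Vec<ColumnMetadata>,
    typ: RelationType,
}

impl RelationDesc {
    // Look up each column's type via its recorded index instead of
    // assuming metadata and types line up positionally.
    fn iter(&self) -> impl Iterator<Item = (&str, &String)> + '_ {
        self.metadata
            .iter()
            .map(|m| (m.name.as_str(), &self.typ.column_types[m.typ_idx]))
    }
}

fn main() {
    let desc = RelationDesc {
        metadata: vec![
            ColumnMetadata { name: "a".into(), typ_idx: 0 },
            ColumnMetadata { name: "b".into(), typ_idx: 1 },
        ],
        typ: RelationType {
            column_types: vec!["int8".into(), "text".into()],
        },
    };
    for (name, typ) in desc.iter() {
        println!("{name}: {typ}");
    }
}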
Mmm... and we do check that the cardinality is identical already. Does look safer the way you have it now in any case!
    .collect();
ops.permute(remap, new_arity);
read_schema = Some(new_desc);
}
FWIW, some things I think might make this a little cleaner:
- The demand + permute could be a new helper on the MFP itself... pre_project or something? May help clarify intent.
- This logic could live inside the persist_source itself... though I guess you'd need to move the MfpPlan::create_from in there too. My feeling is that (rollout aside) you'd never not want to apply this projection, so better to not make it the caller's responsibility.
Neither suggestion blocking!
I played around with adding a demand + permute helper on the MFP itself, but couldn't get something that I really liked, mainly because applying the demands on a RelationDesc returns a new RelationDesc instead of modifying in place, which I think requires the helper to be able to return the new RelationDesc. It could also be a case of morning brain; if you have something specific you're thinking of here let me know, happy to circle back on it!
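(For illustration, one possible shape for such a helper, inferred from the demand()/permute() calls visible in the diff; the types are simplified hypothetical stand-ins, not the actual mz_expr API. Since projecting a RelationDesc builds a new value, the helper returns the projected desc:)

use std::collections::{BTreeMap, BTreeSet};

// Hypothetical stand-ins for MapFilterProject / RelationDesc.
struct RelationDesc {
    columns: Vec<String>,
}

struct MapFilterProject {
    demanded: BTreeSet<usize>,
}

impl MapFilterProject {
    /// Restrict the input to the demanded columns: rewrite this MFP's
    /// column references to the new dense indices and return the
    /// projected desc (projection can't mutate the desc in place).
    fn pre_project(&mut self, desc: &RelationDesc) -> RelationDesc {
        let demanded: Vec<usize> = self.demanded.iter().copied().collect();
        // Old column index -> new (dense) column index.
        let remap: BTreeMap<usize, usize> = demanded
            .iter()
            .enumerate()
            .map(|(new, &old)| (old, new))
            .collect();
        self.permute(&remap, demanded.len());
        RelationDesc {
            columns: demanded.iter().map(|&i| desc.columns[i].clone()).collect(),
        }
    }

    fn permute(&mut self, _remap: &BTreeMap<usize, usize>, _new_arity: usize) {
        // Stand-in for the real column-reference rewriting.
    }
}

fn main() {
    let desc = RelationDesc {
        columns: vec!["a".into(), "b".into(), "c".into()],
    };
    let mut mfp = MapFilterProject {
        demanded: BTreeSet::from([0, 2]),
    };
    let projected = mfp.pre_project(&desc);
    assert_eq!(projected.columns, ["a", "c"]);
}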
I haven't seen this nightly timeout before: https://buildkite.com/materialize/nightly/builds/10655#0193ad59-5750-4d7f-8b2b-028341c71a11
There was also a timeout in product limits: https://buildkite.com/materialize/nightly/builds/10655#0193ad59-5701-4859-a8de-d7f68d4e4bd6
Could this maybe decrease performance slightly?
@bkirwi I followed up on what appear to be the blocking issues, let's see what CI has to say! Planning to follow up on the remaining two comments, but given my possibly limited time this week I wanted to get the blocking ones out of the way first.
(force-pushed from 163d2ac to c004d3d)
* use existing goodbytes method instead of byte_size
* re-work backwards-compatible migration to check if Field is nullable before returning a NullArray
* add tests

* small fix after rebase and an API change
* add more docs to RelationDesc describing the role of ColumnMetadata
(force-pushed from c004d3d to 16c0a63)
Broadly LGTM, but we should switch the goodbytes impl to line up with the existing impls. (Generally I think you can just copy what the encoder does!)
mz_ore::soft_assert_eq_or_log!(
    col_idx,
A little more expensive in the new version, since it involves some decoding... unsure if it crosses the caring threshold.
(This strikes me as a less interesting assert post-refactoring... we're iterating over the desc instead of the input now, so it would be very hard to end up with a different number of cols.)
vals,
nulls,
} => {
    dim_offsets.inner().inner().len()
We don't include the offsets size in goodbytes calculations... see ArrayOrd or DatumEncoder for examples.
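(To make that concrete, a small arrow-rs illustration of the same convention, assuming goodbytes counts only the child values' bytes:)

use arrow::array::{Array, ListArray};
use arrow::datatypes::Int64Type;

fn main() {
    let list = ListArray::from_iter_primitive::<Int64Type, _, _>(vec![
        Some(vec![Some(1), Some(2)]),
        Some(vec![Some(3)]),
    ]);

    // goodbytes for a list: child values only (3 x i64 = 24 bytes);
    // the i32 offsets buffer is deliberately excluded.
    let goodbytes = list.values().len() * std::mem::size_of::<i64>();
    assert_eq!(goodbytes, 24);
}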
decoders_size
    + self
        .nullability
Similarly, the nullability array isn't counted in goodbytes.
}

// Read back all of our data a second time with a projection applied, and make sure the
// stats are valid.
Wise to test the stats!
@@ -94,6 +95,10 @@ impl ColumnDecoder<()> for UnitColumnar {
    }
}

fn goodbytes(&self) -> usize {
    std::mem::size_of::<UnitColumnar>()
This should be 0... see the ColumnEncoder impl.
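(The suggested one-line fix, assuming the decoder should mirror the encoder:)

fn goodbytes(&self) -> usize {
    // UnitColumnar carries no payload bytes, so report zero,
    // matching the ColumnEncoder impl.
    0
}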
Continuation of #29539
Implements a limited form of projection pushdown by applying the demands of a MapFilterProject onto the RelationDesc of a collection, and using the resulting RelationDesc in shard_source. All of the columns will still be fetched from S3, but when decoding structured data we'll drop the unneeded columns, and when decoding ProtoRow data we'll skip the unneeded Datums.
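(Conceptually, the structured-data path amounts to a column projection on the decoded arrow data; a generic arrow-rs illustration, not the actual persist decode path:)

use std::sync::Arc;
use arrow::array::{ArrayRef, Int64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;

fn main() -> Result<(), ArrowError> {
    let schema = Arc::new(Schema::new(vec![
        Field::new("a", DataType::Int64, false),
        Field::new("b", DataType::Utf8, false),
        Field::new("c", DataType::Int64, false),
    ]));
    // Stand-in for a decoded part: three columns, of which the MFP
    // only demands the first and third.
    let columns: Vec<ArrayRef> = vec![
        Arc::new(Int64Array::from(vec![1, 2])),
        Arc::new(StringArray::from(vec!["x", "y"])),
        Arc::new(Int64Array::from(vec![10, 20])),
    ];
    let batch = RecordBatch::try_new(schema, columns)?;

    // Keep only the demanded columns; "b" is dropped after decode.
    let projected = batch.project(&[0, 2])?;
    assert_eq!(projected.num_columns(), 2);
    Ok(())
}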
Motivation
Progress towards: https://github.com/MaterializeInc/database-issues/issues/8402
Fixes https://github.com/MaterializeInc/database-issues/issues/8825
Tips for reviewers
The implementation for the feature is in the first commit, and can be reviewed on its own. The later commits add testing and metrics and theoretically could be their own PRs, so it might be nice to review one commit at a time.
Checklist
If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.