-
Notifications
You must be signed in to change notification settings - Fork 1.1k
[Variant] Support array shredding into List/LargeList/ListView/LargeListView
#8831
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Hi @scovich, you might be interested |
| enum ArrayVariantToArrowRowBuilder<'a> { | ||
| List(VariantToListArrowRowBuilder<'a, i32>), | ||
| LargeList(VariantToListArrowRowBuilder<'a, i64>), | ||
| ListView(VariantToListViewArrowRowBuilder<'a, i32>), | ||
| LargeListView(VariantToListViewArrowRowBuilder<'a, i64>), | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might be able to move this into variant_to_arrow, and variant_get could get the support out of the box. Given the size of this PR, I would do that as a followup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great!
Definitely interested, but a bit overbooked last week. Hoping I can get to it this week. |
# Conflicts: # parquet-variant-compute/src/shred_variant.rs
| match value { | ||
| Variant::List(list) => { | ||
| self.value_builder.append_null(); | ||
| self.typed_value_builder.append_value(list)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double checking -- if I try to shred as List<i32> and I encounter a variant array [..., "hi", ...], the bad entry will either become NULL or cause an error, depending on cast options?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the hi will be located in typed_value.value
| List(VariantToListArrowRowBuilder<'a, i32>), | ||
| LargeList(VariantToListArrowRowBuilder<'a, i64>), | ||
| ListView(VariantToListViewArrowRowBuilder<'a, i32>), | ||
| LargeListView(VariantToListViewArrowRowBuilder<'a, i64>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we introduce a ListLikeArrayBuilder trait (**) that encapsulates the (minimal) differences between these four types, so that ArrayVariantToArrowRowBuilder becomes a generic struct instead of an enum?
(**) c.f. StringLikeArrayBuilder that serves the same purpose for strings
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick analysis suggests the trait needs:
- An associated type:
type Offset: OffsetSizeTrait - A constructor:
fn try_new(...) -> Result<Self> - Helper functions to support
append_nullandappend_value(nulls, offsets, etc) - A finisher:
fn finish(self) -> Result<ArrayRef>
Two trait implementations (one for lists and one for list views), both generic over Offset
And from there, the outer builder should be able to implement its own logic just once instead of four times.
Double check tho -- the above is a very rough sketch. The goal is to minimize boilerplate and duplication, using a careful selection of trait methods that capture the essential differences between lists and list views.
| match value { | ||
| Variant::List(list) => { | ||
| self.value_builder.append_null(); | ||
| self.typed_value_builder.append_value(list)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems the hi will be located in typed_value.value
| Ok(()) | ||
| } | ||
|
|
||
| fn append_value(&mut self, value: Variant<'_, '_>) -> Result<bool> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to change the parameter value to something else, currently, there will be two value in line 286, and this may lead some confusion.
| } | ||
|
|
||
| #[test] | ||
| fn test_array_shredding_as_list() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice tests!
Not sure if we need to extract some common logic for these tests. They share some of the same logic.
Which issue does this PR close?
List/LargeList/ListView/LargeListView#8830.Rationale for this change
What changes are included in this PR?
Implement array shredding into
List/LargeList/ListView/LargeListViewto close the gaps inshred_variant. Part of the changes lay the groundwork for addingvariant_getsupport for list types in a follow-up.Are these changes tested?
Yes
Are there any user-facing changes?
New shredding types supported