
@liamzwbao
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Implement array shredding into List/LargeList/ListView/LargeListView to close the gaps in shred_variant. Part of the changes lay the groundwork for adding variant_get support for list types in a follow-up.

Are these changes tested?

Yes

Are there any user-facing changes?

New shredding types supported

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Nov 12, 2025
@liamzwbao liamzwbao marked this pull request as ready for review November 12, 2025 23:53
@liamzwbao
Contributor Author

Hi @scovich, you might be interested

Comment on lines +286 to +291
enum ArrayVariantToArrowRowBuilder<'a> {
    List(VariantToListArrowRowBuilder<'a, i32>),
    LargeList(VariantToListArrowRowBuilder<'a, i64>),
    ListView(VariantToListViewArrowRowBuilder<'a, i32>),
    LargeListView(VariantToListViewArrowRowBuilder<'a, i64>),
}
Contributor Author

We might be able to move this into variant_to_arrow, and variant_get could get the support out of the box. Given the size of this PR, I would do that as a followup.

Contributor

This is great!

@scovich
Contributor

scovich commented Nov 17, 2025

> Hi @scovich, you might be interested

Definitely interested, but a bit overbooked last week. Hoping I can get to it this week.

# Conflicts:
#	parquet-variant-compute/src/shred_variant.rs
match value {
    Variant::List(list) => {
        self.value_builder.append_null();
        self.typed_value_builder.append_value(list)?;
Contributor

Double checking -- if I try to shred as List<i32> and I encounter a variant array [..., "hi", ...], the bad entry will either become NULL or cause an error, depending on cast options?

Member

It seems the "hi" entry will end up in typed_value.value.
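
To make the exchange above concrete, here is a minimal sketch of per-element shredding, assuming the reply is right that a non-matching element lands in a fallback value slot next to the typed one. ToyVariant, ShreddedElement, and shred_as_i32_list are hypothetical names made up for illustration; they are not types or functions from the parquet-variant crates, and the sketch ignores cast options entirely.

```rust
/// Toy stand-in for a variant value (NOT the parquet-variant `Variant` type).
#[derive(Debug, Clone, PartialEq)]
enum ToyVariant {
    Int32(i32),
    Str(String),
}

/// One shredded list element. In this toy model exactly one field is populated.
#[derive(Debug, PartialEq)]
struct ShreddedElement {
    /// Fallback slot for elements that do not match the target type.
    value: Option<ToyVariant>,
    /// Strongly typed slot for elements that do match the target type.
    typed_value: Option<i32>,
}

/// Shred every element of a variant array against a target element type of i32.
/// Assumption: mismatches are preserved in `value` rather than nulled or rejected.
fn shred_as_i32_list(elements: &[ToyVariant]) -> Vec<ShreddedElement> {
    elements
        .iter()
        .map(|element| match element {
            ToyVariant::Int32(i) => ShreddedElement {
                value: None,
                typed_value: Some(*i),
            },
            other => ShreddedElement {
                value: Some(other.clone()),
                typed_value: None,
            },
        })
        .collect()
}
```

Under this model, shredding [1, "hi", 3] keeps all three elements: the integers go to typed_value and the string survives untouched in value.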

Comment on lines +313 to +316
List(VariantToListArrowRowBuilder<'a, i32>),
LargeList(VariantToListArrowRowBuilder<'a, i64>),
ListView(VariantToListViewArrowRowBuilder<'a, i32>),
LargeListView(VariantToListViewArrowRowBuilder<'a, i64>),
Contributor

Can we introduce a ListLikeArrayBuilder trait (**) that encapsulates the (minimal) differences between these four types, so that ArrayVariantToArrowRowBuilder becomes a generic struct instead of an enum?

(**) c.f. StringLikeArrayBuilder that serves the same purpose for strings

Contributor

A quick analysis suggests the trait needs:

  • An associated type: type Offset: OffsetSizeTrait
  • A constructor: fn try_new(...) -> Result<Self>
  • Helper functions to support append_null and append_value (nulls, offsets, etc)
  • A finisher: fn finish(self) -> Result<ArrayRef>

Two trait implementations (one for lists and one for list views), both generic over Offset

And from there, the outer builder should be able to implement its own logic just once instead of four times.

Double check, though: the above is a very rough sketch. The goal is to minimize boilerplate and duplication by carefully selecting trait methods that capture the essential differences between lists and list views.
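
One possible shape for the trait idea above, as a std-only toy model rather than the real arrow-rs API. Everything here is an assumption for illustration: ToyOffset stands in for OffsetSizeTrait, the builders track only offsets and sizes, the child values are plain i32, and the constructor and finisher from the list above are reduced to Default and direct field access.

```rust
/// Hypothetical stand-in for arrow's `OffsetSizeTrait` (i32 or i64 offsets).
trait ToyOffset: Copy + std::fmt::Debug + PartialEq {
    fn from_usize(n: usize) -> Self;
}
impl ToyOffset for i32 {
    fn from_usize(n: usize) -> Self { n as i32 }
}
impl ToyOffset for i64 {
    fn from_usize(n: usize) -> Self { n as i64 }
}

/// Hypothetical trait: the only thing that differs between layouts is how a row is recorded.
trait ListLikeArrayBuilder: Default {
    type Offset: ToyOffset;
    /// Record one row covering child elements `start..start + len`.
    fn push_row(&mut self, start: usize, len: usize);
}

/// List layout: n + 1 offsets, starting at 0; each row pushes its end offset.
struct ToyListBuilder<O: ToyOffset> { offsets: Vec<O> }
impl<O: ToyOffset> Default for ToyListBuilder<O> {
    fn default() -> Self { Self { offsets: vec![O::from_usize(0)] } }
}
impl<O: ToyOffset> ListLikeArrayBuilder for ToyListBuilder<O> {
    type Offset = O;
    fn push_row(&mut self, start: usize, len: usize) {
        self.offsets.push(O::from_usize(start + len));
    }
}

/// List-view layout: one (offset, size) pair per row.
struct ToyListViewBuilder<O: ToyOffset> { offsets: Vec<O>, sizes: Vec<O> }
impl<O: ToyOffset> Default for ToyListViewBuilder<O> {
    fn default() -> Self { Self { offsets: Vec::new(), sizes: Vec::new() } }
}
impl<O: ToyOffset> ListLikeArrayBuilder for ToyListViewBuilder<O> {
    type Offset = O;
    fn push_row(&mut self, start: usize, len: usize) {
        self.offsets.push(O::from_usize(start));
        self.sizes.push(O::from_usize(len));
    }
}

/// Outer builder written once, generic over the layout instead of a four-variant enum.
struct ToyOuterBuilder<B: ListLikeArrayBuilder> {
    layout: B,
    child: Vec<i32>,
    validity: Vec<bool>,
}
impl<B: ListLikeArrayBuilder> ToyOuterBuilder<B> {
    fn new() -> Self {
        Self { layout: B::default(), child: Vec::new(), validity: Vec::new() }
    }
    /// Append a non-null row holding `items`.
    fn append_value(&mut self, items: &[i32]) {
        let start = self.child.len();
        self.child.extend_from_slice(items);
        self.layout.push_row(start, items.len());
        self.validity.push(true);
    }
    /// Append a null row as an empty range plus a cleared validity bit.
    fn append_null(&mut self) {
        self.layout.push_row(self.child.len(), 0);
        self.validity.push(false);
    }
}
```

With this split, ToyOuterBuilder<ToyListBuilder<i32>>, ToyOuterBuilder<ToyListBuilder<i64>>, and the two list-view counterparts all share the same append_value and append_null logic. Whether the real builders divide this cleanly depends on details the sketch leaves out, such as the element builders and error handling.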

Ok(())
}

fn append_value(&mut self, value: Variant<'_, '_>) -> Result<bool> {
Member

Should we rename the parameter value to something else? Currently there are two uses of value on line 286, which may cause some confusion.

}

#[test]
fn test_array_shredding_as_list() {
Member

Nice tests!
Not sure whether we should extract a common helper for these tests, since they share some of the same logic.

Labels

parquet-variant parquet-variant* crates

Development

Successfully merging this pull request may close these issues.

[Variant] Support array shredding into List/LargeList/ListView/LargeListView

4 participants