
[Variant] API to construct Shredded Variant Arrays #7895

@alamb

Description


Is your feature request related to a problem or challenge? Please describe what you are trying to do.

As we begin to contemplate how to read and write shredded variants, we will need some way to construct Arrow arrays that contain shredded variants.

Physically, these will be Arrow StructArrays with two or three fields:

  • Non-shredded (2 fields): STRUCT { "metadata": Binary, "value": Binary }
  • Shredded (3 fields): STRUCT { "metadata": Binary, "value": Binary, "typed_value": STRUCT { ... } }
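
As a rough illustration of how the two layouts relate, here is a std-only Rust sketch that models an Arrow-like type description and builds either the 2-field or the 3-field STRUCT. The `DataType` enum and the `variant_struct_type` / `field_count` functions are hypothetical stand-ins for illustration, not the arrow-rs types; `metadata` and `value` use Binary as in the list above.

```rust
/// A tiny, hypothetical stand-in for Arrow's type system (NOT arrow-rs).
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Binary,
    Int32,
    Int64,
    Struct(Vec<(String, DataType)>),
}

/// Build the physical type of a Variant column.
///
/// `typed_value` is `None` for the non-shredded layout (2 fields) and
/// `Some(struct_type)` for the shredded layout (3 fields).
fn variant_struct_type(typed_value: Option<DataType>) -> DataType {
    let mut fields = vec![
        ("metadata".to_string(), DataType::Binary),
        ("value".to_string(), DataType::Binary),
    ];
    if let Some(typed) = typed_value {
        fields.push(("typed_value".to_string(), typed));
    }
    DataType::Struct(fields)
}

/// Number of top-level fields in a struct type (0 for non-struct types).
fn field_count(dt: &DataType) -> usize {
    match dt {
        DataType::Struct(fields) => fields.len(),
        _ => 0,
    }
}

fn main() {
    let unshredded = variant_struct_type(None);
    let shredded = variant_struct_type(Some(DataType::Struct(vec![
        ("foo".to_string(), DataType::Int64),
        ("bar".to_string(), DataType::Int32),
    ])));
    println!("unshredded fields: {}", field_count(&unshredded)); // 2
    println!("shredded fields: {}", field_count(&shredded)); // 3
}
```

The only difference between the two layouts is the optional third `typed_value` field; `metadata` and `value` are always present.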

More information on how to represent Variants as Arrow arrays can be found in the proposal:

Describe the solution you'd like

I would like some way to construct such shredded arrays easily and efficiently, in idiomatic Rust style.

Describe alternatives you've considered

One idea, from @zeroshade (thank you!), is to create a VariantArrayBuilder that is responsible for building the correct StructArrays from variants, including shredding out any columns. To create a shredded output, you would provide the shredded schema up front.

For example (based on the Go implementation and @scovich's comment here), to create a shredded Arrow array that shreds out columns "foo" and "bar" from any variant objects,

We would need this schema:

STRUCT {
  metadata: BinaryView,
  value: BinaryView,
  typed_value: STRUCT {
    foo: Int64,
    bar: Int32
  }
}
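
To make the schema concrete, here is a std-only sketch of how a single variant object might be split against it: a field named `foo` holding an Int64, or `bar` holding an Int32, lands in `typed_value`, and everything else stays in the residual `value`. The `Value` enum, `ShreddedRow` struct, and `shred_object` function are hypothetical simplifications that mirror the simplified schema above. In particular, the Parquet shredding spec nests a `value`/`typed_value` pair under each shredded field, so a real implementation would keep a type-mismatched `foo` under that field rather than in the top-level residual as this sketch does, and would encode the residual as Variant binary.

```rust
/// Hypothetical, simplified variant value (NOT the real Variant encoding).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int32(i32),
    Int64(i64),
    Str(String),
}

/// One row split against typed_value: STRUCT { foo: Int64, bar: Int32 }.
#[derive(Debug, Default, PartialEq)]
struct ShreddedRow {
    foo: Option<i64>,
    bar: Option<i32>,
    /// Fields that did not match the shredding schema (kept un-shredded).
    residual: Vec<(String, Value)>,
}

/// Split an object's fields into typed columns and a residual list.
fn shred_object(fields: Vec<(String, Value)>) -> ShreddedRow {
    let mut row = ShreddedRow::default();
    for (name, value) in fields {
        // Only exact name + type matches are shredded.
        let shredded = match (name.as_str(), &value) {
            ("foo", Value::Int64(v)) => {
                row.foo = Some(*v);
                true
            }
            ("bar", Value::Int32(v)) => {
                row.bar = Some(*v);
                true
            }
            _ => false,
        };
        if !shredded {
            // Unknown field, or a known field with the wrong type.
            row.residual.push((name, value));
        }
    }
    row
}

fn main() {
    let row = shred_object(vec![
        ("foo".to_string(), Value::Int64(1)),
        ("bar".to_string(), Value::Str("x".to_string())), // wrong type
        ("baz".to_string(), Value::Int32(7)),             // not in schema
    ]);
    // foo: Some(1), bar: None, residual: [("bar", Str("x")), ("baz", Int32(7))]
    println!("{:?}", row);
}
```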

The code would look like this:

use arrow_array::StructArray;
use arrow_schema::{DataType, Field};

// Create an Arrow Field that describes the desired shredded output schema
// (the outer field name "variant" is illustrative)
let shredded_schema = Field::new_struct(
    "variant",
    vec![
        Field::new("metadata", DataType::BinaryView, false),
        Field::new("value", DataType::BinaryView, true),
        Field::new_struct(
            "typed_value",
            vec![
                Field::new("foo", DataType::Int64, true),
                Field::new("bar", DataType::Int32, true),
            ],
            true,
        ),
    ],
    true,
);

// Create a builder for an array (batch) of Variant values
// (VariantArrayBuilder is the proposed API; it does not exist yet)
let mut array_builder = VariantArrayBuilder::new(shredded_schema);

// Append a row to the builder
let mut object = array_builder.new_object();
// ... add appropriate fields ...
// use like a normal ObjectBuilder(??)
object.finish();

// Append a second row (has no foo or bar fields)
array_builder.append_value(43);
// ...

// Finalize the builder
let variant_array: StructArray = array_builder.build()?;
// variant_array is a shredded variant
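
The builder idea can be sketched without Arrow as a columnar accumulator: the shredded fields are fixed at construction, each appended row pushes exactly one entry into every column, and a row that lacks a shredded field (such as the plain `43` above) gets a null (`None`) there. `ToyVariantArrayBuilder` and its `Value` type are hypothetical names for illustration only, and every shredded column is assumed to be Int64 to keep it short; the proposed builder would instead follow the Arrow schema, emit StructArrays, and encode the residual `value` as Variant binary.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified variant value.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Object(BTreeMap<String, i64>),
}

/// Hypothetical builder: shredded field names are fixed up front and each
/// becomes its own nullable Int64 column.
struct ToyVariantArrayBuilder {
    shredded: Vec<String>,
    typed: Vec<Vec<Option<i64>>>, // one column per shredded field
    value: Vec<Option<Value>>,    // residual (un-shredded) part of each row
}

impl ToyVariantArrayBuilder {
    fn new(shredded: &[&str]) -> Self {
        Self {
            shredded: shredded.iter().map(|s| s.to_string()).collect(),
            typed: vec![Vec::new(); shredded.len()],
            value: Vec::new(),
        }
    }

    /// Append one row; every column grows by exactly one entry.
    fn append_value(&mut self, v: Value) {
        match v {
            Value::Object(mut fields) => {
                for (i, name) in self.shredded.iter().enumerate() {
                    // Move a matching field into its typed column (or push None).
                    self.typed[i].push(fields.remove(name));
                }
                // Whatever was not shredded stays in the residual column.
                let residual = if fields.is_empty() { None } else { Some(Value::Object(fields)) };
                self.value.push(residual);
            }
            other => {
                // Non-objects have no shredded fields: typed columns get nulls.
                for col in &mut self.typed {
                    col.push(None);
                }
                self.value.push(Some(other));
            }
        }
    }

    /// Consume the builder, returning (typed columns, residual column).
    fn build(self) -> (Vec<Vec<Option<i64>>>, Vec<Option<Value>>) {
        (self.typed, self.value)
    }
}

fn main() {
    let mut builder = ToyVariantArrayBuilder::new(&["foo", "bar"]);
    // Row 1: an object with foo, bar, and an extra field baz
    builder.append_value(Value::Object(BTreeMap::from([
        ("foo".to_string(), 1),
        ("bar".to_string(), 2),
        ("baz".to_string(), 3),
    ])));
    // Row 2: a plain value with no foo or bar fields
    builder.append_value(Value::Int(43));
    let (typed, value) = builder.build();
    println!("foo = {:?}", typed[0]); // [Some(1), None]
    println!("bar = {:?}", typed[1]); // [Some(2), None]
    println!("value = {:?}", value);
}
```

Fixing the shredded fields up front is what lets the builder allocate one column per field and keep all columns the same length, which is the property a StructArray needs.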

I think a VariantArrayBuilder will be helpful for use cases other than Variant, and @harshmotw-db has created a version of one here:

Prior Art

Golang implementation:

Additional context

Labels: enhancement (any new improvement worthy of an entry in the changelog)
