diff --git a/docs/guides/rust-cookbook.md b/docs/guides/rust-cookbook.md new file mode 100644 index 00000000000..0c818debabf --- /dev/null +++ b/docs/guides/rust-cookbook.md @@ -0,0 +1,568 @@ +# Rust Cookbook + +This guide provides practical, copy-pasteable examples for common operations in Vortex. For more in-depth information, see the [Rust API documentation](https://docs.rs/vortex). + +## Topics Covered + +✓ **Creating arrays** - Different types including VarBinArray for strings ([Creating Arrays](#creating-arrays)) +✓ **Printing/debugging** - Display methods and inspection ([Inspecting and Debugging](#inspecting-and-debugging-arrays)) +✓ **Iterating** - Index-based and `with_iterator` patterns ([Iterating Over Arrays](#iterating-over-arrays)) +✓ **Accessing elements** - Getting individual values ([Accessing Elements](#accessing-elements)) +✓ **Modifying arrays** - Arrays are immutable, create new ones ([Array Immutability](#array-immutability)) +✓ **File I/O** - Reading and writing files ([File I/O](#file-io)) +✓ **VarBinArray vs VarBinViewArray** - Comparison and when to use each ([String Arrays](#string-arrays)) +✓ **Array trait vs concrete types** - Understanding the type system ([Core Concepts](#core-concepts)) + +## Quick Reference + +### Array Creation + +| Type | Code Example | +|------|--------------| +| Primitive integers | `buffer![1i32, 2, 3, 4, 5].into_array()` | +| Primitive floats | `buffer![1.0f64, 2.5, 3.14].into_array()` | +| Strings | `VarBinArray::from(vec!["hello", "world"]).into_array()` | +| Boolean | `BoolArray::from(vec![true, false, true]).into_array()` | +| Null array | `NullArray::new(5)` | +| Struct | `StructArray::from_fields(&[("name", array1), ("age", array2)])` | + +### Common Operations + +| Operation | Code | +|-----------|------| +| Get element | `array.scalar_at(index)` | +| Get length | `array.len()` | +| Check validity | `array.is_valid(index)` | +| Slice array | `array.slice(start..end)` | +| Print values | 
`array.display_values()` | +| Show structure | `array.display_tree()` | +| Get dtype | `array.dtype()` | +| Get encoding | `array.encoding().id()` | + +## Core Concepts + +### Array Trait vs Concrete Types + +Understanding the difference between the `Array` trait and concrete array types like `VarBinArray` is fundamental to using Vortex effectively. + +#### The Array Trait + +`Array` is the core **trait** (interface) that defines what all array types can do: + +```rust +pub trait Array: Send + Sync + Debug { + fn len(&self) -> usize; + fn dtype(&self) -> &DType; + fn scalar_at(&self, index: usize) -> VortexResult; + // ... many other methods +} +``` + +**Key points:** +- Defines the common interface for all array types +- Provides methods like `len()`, `dtype()`, `scalar_at()`, `slice()` +- Enables polymorphism - write functions that work with any array type +- Usually used as `ArrayRef = Arc` for type erasure + +#### Concrete Array Types + +Concrete types like `VarBinArray`, `PrimitiveArray`, `BoolArray` are specific implementations of the `Array` trait: + +```rust +// Type hierarchy +Array (trait) + ├── PrimitiveArray // for numbers: i32, f64, etc. + ├── BoolArray // for booleans + ├── VarBinArray // for variable-length strings/binary + ├── VarBinViewArray // alternative string encoding + ├── StructArray // for struct/record data + └── ... many more encodings +``` + +#### Practical Example + +.. literalinclude:: ../../vortex/examples/core_concepts.rs + :language: rust + :dedent: + :start-after: [array-trait-vs-concrete] + :end-before: [array-trait-vs-concrete] + +#### Why This Design? + +1. **Polymorphism**: Write generic functions + +.. literalinclude:: ../../vortex/examples/core_concepts.rs + :language: rust + :dedent: + :start-after: [polymorphism] + :end-before: [polymorphism] + +2. **Multiple encodings for same data**: + +.. 
literalinclude:: ../../vortex/examples/core_concepts.rs + :language: rust + :dedent: + :start-after: [multiple-encodings] + :end-before: [multiple-encodings] + +3. **Heterogeneous collections**: + +.. literalinclude:: ../../vortex/examples/core_concepts.rs + :language: rust + :dedent: + :start-after: [heterogeneous-collections] + :end-before: [heterogeneous-collections] + +**Full example:** [core_concepts.rs](../../vortex/examples/core_concepts.rs) + +## Creating Arrays + +### Primitive Arrays + +Create arrays of integers, floats, and other primitive types: + +.. literalinclude:: ../../vortex/examples/basic_array_creation.rs + :language: rust + :dedent: + :start-after: [primitive-int] + :end-before: [primitive-int] + +For arrays with null values: + +.. literalinclude:: ../../vortex/examples/basic_array_creation.rs + :language: rust + :dedent: + :start-after: [primitive-with-validity] + :end-before: [primitive-with-validity] + +**Full example:** [basic_array_creation.rs](../../vortex/examples/basic_array_creation.rs) + +### String Arrays + +Vortex has two encodings for variable-length strings: + +#### VarBinArray vs VarBinViewArray + +| Aspect | VarBinArray | VarBinViewArray | +|--------|-------------|-----------------| +| **Encoding** | Offset-based (like Arrow StringArray) | View-based (like Arrow StringViewArray) | +| **Structure** | Single data buffer + offsets | Multiple buffers + views | +| **Memory** | More compact for small strings | Better for mixed-size strings | +| **Operations** | Good for sequential access | Better for slicing, concatenation | +| **Canonical** | No | Yes (canonical for Utf8 dtype) | +| **When to use** | Input/output, small uniform strings | Processing, frequent slicing | + +.. literalinclude:: ../../vortex/examples/string_arrays.rs + :language: rust + :dedent: + :start-after: [array-vs-view] + :end-before: [array-vs-view] + +.. 
literalinclude:: ../../vortex/examples/string_arrays.rs + :language: rust + :dedent: + :start-after: [varbin-from-vec] + :end-before: [varbin-from-vec] + +With null values: + +.. literalinclude:: ../../vortex/examples/string_arrays.rs + :language: rust + :dedent: + :start-after: [varbin-from-iter] + :end-before: [varbin-from-iter] + +**Full example:** [string_arrays.rs](../../vortex/examples/string_arrays.rs) + +### Struct Arrays + +Structs group multiple fields together: + +.. literalinclude:: ../../vortex/examples/struct_arrays.rs + :language: rust + :dedent: + :start-after: [struct-from-fields] + :end-before: [struct-from-fields] + +**Full example:** [struct_arrays.rs](../../vortex/examples/struct_arrays.rs) + +### Advanced Array Types + +Vortex supports many specialized array types beyond the basics: + +#### Constant Arrays + +Efficiently represent arrays where all values are the same: + +.. literalinclude:: ../../vortex/examples/advanced_array_types.rs + :language: rust + :dedent: + :start-after: [constant-array] + :end-before: [constant-array] + +#### List Arrays + +Variable-length nested arrays (like `Vec>`): + +.. literalinclude:: ../../vortex/examples/advanced_array_types.rs + :language: rust + :dedent: + :start-after: [list-array] + :end-before: [list-array] + +#### Fixed-Size List Arrays + +Arrays where all lists have the same length: + +.. literalinclude:: ../../vortex/examples/advanced_array_types.rs + :language: rust + :dedent: + :start-after: [fixed-size-list] + :end-before: [fixed-size-list] + +#### DateTime Arrays + +Temporal data with timezone support: + +.. literalinclude:: ../../vortex/examples/advanced_array_types.rs + :language: rust + :dedent: + :start-after: [datetime-array] + :end-before: [datetime-array] + +#### Decimal Arrays + +Fixed-precision decimal numbers: + +.. 
literalinclude:: ../../vortex/examples/advanced_array_types.rs + :language: rust + :dedent: + :start-after: [decimal-array] + :end-before: [decimal-array] + +#### Extension Arrays + +User-defined custom types with metadata: + +.. literalinclude:: ../../vortex/examples/advanced_array_types.rs + :language: rust + :dedent: + :start-after: [extension-array] + :end-before: [extension-array] + +**Full example:** [advanced_array_types.rs](../../vortex/examples/advanced_array_types.rs) + +## Inspecting and Debugging Arrays + +### Display Methods + +Vortex provides several ways to print arrays: + +.. literalinclude:: ../../vortex/examples/debug_printing.rs + :language: rust + :dedent: + :start-after: [default-display] + :end-before: [default-display] + +.. literalinclude:: ../../vortex/examples/debug_printing.rs + :language: rust + :dedent: + :start-after: [display-values] + :end-before: [display-values] + +To see the internal encoding structure: + +.. literalinclude:: ../../vortex/examples/debug_printing.rs + :language: rust + :dedent: + :start-after: [display-tree] + :end-before: [display-tree] + +**Full example:** [debug_printing.rs](../../vortex/examples/debug_printing.rs) + +### Array Properties + +Inspect array metadata: + +.. literalinclude:: ../../vortex/examples/debug_printing.rs + :language: rust + :dedent: + :start-after: [inspect-properties] + :end-before: [inspect-properties] + +## Accessing Elements + +### Getting Individual Elements + +Use `scalar_at(index)` to get elements: + +.. literalinclude:: ../../vortex/examples/array_iteration.rs + :language: rust + :dedent: + :start-after: [scalar-at] + :end-before: [scalar-at] + +### Extracting Typed Values + +Convert scalars to Rust types: + +.. literalinclude:: ../../vortex/examples/array_iteration.rs + :language: rust + :dedent: + :start-after: [typed-values] + :end-before: [typed-values] + +### Handling Null Values + +Check validity before accessing values: + +.. 
literalinclude:: ../../vortex/examples/array_iteration.rs + :language: rust + :dedent: + :start-after: [iterate-with-validity] + :end-before: [iterate-with-validity] + +**Full example:** [array_iteration.rs](../../vortex/examples/array_iteration.rs) + +## Iterating Over Arrays + +### Index-based Iteration + +The simplest way to iterate over any array: + +.. literalinclude:: ../../vortex/examples/array_iteration.rs + :language: rust + :dedent: + :start-after: [scalar-at] + :end-before: [scalar-at] + +### Iterating String Arrays with ArrayAccessor + +VarBinArray and VarBinViewArray implement `ArrayAccessor` for efficient iteration: + +.. literalinclude:: ../../vortex/examples/array_iteration.rs + :language: rust + :dedent: + :start-after: [array-accessor] + :end-before: [array-accessor] + +### Chunk-based Iteration + +For ChunkedArrays, use `to_array_iterator()`: + +.. literalinclude:: ../../vortex/examples/array_iteration.rs + :language: rust + :dedent: + :start-after: [array-iterator] + :end-before: [array-iterator] + +**Full example:** [array_iteration.rs](../../vortex/examples/array_iteration.rs) + +## Slicing and Transforming + +### Array Immutability + +**Important:** Vortex arrays are **immutable**. You cannot modify an existing array. Instead, you create new arrays: + +.. literalinclude:: ../../vortex/examples/array_immutability.rs + :language: rust + :dedent: + :start-after: [immutability-concept] + :end-before: [immutability-concept] + +#### Creating Modified Arrays + +.. literalinclude:: ../../vortex/examples/array_immutability.rs + :language: rust + :dedent: + :start-after: [modify-with-builder] + :end-before: [modify-with-builder] + +#### Compute Operations Return New Arrays + +.. 
literalinclude:: ../../vortex/examples/array_immutability.rs + :language: rust + :dedent: + :start-after: [compute-returns-new] + :end-before: [compute-returns-new] + +**Full example:** [array_immutability.rs](../../vortex/examples/array_immutability.rs) + +### Slicing Arrays + +Slicing is O(1) and doesn't copy data: + +.. literalinclude:: ../../vortex/examples/array_iteration.rs + :language: rust + :dedent: + :start-after: [slice-array] + :end-before: [slice-array] + +### Building New Arrays + +Arrays are immutable. To create modified versions, use builders: + +.. literalinclude:: ../../vortex/examples/array_iteration.rs + :language: rust + :dedent: + :start-after: [modify-note] + :end-before: [modify-note] + +## File I/O + +### Writing Files + +Write arrays to disk with compression: + +.. literalinclude:: ../../vortex/examples/file_io.rs + :language: rust + :dedent: + :start-after: [basic-write] + :end-before: [basic-write] + +With custom compression: + +.. literalinclude:: ../../vortex/examples/file_io.rs + :language: rust + :dedent: + :start-after: [compressed-write] + :end-before: [compressed-write] + +### Reading Files + +Read entire files: + +.. literalinclude:: ../../vortex/examples/file_io.rs + :language: rust + :dedent: + :start-after: [basic-read] + :end-before: [basic-read] + +With filtering (pushdown): + +.. 
literalinclude:: ../../vortex/examples/file_io.rs + :language: rust + :dedent: + :start-after: [filtered-read] + :end-before: [filtered-read] + +**Full example:** [file_io.rs](../../vortex/examples/file_io.rs) + +## Key Concepts + +### Encodings vs Data Types + +- **DType** is the *logical* type (what the data represents) +- **Encoding** is the *physical* layout (how it's stored) + +For example, a `DType::Primitive(i32)` array could be stored in many encodings: +- `PrimitiveEncoding`: Uncompressed array +- `DictEncoding`: Dictionary encoding for repeated values +- `FastLanesEncoding`: Compressed with FastLanes bitpacking + +### Canonical Encodings + +Each dtype has a canonical encoding that supports zero-copy conversion to/from Arrow: + +| DType | Canonical Encoding | +|-------|-------------------| +| Bool | BoolArray | +| Primitive | PrimitiveArray | +| Utf8/Binary | VarBinViewArray | +| Struct | StructArray | +| List | ListArray | + +Use `to_canonical()` to convert any array to its canonical form. + +### Array vs ArrayView + +Some types have both Array and View variants: + +- **VarBinArray**: Owned array with offsets +- **VarBinViewArray**: Views into buffers (canonical for strings) + +The View variant is generally more efficient for operations like slicing and concatenation. + +### Validity + +Arrays can have nullable values. 
The `Validity` enum specifies: + +- `Validity::NonNullable`: Array has no nulls +- `Validity::Array(bool_array)`: Boolean mask indicating which values are valid +- `Validity::AllValid`: All values are valid (but type is nullable) + +## Common Patterns + +### Converting from Arrow + +```rust +use arrow_array::RecordBatch; +use vortex::Array; +use vortex::dtype::DType; +use vortex::dtype::arrow::FromArrowType; +use vortex_array::arrow::FromArrowArray; + +let arrow_batch: RecordBatch = ...; +let dtype = DType::from_arrow(arrow_batch.schema()); +let vortex_array = ArrayRef::from_arrow(arrow_batch, false); +``` + +### Converting to Arrow + +```rust +use vortex::ToCanonical; + +let vortex_array: ArrayRef = ...; +let canonical = vortex_array.to_canonical()?; +let arrow_array = canonical.into_arrow()?; +``` + +### Compressing Arrays + +```rust +use vortex::compressor::BtrBlocksCompressor; + +let array: ArrayRef = ...; +let compressed = BtrBlocksCompressor::default().compress(&array)?; +println!("Compression: {:.2}x", array.nbytes() as f64 / compressed.nbytes() as f64); +``` + +### Working with Statistics + +```rust +use vortex::stats::{Stat, StatsProviderExt}; + +let array: ArrayRef = ...; + +if let Some(min) = array.maybe_min() { + println!("Min: {}", min); +} + +if let Some(is_sorted) = array.maybe_stat(Stat::IsSorted) { + println!("Is sorted: {}", is_sorted); +} +``` + +## Running Examples + +All examples in this cookbook can be run with: + +```bash +# Run a specific example +cargo run --example basic_array_creation +cargo run --example string_arrays +cargo run --example debug_printing +cargo run --example array_iteration +cargo run --example struct_arrays +cargo run --example file_io + +# List all examples +cargo run --example +``` + +## See Also + +- [Rust API Documentation](https://docs.rs/vortex) +- [Concepts: Arrays](../concepts/arrays.md) +- [Concepts: Data Types](../concepts/dtypes.md) +- [Writing an Encoding](writing-an-encoding.md) diff --git a/docs/index.md 
b/docs/index.md index 49cd6842287..3003dbb6cf8 100644 --- a/docs/index.md +++ b/docs/index.md @@ -111,6 +111,7 @@ maxdepth: 1 caption: User Guides --- +guides/rust-cookbook guides/python-integrations guides/writing-an-encoding ``` diff --git a/docs/llm.txt b/docs/llm.txt new file mode 100644 index 00000000000..a532f1e3923 --- /dev/null +++ b/docs/llm.txt @@ -0,0 +1,417 @@ +# Vortex LLM Context + +This file provides condensed information about the Vortex array library for LLM code generation. + +## Overview + +Vortex is a columnar array library with zero-copy Arrow interoperability and cascading compression. +- Rust library with Python/Java/C bindings +- Zero-copy conversion to/from Apache Arrow +- Multiple encodings per data type for compression +- File format with lazy deserialization + +## Critical Patterns (Read This First!) + +These are the CORRECT patterns - common mistakes are listed in "Common Gotchas" below: + +```rust +// ✅ CORRECT: Validity construction +let validity: Validity = [true, false, true].into_iter().collect(); + +// ✅ CORRECT: BoolArray construction +let bools: BoolArray = [true, false, true].into_iter().collect(); + +// ✅ CORRECT: VarBinViewArray construction +let view: VarBinViewArray = vec!["a", "b"].into_iter().map(Some).collect(); + +// ✅ CORRECT: String scalar display +if let Some(s) = scalar.as_utf8().value() { + println!("{}", s.as_str()); // Need .as_str() +} + +// ✅ CORRECT: Struct creation with expect +let s = StructArray::from_fields(&[...]).expect("msg").into_array(); + +// ✅ CORRECT: Feature name for table display +#[cfg(feature = "pretty")] // NOT "table-display" +println!("{}", array.display_table()); + +// ✅ CORRECT: DType for primitives +use vortex::dtype::PType; +let dtype = DType::Primitive(PType::I32, Nullability::Nullable); + +// ✅ CORRECT: Creating arrays with nulls (use PrimitiveArray::from_option_iter) +let with_nulls = PrimitiveArray::from_option_iter([ + Some(1i32), None, Some(3), None, Some(5) +]); + +// ✅ CORRECT: 
fill_null passes Scalar by reference
+let filled = fill_null(&array, &Scalar::from(99i32))?;
+
+// ✅ CORRECT: VarBinArray iteration with with_iterator
+let varbin = VarBinArray::from(vec!["apple", "banana"]);
+let strings = varbin.with_iterator(|iter| {
+    iter.map(|bytes_opt| {
+        bytes_opt.map(|bytes| String::from_utf8(bytes.to_vec()).unwrap())
+    }).collect::<Vec<_>>()
+})?;
+```
+
+## Core Concepts
+
+### Array
+- Immutable columnar data structure
+- Has: dtype (logical type), encoding (physical layout), length, children, buffers
+- Access: `scalar_at(index)`, `slice(range)`, `len()`, `dtype()`, `encoding()`
+- Display: `display_values()`, `display_tree()`, `display_table()` (requires feature)
+
+### DType (Logical Type)
+Types: Null, Bool, Primitive, Utf8, Binary, Struct, List, FixedSizeList, Decimal, Extension
+- Get nullability: `dtype.is_nullable()`
+- For primitives: `PType::try_from(dtype)` returns i8/u8/i16/u16/i32/u32/i64/u64/f16/f32/f64
+
+### Encoding (Physical Layout)
+Common encodings:
+- Canonical: PrimitiveArray, BoolArray, VarBinViewArray, StructArray, ListArray
+- Compressed: DictArray, FastLanes, RunEnd, Sparse, FSST, ALP, PCO, ZigZag
+- Convert: `array.to_canonical()` to get canonical form
+
+### Validity (Nullability)
+Three states:
+- `Validity::NonNullable` - no nulls allowed
+- `Validity::AllValid` - nullable type, but all values valid
+- `Validity::Array(bool_array)` - explicit null mask
+
+## Quick Reference
+
+### Create Arrays
+
+```rust
+use vortex::IntoArray;
+use vortex::buffer::buffer;
+use vortex::arrays::{PrimitiveArray, VarBinArray, BoolArray, StructArray};
+use vortex::validity::Validity;
+
+// Primitives (most common)
+let ints = buffer![1i32, 2, 3, 4].into_array();
+let floats = buffer![1.0f64, 2.5, 3.14].into_array();
+
+// With nulls (use iterator pattern)
+let validity: Validity = [true, false, true].into_iter().collect();
+let nullable = PrimitiveArray::new(buffer![1i32, 2, 3], validity);
+
+// Strings (VarBinArray for 
offset-based, VarBinViewArray is canonical) +let strings = VarBinArray::from(vec!["foo", "bar"]).into_array(); +let with_nulls = VarBinArray::from_iter( + vec![Some("foo"), None, Some("bar")], + DType::Utf8(Nullability::Nullable) +); + +// VarBinViewArray (canonical string encoding) - use iterator with map(Some) +use vortex::arrays::VarBinViewArray; +let view_array: VarBinViewArray = vec!["hello", "world"] + .into_iter() + .map(Some) + .collect(); + +// Boolean +let bools: BoolArray = [true, false, true].into_iter().collect(); + +// Struct (use expect() for better error messages) +let struct_arr = StructArray::from_fields(&[ + ("name", VarBinArray::from(vec!["Alice", "Bob"]).into_array()), + ("age", buffer![30i32, 25].into_array()), +]).expect("Failed to create struct array").into_array(); +``` + +### Access Elements + +```rust +// Get scalar at index +let scalar = array.scalar_at(i); + +// Extract typed value from primitive +let value: Option = scalar.as_primitive().typed_value::(); + +// For strings, use .as_str() for display +if let Some(string_val) = scalar.as_utf8().value() { + println!("{}", string_val.as_str()); +} + +// Check validity +if array.is_valid(i) { ... 
} + +// Slice (O(1), no copy) +let sliced = array.slice(2..7); +``` + +### Iterate + +```rust +// Iterate by index +for i in 0..array.len() { + let scalar = array.scalar_at(i); + // process scalar +} + +// With validity check +for i in 0..array.len() { + if array.is_valid(i) { + let value = array.scalar_at(i).as_primitive().typed_value::(); + } +} + +// Chunk iterator for ChunkedArray (handle Result with if let Ok) +for chunk_result in array.to_array_iterator() { + if let Ok(chunk) = chunk_result { + // process chunk + } +} + +// VarBinArray iteration using ArrayAccessor trait's with_iterator +use vortex::accessor::ArrayAccessor; + +let varbin = VarBinArray::from(vec!["apple", "banana", "cherry"]); +let strings = varbin.with_iterator(|iter| { + iter.map(|bytes_opt| { + bytes_opt.map(|bytes| { + String::from_utf8(bytes.to_vec()).expect("Invalid UTF-8") + }) + }).collect::>() +}).expect("Failed to iterate"); +``` + +### Debug/Print + +```rust +// Values only +println!("{}", array.display_values()); // [1i32, 2i32, 3i32] + +// Metadata only (default Display) +println!("{}", array); // vortex.primitive(i32, len=3) + +// Full tree structure +println!("{}", array.display_tree()); + +// Table (requires pretty feature) +#[cfg(feature = "pretty")] +println!("{}", array.display_table()); +``` + +### File I/O + +```rust +use vortex::stream::ArrayStreamExt; +use vortex_file::{VortexWriteOptions, VortexOpenOptions}; + +// Write +VortexWriteOptions::default() + .write( + &mut tokio::fs::File::create("data.vortex").await?, + array.to_array_stream(), + ) + .await?; + +// Read +let array = VortexOpenOptions::new() + .open("data.vortex").await? + .scan()? + .into_array_stream()? + .read_all().await?; + +// Read with filter +use vortex_expr::{gt, lit, root}; +let filtered = VortexOpenOptions::new() + .open("data.vortex").await? + .scan()? + .with_filter(gt(root(), lit(50i32))) + .into_array_stream()? 
+ .read_all().await?; +``` + +### Compression + +```rust +use vortex::compressor::BtrBlocksCompressor; +use vortex_layout::layouts::compact::CompactCompressor; +use vortex_file::WriteStrategyBuilder; + +// In-memory compression +let compressed = BtrBlocksCompressor::default().compress(&array)?; + +// File compression +VortexWriteOptions::default() + .with_strategy( + WriteStrategyBuilder::new() + .with_compressor(CompactCompressor::default()) + .build() + ) + .write(file, stream).await?; +``` + +### Arrow Conversion + +```rust +use vortex::dtype::arrow::FromArrowType; +use vortex_array::arrow::{FromArrowArray, IntoArrow}; +use vortex::ToCanonical; + +// Arrow -> Vortex +let dtype = DType::from_arrow(arrow_schema); +let vortex = ArrayRef::from_arrow(arrow_batch, false); + +// Vortex -> Arrow +let canonical = vortex_array.to_canonical()?; +let arrow = canonical.into_arrow()?; +``` + +### Builders + +```rust +use vortex::builders::{PrimitiveBuilder, ArrayBuilder}; + +let mut builder = PrimitiveBuilder::::new(); +builder.push(Some(1)); +builder.push(None); +builder.push(Some(3)); +let array = builder.finish(); +``` + +### Statistics + +```rust +use vortex::stats::{Stat, StatsProviderExt}; + +array.maybe_min() // Option +array.maybe_max() // Option +array.maybe_stat(Stat::IsSorted) // Option +array.maybe_stat(Stat::IsConstant) // Option +array.nbytes() // usize - memory size +``` + +## Common Patterns + +### Type Conversions +```rust +// Scalar types +let prim_scalar = scalar.as_primitive(); // PrimitiveScalar +let bool_scalar = scalar.as_bool(); // BoolScalar +let utf8_scalar = scalar.as_utf8(); // Utf8Scalar +let struct_scalar = scalar.as_struct(); // StructScalar + +// Array types +let prim_array = array.to_primitive(); // PrimitiveArray +let struct_array = array.to_struct(); // StructArray +``` + +### Struct Field Access +```rust +let struct_arr = array.to_struct(); +let field = struct_arr.field_by_name("column_name")?; +let field = struct_arr.field(index)?; +let 
names = struct_arr.dtype().as_struct().0.names();
+```
+
+### Error Handling
+```rust
+use vortex_error::{VortexResult, vortex_err, vortex_bail};
+
+fn foo() -> VortexResult {
+    vortex_bail!("Error message {}", arg); // early return error
+    // or
+    Err(vortex_err!("Error message {}", arg)) // create error
+}
+```
+
+## Important Crates
+
+- `vortex` - Main crate, re-exports everything
+- `vortex-array` - Core array types and traits
+- `vortex-dtype` - Data type system
+- `vortex-buffer` - Aligned buffers (`Buffer`, `buffer!` macro)
+- `vortex-file` - File I/O
+- `vortex-error` - Error types (`VortexError`, `VortexResult`)
+- `vortex-expr` - Expression language for filters
+- `vortex-scalar` - Scalar values
+
+Encodings (in `encodings/` directory):
+- `vortex-dict` - Dictionary encoding
+- `vortex-fastlanes` - FastLanes bitpacking
+- `vortex-fsst` - FSST string compression
+- `vortex-alp` - ALP floating point compression
+- `vortex-pco` - Pco Delta compression
+- `vortex-runend` - Run-end encoding
+- `vortex-sparse` - Sparse array encoding
+
+## Canonical Encodings
+
+Each DType has a canonical encoding for Arrow interop:
+
+| DType | Canonical | Notes |
+|-------|-----------|-------|
+| Null | NullArray | All nulls |
+| Bool | BoolArray | Bit-packed booleans |
+| Primitive | PrimitiveArray | Native type array |
+| Utf8/Binary | VarBinViewArray | String/binary views |
+| Struct | StructArray | Multiple field arrays |
+| List | ListArray | Variable-length lists |
+| FixedSizeList | FixedSizeListArray | Fixed-length lists |
+| Decimal | DecimalArray | Decimal values |
+| Extension | ExtensionArray | User extensions |
+
+## Common Gotchas
+
+1. **Immutability**: Arrays are immutable. Use builders or `from_option_iter` to create new arrays.
+   - Operations like `fill_null` return NEW arrays, they don't modify existing ones
+2. **ArrayRef vs Array**: `ArrayRef = Arc<dyn Array>`. Use `.into_array()` to get ArrayRef.
+3. 
**IntoArray trait**: Many types impl `IntoArray` for conversion to `ArrayRef`.
+4. **Validity**: Check `is_valid(i)` before accessing nullable values.
+   - Must use iterator pattern: `[true, false].into_iter().collect()` NOT `Validity::from([true, false])`
+5. **Encoding != DType**: Same logical type can have many physical encodings.
+6. **Async I/O**: File operations require `#[tokio::main]` or async runtime.
+7. **Feature flags**: Use `pretty` for table display, not `table-display`. Features: pretty, zstd, tokio, etc.
+8. **VarBinViewArray**: Create with iterator pattern: `vec!["a", "b"].into_iter().map(Some).collect()`
+9. **BufferString display**: Use `.as_str()` when printing: `string_val.as_str()`
+10. **BoolArray**: Create with FromIterator: `[true, false].into_iter().collect()`, not `from(vec![])`
+11. **DType construction**: Use `DType::Primitive(PType::I32, Nullability::Nullable)` NOT `DType::Int32(...)`
+12. **fill_null**: Pass Scalar by reference: `fill_null(&array, &Scalar::from(99))`
+13. **VarBinArray iteration**: Use `with_iterator` method from ArrayAccessor trait, not `to_array_iterator`
+14. 
**PrimitiveArray with nulls**: Prefer `PrimitiveArray::from_option_iter` over manual builders
+
+## Code Examples Location
+
+See `/vortex/examples/` directory:
+- `basic_array_creation.rs` - Creating arrays
+- `string_arrays.rs` - VarBin and VarBinView comparison
+- `debug_printing.rs` - Display options
+- `array_iteration.rs` - Element access, iteration, and `with_iterator`
+- `struct_arrays.rs` - Working with structs
+- `file_io.rs` - Reading and writing files
+- `core_concepts.rs` - Array trait vs concrete types
+- `array_immutability.rs` - Demonstrates array immutability
+- `advanced_array_types.rs` - ConstantArray, ChunkedArray, NullArray
+
+Documentation: `/docs/guides/rust-cookbook.md`
+
+## Type Aliases
+
+```rust
+use vortex::{Array, ArrayRef}; // ArrayRef = Arc<dyn Array>
+use vortex_error::{VortexError, VortexResult}; // VortexResult<T> = Result<T, VortexError>
+use vortex_dtype::{DType, PType, Nullability};
+```
+
+## Macro Reference
+
+```rust
+buffer![1, 2, 3] // Create buffer with values
+buffer![42; 1000] // Create buffer with repeated value
+vortex_err!("msg {}", arg) // Create VortexError
+vortex_bail!("msg {}", arg) // Return VortexError
+vortex_panic!("msg") // Panic with VortexError
+```
+
+---
+
+For complete documentation see: https://docs.vortex.dev
+Rust API docs: https://docs.rs/vortex
diff --git a/vortex/examples/advanced_array_types.rs b/vortex/examples/advanced_array_types.rs
new file mode 100644
index 00000000000..b5f4f9d95c8
--- /dev/null
+++ b/vortex/examples/advanced_array_types.rs
@@ -0,0 +1,119 @@
+// SPDX-License-Identifier: Apache-2.0
+// SPDX-FileCopyrightText: Copyright the Vortex contributors
+
+#![allow(clippy::expect_used)]
+
+//! This example demonstrates additional array types in Vortex beyond the basics,
+//! including ConstantArray, ChunkedArray, and NullArray. 
+ +use vortex::arrays::{ChunkedArray, ConstantArray, NullArray, VarBinArray}; +use vortex::buffer::buffer; +use vortex::dtype::{DType, Nullability}; +use vortex::scalar::Scalar; +use vortex::{Array, IntoArray}; + +fn main() { + // [constant-array] + println!("=== Constant Arrays ===\n"); + + // ConstantArray: Efficiently represents arrays where all values are the same + let constant = ConstantArray::new(Scalar::from(42i32), 1_000_000); + + println!("Constant array of 1M values:"); + println!(" Length: {}", constant.len()); + println!(" Memory usage: {} bytes (very efficient!)", constant.nbytes()); + println!(" First value: {}", constant.scalar_at(0)); + println!(" Last value: {}", constant.scalar_at(999_999)); + + // Constant arrays are useful for default values or padding + let zeros = ConstantArray::new(Scalar::from(0.0f64), 100); + println!("\nArray of 100 zeros:"); + println!(" All values are: {}", zeros.scalar_at(0)); + // [constant-array] + + // [constant-null] + // Constant null array + use vortex::dtype::PType; + let null_constant = ConstantArray::new( + Scalar::null(DType::Primitive(PType::I32, Nullability::Nullable)), + 50 + ); + println!("\nConstant null array:"); + println!(" All values are null: {}", null_constant.scalar_at(0).is_null()); + // [constant-null] + + // [chunked-array] + println!("\n=== Chunked Arrays ===\n"); + + // ChunkedArray: Combines multiple arrays into a single logical array + // Useful for streaming, parallel processing, or incremental data + + let chunk1 = buffer![1i32, 2, 3].into_array(); + let chunk2 = buffer![4i32, 5, 6].into_array(); + let chunk3 = buffer![7i32, 8, 9].into_array(); + + let chunked = ChunkedArray::from_iter([chunk1, chunk2, chunk3]).into_array(); + + println!("Chunked array:"); + println!(" Total length: {}", chunked.len()); + println!(" Values: {}", chunked.display_values()); + + // Access individual elements (transparently across chunks) + for i in 0..chunked.len() { + println!(" Element {}: {}", i, 
chunked.scalar_at(i)); + } + + // You can also iterate over chunks + println!("\nIterating over chunks:"); + for (idx, chunk_result) in chunked.to_array_iterator().enumerate() { + if let Ok(chunk) = chunk_result { + println!(" Chunk {}: {} elements", idx, chunk.len()); + } + } + // [chunked-array] + + // [mixed-type-chunks] + // Chunks can be created from different sources + println!("\n=== Mixed Source Chunks ===\n"); + + let string_chunk1 = VarBinArray::from(vec!["hello", "world"]).into_array(); + let string_chunk2 = VarBinArray::from(vec!["foo", "bar", "baz"]).into_array(); + + let string_chunked = ChunkedArray::from_iter([string_chunk1, string_chunk2]).into_array(); + println!("String chunks: {}", string_chunked.display_values()); + // [mixed-type-chunks] + + // [null-array] + println!("\n=== Null Arrays ===\n"); + + // NullArray: Arrays of all null values + let nulls = NullArray::new(5); + + println!("NullArray:"); + println!(" Length: {}", nulls.len()); + println!(" Memory usage: {} bytes (minimal!)", nulls.nbytes()); + println!(" DType: {}", nulls.dtype()); + + // All values are null + for i in 0..nulls.len() { + println!(" Element {}: {}", i, nulls.scalar_at(i)); + } + + // Useful for representing missing columns or as placeholders + // [null-array] + + // [sparse-pattern] + println!("\n=== Sparse-like Pattern with ConstantArray ===\n"); + + // You can simulate sparse arrays by chunking constant arrays with actual data + let default = ConstantArray::new(Scalar::from(0i32), 100).into_array(); + let actual_data = buffer![10i32, 20, 30].into_array(); + let more_defaults = ConstantArray::new(Scalar::from(0i32), 97).into_array(); + + let sparse_like = ChunkedArray::from_iter([default, actual_data, more_defaults]).into_array(); + + println!("Sparse-like array (200 elements, only 3 non-zero):"); + println!(" Total length: {}", sparse_like.len()); + println!(" Memory efficient for sparse data"); + // [sparse-pattern] +} \ No newline at end of file diff --git 
a/vortex/examples/array_immutability.rs b/vortex/examples/array_immutability.rs new file mode 100644 index 00000000000..d8c58e2c30b --- /dev/null +++ b/vortex/examples/array_immutability.rs @@ -0,0 +1,173 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: Copyright the Vortex contributors + +#![allow(clippy::expect_used)] + +//! This example demonstrates array immutability in Vortex. +//! +//! Arrays in Vortex are immutable - you cannot modify an existing array. +//! Instead, you create new arrays with the desired changes. + +use vortex::arrays::{PrimitiveArray, VarBinArray}; +use vortex::buffer::buffer; +use vortex::{Array, IntoArray}; + +fn main() { + // [immutability-concept] + println!("=== Array Immutability ===\n"); + + // Arrays are immutable once created + let original = buffer![1i32, 2, 3, 4, 5].into_array(); + println!("Original array: {}", original.display_values()); + + // ❌ CANNOT DO: Arrays are immutable + // original[0] = 42; // This doesn't exist! + // original.set(0, 42); // This doesn't exist either! + // original.push(6); // Cannot append to existing array! 
+ + println!("\n⚠️ Arrays cannot be modified after creation!"); + println!("✅ Instead, create new arrays with the changes.\n"); + // [immutability-concept] + + // [modify-with-iter] + // Option 1: Create modified array using iterators + println!("=== Creating Modified Arrays with Iterators ===\n"); + + // Create a new array with modifications + let modified_values: Vec<Option<i32>> = (0..original.len()) + .map(|i| { + if i == 0 { + Some(42) // Replace first element + } else if i == 2 { + None // Make third element null + } else { + original.scalar_at(i).as_primitive().typed_value::<i32>() + } + }) + .chain([Some(6), Some(7)]) // Add extra elements + .collect(); + + let modified = PrimitiveArray::from_option_iter(modified_values); + println!("Modified array: {}", modified.display_values()); + println!("Original unchanged: {}", original.display_values()); + // [modify-with-iter] + + // [modify-with-vec] + // Option 2: Collect to Vec, modify, create new array + println!("\n=== Creating Modified Arrays via Vec ===\n"); + + // Extract values to Vec + let mut values: Vec<Option<i32>> = (0..original.len()) + .map(|i| original.scalar_at(i).as_primitive().typed_value::<i32>()) + .collect(); + + // Modify the Vec + values[0] = Some(100); + values.push(Some(6)); + values.remove(2); + + // Create new array from modified Vec + let from_vec = PrimitiveArray::from_option_iter(values); + println!("Array from modified Vec: {}", from_vec.display_values()); + println!("Original still unchanged: {}", original.display_values()); + // [modify-with-vec] + + // [string-modification] + // String arrays are also immutable + println!("\n=== String Array Immutability ===\n"); + + let string_array = VarBinArray::from(vec!["hello", "world", "rust"]); + println!("Original strings: {}", string_array.display_values()); + + // Create a modified version by building a new array from a new Vec + let modified_strings = vec!["hello", "vortex", "rust", "arrays"]; + let modified_array = VarBinArray::from(modified_strings); + + println!("Modified strings: {}", 
modified_array.display_values()); + println!("Original unchanged: {}", string_array.display_values()); + // [string-modification] + + // [functional-transformations] + // Functional transformations create new arrays + println!("\n=== Functional Transformations ===\n"); + + let numbers = buffer![10i32, 20, 30, 40, 50].into_array(); + + // Create a new array with doubled values + let doubled_values: Vec<i32> = (0..numbers.len()) + .map(|i| { + numbers.scalar_at(i) + .as_primitive() + .typed_value::<i32>() + .unwrap_or(0) * 2 + }) + .collect(); + let doubled = PrimitiveArray::from_iter(doubled_values); + + println!("Original: {}", numbers.display_values()); + println!("Doubled: {}", doubled.display_values()); + + // Filter: create new array with only values > 25 + let filtered_values: Vec<i32> = (0..numbers.len()) + .filter_map(|i| { + numbers.scalar_at(i) + .as_primitive() + .typed_value::<i32>() + .filter(|&v| v > 25) + }) + .collect(); + let filtered = PrimitiveArray::from_iter(filtered_values); + println!("Filtered (>25): {}", filtered.display_values()); + // [functional-transformations] + + // [slice-is-view] + // Slicing creates a view, doesn't modify original + println!("\n=== Slicing Creates Views ===\n"); + + let data = buffer![1i32, 2, 3, 4, 5, 6, 7, 8, 9, 10].into_array(); + let slice = data.slice(2..7); + + println!("Original: {}", data.display_values()); + println!("Slice [2..7]: {}", slice.display_values()); + println!("Original unchanged: {}", data.display_values()); + + // Slicing is O(1) - doesn't copy data + println!("Original nbytes: {}", data.nbytes()); + println!("Slice nbytes: {} (shares memory)", slice.nbytes()); + // [slice-is-view] + + // [compute-returns-new] + // Compute operations return NEW arrays, never modify existing ones + println!("\n=== Compute Operations Return New Arrays ===\n"); + + use vortex::compute::fill_null; + use vortex::scalar::Scalar; + + // Create array with nulls using from_option_iter + let with_nulls = PrimitiveArray::from_option_iter([ 
+ Some(1i32), + None, + Some(3), + None, + Some(5) + ]); + println!("Array with nulls: {}", with_nulls.display_values()); + + // fill_null returns a NEW array - original is unchanged! + let with_nulls_array = with_nulls.into_array(); + let filled = fill_null(&with_nulls_array, &Scalar::from(99i32)) + .expect("Failed to fill nulls"); + + println!("After fill_null(99):"); + println!(" Returned array: {}", filled.display_values()); + println!(" Original unchanged: {}", with_nulls_array.display_values()); + + // Similarly for other operations like filter, take, etc. + // They all return NEW arrays rather than modifying existing ones + + println!("\n📝 Note: All compute operations follow this pattern:"); + println!(" - Take immutable array as input"); + println!(" - Return NEW array as output"); + println!(" - Original array is never modified"); + // [compute-returns-new] +} \ No newline at end of file diff --git a/vortex/examples/array_iteration.rs b/vortex/examples/array_iteration.rs new file mode 100644 index 00000000000..7d30d53f9b9 --- /dev/null +++ b/vortex/examples/array_iteration.rs @@ -0,0 +1,237 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: Copyright the Vortex contributors + +#![allow(clippy::expect_used)] +#![allow(clippy::use_debug)] +#![allow(clippy::if_then_some_else_none)] + +//! This example demonstrates how to iterate over arrays and access individual elements. +//! +//! Vortex provides several ways to access array data: +//! - scalar_at(index): Get a scalar value at a specific index +//! - Iterator pattern: Iterate over array chunks +//! 
- Specialized accessors for specific array types + +use vortex::arrays::{PrimitiveArray, VarBinArray}; +use vortex::buffer::buffer; +use vortex::validity::Validity; +use vortex::{Array, IntoArray}; + +fn main() { + // [scalar-at] + // Access individual elements using scalar_at + let array = buffer![10i32, 20, 30, 40, 50].into_array(); + + println!("=== Accessing Individual Elements ==="); + for i in 0..array.len() { + let scalar = array.scalar_at(i); + println!("Element {}: {}", i, scalar); + } + // [scalar-at] + + // [typed-values] + // Extract typed values from scalars + println!("\n=== Extracting Typed Values ==="); + + let int_array = buffer![1i32, 2, 3, 4, 5].into_array(); + for i in 0..int_array.len() { + let scalar = int_array.scalar_at(i); + // Use as_primitive() to get a PrimitiveScalar view + if let Some(value) = scalar.as_primitive().typed_value::<i32>() { + println!("Value {}: {}", i, value); + } + } + // [typed-values] + + // [iterate-with-validity] + // Handle nullable values during iteration + println!("\n=== Iterating with Nulls ==="); + + let validity: Validity = [true, false, true, false, true].into_iter().collect(); + let nullable_array = PrimitiveArray::new(buffer![1i32, 2, 3, 4, 5], validity); + + for i in 0..nullable_array.len() { + if nullable_array.is_valid(i) { + if let Some(value) = nullable_array + .scalar_at(i) + .as_primitive() + .typed_value::<i32>() + { + println!("Index {}: {}", i, value); + } + } else { + println!("Index {}: null", i); + } + } + // [iterate-with-validity] + + // [slice-array] + // Slicing creates a new array view without copying + println!("\n=== Slicing Arrays ==="); + + let original = buffer![0i32, 1, 2, 3, 4, 5, 6, 7, 8, 9].into_array(); + println!("Original: {}", original.display_values()); + + // Slice from index 2 to 7 (exclusive) + let sliced = original.slice(2..7); + println!("Sliced [2..7]: {}", sliced.display_values()); + + // Slicing is O(1) - it doesn't copy data + println!("Slice nbytes: {}", sliced.nbytes()); + 
// [slice-array] + + // [iterate-strings] + // Iterate over string arrays using scalar_at + println!("\n=== Iterating String Arrays ==="); + + let strings = VarBinArray::from(vec!["hello", "world", "vortex"]).into_array(); + + for i in 0..strings.len() { + let scalar = strings.scalar_at(i); + if let Some(string_value) = scalar.as_utf8().value() { + println!("String {}: {}", i, string_value.as_str()); + } + } + // [iterate-strings] + + // [array-accessor] + // Use ArrayAccessor for efficient iteration over VarBinArray + println!("\n=== ArrayAccessor Pattern ==="); + + use vortex::accessor::ArrayAccessor; + + let varbin = VarBinArray::from(vec!["apple", "banana", "cherry"]); + + // Convert bytes to UTF-8 strings using with_iterator + let collected = varbin.with_iterator(|iter| { + iter.map(|bytes_opt| { + bytes_opt.map(|bytes| { + String::from_utf8(bytes.to_vec()).expect("Invalid UTF-8") + }) + }) + .collect::<Vec<_>>() + }).expect("Failed to iterate"); + + println!("Collected strings: {:?}", collected); + + // With nulls - use flatten to skip None values + use vortex::dtype::{DType, Nullability}; + + let with_nulls = VarBinArray::from_iter( + vec![Some("foo"), None, Some("bar"), None, Some("baz")], + DType::Utf8(Nullability::Nullable) + ); + + let non_null_strings = with_nulls.with_iterator(|iter| { + iter.flatten() // Skip None values + .map(|bytes| unsafe { String::from_utf8_unchecked(bytes.to_vec()) }) + .collect::<Vec<_>>() + }).expect("Failed to iterate"); + + println!("Non-null strings only: {:?}", non_null_strings); + + // Count non-null values + let count = with_nulls.with_iterator(|iter| { + iter.filter(|opt| opt.is_some()).count() + }).expect("Failed to count"); + + println!("Non-null count: {}", count); + + // Transform strings using with_iterator + let uppercased = varbin.with_iterator(|iter| { + iter.map(|bytes_opt| { + bytes_opt.map(|bytes| { + let s = String::from_utf8(bytes.to_vec()).expect("Invalid UTF-8"); + s.to_uppercase() + }) + }) + .collect::<Vec<_>>() + 
}).expect("Failed to transform"); + + println!("Uppercased strings: {:?}", uppercased); + + // Find strings matching a pattern + let contains_an = varbin.with_iterator(|iter| { + iter.enumerate() + .filter_map(|(i, bytes_opt)| { + bytes_opt.and_then(|bytes| { + let s = unsafe { String::from_utf8_unchecked(bytes.to_vec()) }; + if s.contains("an") { + Some((i, s)) + } else { + None + } + }) + }) + .collect::<Vec<_>>() + }).expect("Failed to search"); + + println!("Strings containing 'an': {:?}", contains_an); + // [array-accessor] + + // [array-iterator] + // Use the array iterator for chunk-based iteration + println!("\n=== Chunk-Based Iteration ==="); + + use vortex::arrays::ChunkedArray; + + // Create a chunked array + let chunk1 = buffer![1i32, 2, 3].into_array(); + let chunk2 = buffer![4i32, 5, 6].into_array(); + let chunked = ChunkedArray::from_iter([chunk1, chunk2]).into_array(); + + println!("Chunked array: {}", chunked.display_values()); + + // Iterate over chunks + for (idx, chunk_result) in chunked.to_array_iterator().enumerate() { + if let Ok(chunk) = chunk_result { + println!("Chunk {}: {} elements", idx, chunk.len()); + println!(" Values: {}", chunk.display_values()); + } + } + // [array-iterator] + + // [manual-loop] + // Process values in a loop + println!("\n=== Processing Values ==="); + + let numbers = buffer![1i32, 2, 3, 4, 5].into_array(); + let mut sum = 0i32; + + for i in 0..numbers.len() { + if let Some(value) = numbers.scalar_at(i).as_primitive().typed_value::<i32>() { + sum += value; + } + } + + println!("Sum of {}: {}", numbers.display_values(), sum); + // [manual-loop] + + // [finding-values] + // Search for specific values + println!("\n=== Finding Values ==="); + + let data = buffer![10i32, 20, 30, 20, 50].into_array(); + let target = 20i32; + + println!("Looking for {} in {}", target, data.display_values()); + for i in 0..data.len() { + if let Some(value) = data.scalar_at(i).as_primitive().typed_value::<i32>() + && value == target { + println!(" Found 
at index {}", i); + } + } + // [finding-values] + + // [modify-note] + // Note: Arrays are immutable! To "modify", create a new array + println!("\n=== Note on Mutability ==="); + println!("Vortex arrays are immutable."); + println!("To create modified versions, use from_iter or collect:"); + + // Use from_iter to create transformed array + let transformed: Vec<i32> = (0..5).map(|i| i * 2).collect(); + let new_array = PrimitiveArray::from_iter(transformed); + println!("New array: {}", new_array.display_values()); + // [modify-note] +} diff --git a/vortex/examples/basic_array_creation.rs b/vortex/examples/basic_array_creation.rs new file mode 100644 index 00000000000..edce54671a9 --- /dev/null +++ b/vortex/examples/basic_array_creation.rs @@ -0,0 +1,92 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: Copyright the Vortex contributors + + +#![allow(clippy::expect_used)] +//! This example demonstrates how to create basic Vortex arrays. +//! +//! Vortex supports many array types. This example covers the most common cases: +//! - Primitive arrays (integers, floats) +//! - Boolean arrays +//! - Null arrays +//! - Arrays with and without null values + +use vortex::arrays::{BoolArray, NullArray, PrimitiveArray}; +use vortex::buffer::buffer; +use vortex::validity::Validity; +use vortex::{Array, IntoArray}; + +fn main() { + // [primitive-int] + // Create a primitive integer array using the buffer! 
macro + let int_array = buffer![1i32, 2, 3, 4, 5].into_array(); + println!("Integer array: {}", int_array.display_values()); + // Output: [1i32, 2i32, 3i32, 4i32, 5i32] + // [primitive-int] + + // [primitive-float] + // Create a primitive float array + let float_array = buffer![1.0f64, 2.5, 3.7, 4.0, 5.5].into_array(); + println!("Float array: {}", float_array.display_values()); + // [primitive-float] + + // [primitive-unsigned] + // Create unsigned integer arrays + let uint_array = buffer![10u64, 20, 30, 40, 50].into_array(); + println!("Unsigned array: {}", uint_array.display_values()); + // [primitive-unsigned] + + // [primitive-with-validity] + // Create an array with explicit validity (nullable values) + // First, create the buffer of values + let values = buffer![1i32, 2, 3, 4, 5]; + + // Then specify which values are valid using a boolean mask + let validity: Validity = [true, false, true, true, false].into_iter().collect(); + + let nullable_array = PrimitiveArray::new(values, validity); + println!("Nullable array: {}", nullable_array.display_values()); + // Output shows null where validity is false + // [primitive-with-validity] + + // [primitive-nonnullable] + // Create an array that explicitly has no nulls + let non_null_array = PrimitiveArray::new(buffer![1i32, 2, 3], Validity::NonNullable); + println!("Non-nullable array: {}", non_null_array.display_values()); + // [primitive-nonnullable] + + // [bool-array] + // Create a boolean array + let bool_array: BoolArray = [true, false, true, true, false].into_iter().collect(); + println!("Boolean array: {}", bool_array.display_values()); + // [bool-array] + + // [bool-with-validity] + // Boolean arrays with null values require more complex construction + // See struct arrays example for patterns with validity + // [bool-with-validity] + + // [null-array] + // Create an array of all nulls + let null_array = NullArray::new(5); + println!("Null array (length 5): {}", null_array.display_values()); + // 
[null-array] + + // [array-properties] + // All arrays have common properties + println!("\nArray properties:"); + println!(" Length: {}", int_array.len()); + println!(" DType: {}", int_array.dtype()); + println!(" Encoding: {}", int_array.encoding().id()); + println!(" Nbytes: {}", int_array.nbytes()); + // [array-properties] + + // [constant-array] + // You can also create constant arrays (all same value) efficiently + let constant = buffer![42u64; 1000].into_array(); + println!( + "\nConstant array of 1000 42s, nbytes: {}", + constant.nbytes() + ); + // [constant-array] +} diff --git a/vortex/examples/core_concepts.rs b/vortex/examples/core_concepts.rs new file mode 100644 index 00000000000..3ace817cbb4 --- /dev/null +++ b/vortex/examples/core_concepts.rs @@ -0,0 +1,116 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: Copyright the Vortex contributors + +#![allow(clippy::expect_used)] + +//! This example demonstrates core Vortex concepts like the Array trait vs concrete types. +//! +//! Understanding the difference between the Array trait and concrete array types is +//! fundamental to using Vortex effectively. 
+ +use vortex::arrays::{BoolArray, PrimitiveArray, VarBinArray}; +use vortex::buffer::buffer; +use vortex::{Array, ArrayRef, IntoArray}; + +fn main() { + // [array-trait-vs-concrete] + // Array trait vs Concrete types demonstration + println!("=== Array Trait vs Concrete Types ===\n"); + + // Create specific concrete types + let varbin = VarBinArray::from(vec!["hello", "world"]); + let _primitive = PrimitiveArray::from_iter(vec![1i32, 2, 3]); + + // Both have type-specific methods + let _bytes = varbin.bytes(); // VarBinArray-specific method + println!("VarBinArray has {} bytes", varbin.bytes().len()); + + // Convert to trait object (ArrayRef = Arc<dyn Array>) + let array_ref: ArrayRef = varbin.into_array(); + + // Now we can only use Array trait methods + println!("Array trait methods work on any type:"); + println!(" Length: {}", array_ref.len()); + println!(" DType: {}", array_ref.dtype()); + println!(" Encoding: {}", array_ref.encoding().id()); + println!(" Values: {}", array_ref.display_values()); + + // To use type-specific methods again, need to downcast + if let Some(varbin_again) = array_ref.as_any().downcast_ref::<VarBinArray>() { + println!("\nAfter downcast, can use VarBinArray methods:"); + println!(" Bytes length: {}", varbin_again.bytes().len()); + println!(" Offsets dtype: {}", varbin_again.offsets().dtype()); + } + // [array-trait-vs-concrete] + + // [polymorphism] + // Polymorphism: Write functions that work with any array type + println!("\n=== Polymorphism ===\n"); + + fn process_any_array(array: &dyn Array) { + println!("Processing {} with {} elements", array.encoding().id(), array.len()); + println!(" First element: {}", array.scalar_at(0)); + } + + let int_array = buffer![10i32, 20, 30].into_array(); + let string_array = VarBinArray::from(vec!["foo", "bar"]).into_array(); + let bool_array: ArrayRef = BoolArray::from_iter([true, false]).into_array(); + + process_any_array(int_array.as_ref()); + process_any_array(string_array.as_ref()); + 
process_any_array(bool_array.as_ref()); + // [polymorphism] + + // [heterogeneous-collections] + // Heterogeneous collections: Store different array types together + println!("\n=== Heterogeneous Collections ===\n"); + + let arrays: Vec<ArrayRef> = vec![ + buffer![1, 2, 3].into_array(), // PrimitiveArray + VarBinArray::from(vec!["a", "b"]).into_array(), // VarBinArray + BoolArray::from_iter([true, false]).into_array(), // BoolArray + ]; + + println!("Collection of {} different array types:", arrays.len()); + for (i, array) in arrays.iter().enumerate() { + println!( + " [{}] {} encoding with {} elements", + i, + array.encoding().id(), + array.len() + ); + } + // [heterogeneous-collections] + + // [multiple-encodings] + // Multiple encodings for same logical data + println!("\n=== Multiple Encodings ===\n"); + + use vortex::arrays::VarBinViewArray; + + // Same strings, different encodings + let varbin = VarBinArray::from(vec!["hello", "world", "vortex"]); + let view: VarBinViewArray = vec!["hello", "world", "vortex"] + .into_iter() + .map(Some) + .collect(); + + println!("Same data, different encodings:"); + println!(" VarBinArray encoding: {}", varbin.encoding().id()); + println!(" VarBinArray nbytes: {}", varbin.nbytes()); + println!(" VarBinViewArray encoding: {}", view.encoding().id()); + println!(" VarBinViewArray nbytes: {}", view.nbytes()); + + // Both implement Array trait, can be used interchangeably + let varbin_ref: ArrayRef = varbin.into_array(); + let view_ref: ArrayRef = view.into_array(); + + // Both can be processed by the same function + process_any_array(varbin_ref.as_ref()); + process_any_array(view_ref.as_ref()); + + // Convert to canonical encoding + let canonical = varbin_ref.to_canonical().into_array(); + println!("\nCanonical encoding: {}", canonical.encoding().id()); + // [multiple-encodings] +} \ No newline at end of file diff --git a/vortex/examples/debug_printing.rs b/vortex/examples/debug_printing.rs new file mode 100644 index 
00000000000..bcbd69ada81 --- /dev/null +++ b/vortex/examples/debug_printing.rs @@ -0,0 +1,127 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: Copyright the Vortex contributors + +#![allow(clippy::expect_used)] +#![allow(clippy::use_debug)] +//! This example demonstrates different ways to print and inspect Vortex arrays. +//! +//! Vortex provides several display options for debugging and inspecting arrays: +//! - display_values(): Show logical values +//! - display_tree(): Show encoding tree structure +//! - display_table(): Show data in table format (when table-display feature is enabled) +//! - Default Display: Show metadata only + +use vortex::arrays::{StructArray, VarBinArray}; +use vortex::buffer::buffer; +use vortex::display::DisplayOptions; +use vortex::{Array, IntoArray}; + +fn main() { + let int_array = buffer![1i32, 2, 3, 4, 5].into_array(); + + // [default-display] + // Default display shows encoding and metadata only + println!("=== Default Display (Metadata) ==="); + println!("{}", int_array); + // [default-display] + + // [display-values] + // Display logical values of the array + println!("\n=== Display Values ==="); + println!("{}", int_array.display_values()); + // [display-values] + + // [display-tree] + // Display the encoding tree structure with memory info + // Shows the internal structure, encodings, buffers, and memory usage + println!("\n=== Display Tree ==="); + println!("{}", int_array.display_tree()); + // [display-tree] + + // [metadata-only] + // Explicitly use metadata-only display + println!("\n=== Metadata Only ==="); + println!("{}", int_array.display_as(DisplayOptions::MetadataOnly)); + // [metadata-only] + + // [complex-array] + // For more complex arrays, the tree display is very useful + println!("\n=== Complex Array Structure ==="); + + let struct_array = StructArray::from_fields(&[ + ("numbers", buffer![10i32, 20, 30].into_array()), + ( + "strings", + VarBinArray::from(vec!["foo", "bar", 
"baz"]).into_array(), + ), + ]) + .expect("struct array should be instantiated from Buffer and VarBinArray") + .into_array(); + + println!("Struct values:"); + println!("{}", struct_array.display_values()); + + println!("\nStruct tree:"); + println!("{}", struct_array.display_tree()); + // [complex-array] + + // [table-display] + // Table display is great for struct arrays (requires pretty feature) + #[cfg(feature = "pretty")] + { + println!("\n=== Table Display ==="); + println!("{}", struct_array.display_table()); + // Displays data in a nicely formatted table + } + // [table-display] + + // [inspect-properties] + // You can also inspect individual properties programmatically + println!("\n=== Inspecting Array Properties ==="); + println!("Length: {}", int_array.len()); + println!("DType: {}", int_array.dtype()); + println!("Encoding ID: {}", int_array.encoding_id()); + println!("Encoding: {}", int_array.encoding().id()); + println!("Is canonical: {}", int_array.is_canonical()); + println!("Bytes in memory: {}", int_array.nbytes()); + // [inspect-properties] + + // [inspect-validity] + // Check validity for specific indices + println!("\n=== Checking Validity ==="); + use vortex::arrays::PrimitiveArray; + use vortex::validity::Validity; + + let validity: Validity = [true, false, true].into_iter().collect(); + let nullable_array = PrimitiveArray::new(buffer![1i32, 2, 3], validity); + + println!("Array: {}", nullable_array.display_values()); + for i in 0..nullable_array.len() { + println!( + " Index {}: {} (valid: {})", + i, + nullable_array.scalar_at(i), + nullable_array.is_valid(i) + ); + } + // [inspect-validity] + + // [debug-trait] + // Arrays also implement Debug trait for use with debugging macros + println!("\n=== Debug Trait ==="); + println!("Debug output: {:?}", int_array.dtype()); + // [debug-trait] + + // [statistics] + // You can inspect array statistics + println!("\n=== Array Statistics ==="); + + let stats_array = buffer![5i32, 1, 9, 3, 
7].into_array(); + println!("Array: {}", stats_array.display_values()); + + // Arrays store statistics that can be queried + // Note: Individual stats like min/max are available via the stats module + // but require decompressing to canonical form for arbitrary encodings + println!("Nbytes: {}", stats_array.nbytes()); + // [statistics] +} diff --git a/vortex/examples/file_io.rs b/vortex/examples/file_io.rs new file mode 100644 index 00000000000..fa1fa812cc8 --- /dev/null +++ b/vortex/examples/file_io.rs @@ -0,0 +1,213 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: Copyright the Vortex contributors + + +#![allow(clippy::expect_used)] +#![allow(clippy::unwrap_used)] +#![allow(clippy::use_debug)] +#![allow(unexpected_cfgs)] + +//! This example demonstrates how to read and write Vortex files. +//! +//! Vortex provides async file I/O using Tokio. Files can be written with various +//! compression strategies and read with filtering and projection capabilities. + +use vortex::arrays::{PrimitiveArray, StructArray, VarBinArray}; +use vortex::buffer::buffer; +use vortex::stream::ArrayStreamExt; +use vortex::validity::Validity; +use vortex::{Array, IntoArray, ToCanonical}; +use vortex_error::VortexResult; +use vortex_expr::{gt, lit, root}; +use vortex_file::{VortexOpenOptions, VortexWriteOptions, WriteStrategyBuilder}; +use vortex_layout::layouts::compact::CompactCompressor; + +#[tokio::main] +async fn main() -> VortexResult<()> { + basic_write_read().await?; + compressed_write_read().await?; + filtered_read().await?; + struct_write_read().await?; + + // Cleanup + let _ = tokio::fs::remove_file("example_basic.vortex").await; + let _ = tokio::fs::remove_file("example_compressed.vortex").await; + let _ = tokio::fs::remove_file("example_filtered.vortex").await; + let _ = tokio::fs::remove_file("example_struct.vortex").await; + + Ok(()) +} + +async fn basic_write_read() -> VortexResult<()> { + println!("=== Basic Write and Read ===\n"); + + // 
[basic-write] + // Create an array + let array = PrimitiveArray::new(buffer![0u64, 1, 2, 3, 4], Validity::NonNullable); + + // Write to file using default options + VortexWriteOptions::default() + .write( + &mut tokio::fs::File::create("example_basic.vortex").await?, + array.to_array_stream(), + ) + .await?; + + println!("Written array: {}", array.display_values()); + // [basic-write] + + // [basic-read] + // Read the entire file back + let read_array = VortexOpenOptions::new() + .open("example_basic.vortex") + .await? + .scan()? + .into_array_stream()? + .read_all() + .await?; + + println!("Read array: {}", read_array.display_values()); + println!("Arrays match: {}\n", array.len() == read_array.len()); + // [basic-read] + + Ok(()) +} + +async fn compressed_write_read() -> VortexResult<()> { + println!("=== Compressed Write and Read ===\n"); + + // [compressed-write] + let array = buffer![42u64; 10000].into_array(); + + println!("Original array nbytes: {}", array.nbytes()); + + // Write with compact compression + VortexWriteOptions::default() + .with_strategy( + WriteStrategyBuilder::new() + .with_compressor(CompactCompressor::default()) + .build(), + ) + .write( + &mut tokio::fs::File::create("example_compressed.vortex").await?, + array.to_array_stream(), + ) + .await?; + + let file_size = tokio::fs::metadata("example_compressed.vortex") + .await? + .len(); + println!("File size: {} bytes", file_size); + println!( + "Compression ratio: {:.2}x\n", + array.nbytes() as f64 / file_size as f64 + ); + // [compressed-write] + + // [compressed-read] + let read_array = VortexOpenOptions::new() + .open("example_compressed.vortex") + .await? + .scan()? + .into_array_stream()? 
+ .read_all() + .await?; + + println!("Read compressed array length: {}", read_array.len()); + // [compressed-read] + + Ok(()) +} + +async fn filtered_read() -> VortexResult<()> { + println!("=== Filtered Read (Pushdown) ===\n"); + + // [filtered-write] + // Write an array with values 0-99 + let array = PrimitiveArray::from_iter(0..100u64); + + VortexWriteOptions::default() + .write( + &mut tokio::fs::File::create("example_filtered.vortex").await?, + array.to_array_stream(), + ) + .await?; + // [filtered-write] + + // [filtered-read] + // Read only values greater than 50 + let filtered = VortexOpenOptions::new() + .open("example_filtered.vortex") + .await? + .scan()? + .with_filter(gt(root(), lit(50u64))) + .into_array_stream()? + .read_all() + .await?; + + println!("Original length: {}", array.len()); + println!("Filtered length: {}", filtered.len()); + println!( + "Filtered values (first 10): {:?}", + (0..10.min(filtered.len())) + .map(|i| filtered.scalar_at(i).as_primitive().typed_value::<u64>()) + .collect::<Vec<_>>() + ); + // [filtered-read] + + Ok(()) +} + +async fn struct_write_read() -> VortexResult<()> { + println!("\n=== Struct Write and Read ===\n"); + + // [struct-write] + // Create a struct array with multiple fields + let names = VarBinArray::from(vec!["Alice", "Bob", "Charlie", "Diana"]).into_array(); + let ages = buffer![30i32, 25, 35, 28].into_array(); + let scores = buffer![95.5f64, 87.3, 91.2, 88.9].into_array(); + + let people = StructArray::from_fields(&[("name", names), ("age", ages), ("score", scores)]) + .unwrap() + .into_array(); + + println!("Writing struct array:"); + #[cfg(feature = "pretty")] + println!("{}", people.display_table()); + #[cfg(not(feature = "pretty"))] + println!("{}", people.display_values()); + + VortexWriteOptions::default() + .write( + &mut tokio::fs::File::create("example_struct.vortex").await?, + people.to_array_stream(), + ) + .await?; + // [struct-write] + + // [struct-read] + let read_struct = VortexOpenOptions::new() + 
.open("example_struct.vortex") + .await? + .scan()? + .into_array_stream()? + .read_all() + .await?; + + println!("\nRead struct array:"); + #[cfg(feature = "pretty")] + println!("{}", read_struct.display_table()); + #[cfg(not(feature = "pretty"))] + println!("{}", read_struct.display_values()); + // [struct-read] + + // [field-access] + // Access specific fields after reading + let struct_arr = read_struct.to_struct(); + if let Ok(age_field) = struct_arr.field_by_name("age") { + println!("\nAges only: {}", age_field.display_values()); + } + // [field-access] + + Ok(()) +} diff --git a/vortex/examples/string_arrays.rs b/vortex/examples/string_arrays.rs new file mode 100644 index 00000000000..94c7cd533a6 --- /dev/null +++ b/vortex/examples/string_arrays.rs @@ -0,0 +1,125 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: Copyright the Vortex contributors + + +#![allow(clippy::expect_used)] +//! This example demonstrates working with string arrays in Vortex. +//! +//! Vortex has two main encodings for variable-length data like strings: +//! - VarBinArray: Uses offsets to index into a contiguous buffer (similar to Arrow's Utf8Array) +//! - VarBinViewArray: Uses views into one or more buffers (similar to Arrow's StringViewArray) +//! +//! VarBinViewArray is the canonical encoding for strings and is generally more efficient for +//! operations like slicing and concatenation. 
+ +use vortex::arrays::{VarBinArray, VarBinViewArray}; +use vortex::dtype::{DType, Nullability}; +use vortex::{Array, IntoArray}; + +fn main() { + // [varbin-from-vec] + // Create a VarBinArray from a Vec of string slices + let varbin = VarBinArray::from(vec!["hello", "world", "vortex"]); + println!("VarBin array: {}", varbin.display_values()); + // Output: ["hello", "world", "vortex"] + // [varbin-from-vec] + + // [varbin-from-iter] + // Create a VarBinArray from an iterator with a specific DType + let nullable_strings = VarBinArray::from_iter( + vec![Some("foo"), None, Some("bar"), Some("baz")], + DType::Utf8(Nullability::Nullable), + ); + println!("VarBin with nulls: {}", nullable_strings.display_values()); + // [varbin-from-iter] + + // [varbinview-from-vec] + // Create a VarBinViewArray from a Vec of string slices + // This is the canonical encoding for strings + let view_array: VarBinViewArray = vec!["hello", "world", "vortex"] + .into_iter() + .map(Some) + .collect(); + println!("VarBinView array: {}", view_array.display_values()); + // [varbinview-from-vec] + + // [varbinview-from-iter] + // VarBinViewArray with nullable strings + let nullable_views = VarBinViewArray::from_iter( + vec![Some("alpha"), None, Some("beta"), Some("gamma")], + DType::Utf8(Nullability::Nullable), + ); + println!("VarBinView with nulls: {}", nullable_views.display_values()); + // [varbinview-from-iter] + + // [binary-data] + // You can also create binary (non-UTF8) arrays + let binary_data = VarBinArray::from_iter( + vec![Some(b"binary".as_slice()), None, Some(b"data".as_slice())], + DType::Binary(Nullability::Nullable), + ); + println!("Binary array: {}", binary_data.display_values()); + // [binary-data] + + // [array-vs-view] + // Understanding the difference between VarBinArray and VarBinViewArray + println!("\n=== VarBinArray vs VarBinViewArray ==="); + + // VarBinArray: Offset-based encoding (like Arrow StringArray) + // Memory layout: [offsets: 0,5,18,19] [data: "shortmedium 
lengthx"] + let varbin_arr = VarBinArray::from(vec!["short", "medium length", "x"]); + println!("VarBinArray:"); + println!(" Encoding: {}", varbin_arr.encoding().id()); + println!(" Memory usage: {} bytes", varbin_arr.nbytes()); + println!(" Structure: Single data buffer + offsets"); + println!(" Best for: Sequential access, small uniform strings"); + + // VarBinViewArray: View-based encoding (like Arrow StringViewArray) + // Memory layout: views point to data in one or more buffers + let view_arr: VarBinViewArray = vec!["short", "medium length", "x"] + .into_iter() + .map(Some) + .collect(); + println!("\nVarBinViewArray:"); + println!(" Encoding: {}", view_arr.encoding().id()); + println!(" Memory usage: {} bytes", view_arr.nbytes()); + println!(" Structure: Multiple buffers + views"); + println!(" Best for: Slicing, concatenation, mixed-size strings"); + println!(" Is canonical: Yes (for Utf8 dtype)"); + // [array-vs-view] + + // [converting-between] + // You can convert between encodings using to_canonical + let varbin = VarBinArray::from(vec!["convert", "me"]).into_array(); + let canonical = varbin.to_canonical().into_array(); + println!( + "\nConverted to canonical encoding: {}", + canonical.encoding().id() + ); + // Canonical for Utf8 is VarBinViewArray + // [converting-between] + + // [empty-strings] + // Empty strings are valid and different from null + let with_empty = VarBinArray::from(vec!["", "not empty", ""]); + println!( + "\nArray with empty strings: {}", + with_empty.display_values() + ); + + let with_nulls = VarBinArray::from_iter( + vec![None, Some("not empty"), None], + DType::Utf8(Nullability::Nullable), + ); + println!("Array with nulls: {}", with_nulls.display_values()); + // [empty-strings] + + // [large-strings] + // For very large strings (>4GB total), you can use the appropriate dtype + let large_strings = VarBinArray::from(vec![ + "This example shows regular strings", + "For datasets >4GB use appropriate offset types", + ]); + 
println!("\nStrings array dtype: {}", large_strings.dtype()); + // [large-strings] +} diff --git a/vortex/examples/struct_arrays.rs b/vortex/examples/struct_arrays.rs new file mode 100644 index 00000000000..1063088ee66 --- /dev/null +++ b/vortex/examples/struct_arrays.rs @@ -0,0 +1,199 @@ +// SPDX-License-Identifier: Apache-2.0 +// SPDX-FileCopyrightText: Copyright the Vortex contributors + + +#![allow(clippy::expect_used)] +#![allow(clippy::expect_used)] +#![allow(clippy::use_debug)] +#![allow(unexpected_cfgs)] + +//! This example demonstrates how to create and work with struct arrays. +//! +//! Struct arrays in Vortex are similar to structs in programming languages - they group +//! multiple fields together. Each field is itself an array, and all fields have the same length. + +use vortex::arrays::{StructArray, VarBinArray}; +use vortex::buffer::buffer; +use vortex::validity::Validity; +use vortex::{Array, IntoArray}; + +fn main() { + // [struct-from-fields] + // Create a struct array from field name/array pairs + println!("=== Creating Struct Arrays ==="); + + let names = VarBinArray::from(vec!["Alice", "Bob", "Charlie"]).into_array(); + let ages = buffer![30i32, 25, 35].into_array(); + + let people = StructArray::from_fields(&[("name", names), ("age", ages)]) + .expect("Failed to create struct array") + .into_array(); + + println!("People struct: {}", people.display_values()); + // [struct-from-fields] + + // [struct-try-new] + // Create a struct array with explicit validity + println!("\n=== Struct with Validity ==="); + + let x_values = buffer![1i32, 2, 3, 4].into_array(); + let y_values = buffer![10i32, 20, 30, 40].into_array(); + + let validity: Validity = [true, true, false, true].into_iter().collect(); + let points = StructArray::try_new( + ["x", "y"].into(), + vec![x_values, y_values], + 4, // length + validity, // third point is null + ) + .expect("Failed to create struct array with validity") + .into_array(); + + println!("Points: {}", 
points.display_values()); + // Output shows third struct as null + // [struct-try-new] + + // [access-fields] + // Access fields from a struct array + println!("\n=== Accessing Fields ==="); + + let struct_array = StructArray::from_fields(&[ + ("id", buffer![1i32, 2, 3].into_array()), + ( + "label", + VarBinArray::from(vec!["foo", "bar", "baz"]).into_array(), + ), + ]) + .expect("Failed to create struct array with id and label"); + + // Get a specific field by name + if let Ok(id_field) = struct_array.field_by_name("id") { + println!("ID field: {}", id_field.display_values()); + } + + // Get field by index + let label_field = &struct_array.fields()[1]; + println!("Label field: {}", label_field.display_values()); + + // List all field names + println!("Field names: {:?}", struct_array.names()); + // [access-fields] + + // [iterate-structs] + // Iterate over struct values + println!("\n=== Iterating Struct Values ==="); + + let products = StructArray::from_fields(&[ + ( + "product", + VarBinArray::from(vec!["Apple", "Banana", "Cherry"]).into_array(), + ), + ("price", buffer![1.20f64, 0.50, 2.00].into_array()), + ("quantity", buffer![10i32, 25, 5].into_array()), + ]) + .expect("Failed to create products struct array") + .into_array(); + + for i in 0..products.len() { + let struct_scalar = products.scalar_at(i); + println!("Row {}: {}", i, struct_scalar); + + // Access individual fields in the struct scalar + let struct_val = struct_scalar.as_struct(); + if let Some(product) = struct_val.field("product") { + println!(" Product: {}", product); + } + } + // [iterate-structs] + + // [nested-structs] + // Nested struct arrays + println!("\n=== Nested Structs ==="); + + // Create inner struct (address) + let address = StructArray::from_fields(&[ + ( + "street", + VarBinArray::from(vec!["123 Main St", "456 Oak Ave"]).into_array(), + ), + ( + "city", + VarBinArray::from(vec!["Springfield", "Portland"]).into_array(), + ), + ]) + .expect("Failed to create address struct array") 
+ .into_array(); + + // Create outer struct (person with address) + let person_with_address = StructArray::from_fields(&[ + ("name", VarBinArray::from(vec!["Alice", "Bob"]).into_array()), + ("address", address), + ]) + .expect("Failed to create nested struct array") + .into_array(); + + println!("Nested structs: {}", person_with_address.display_values()); + // [nested-structs] + + // [struct-with-mixed-types] + // Struct with various data types + println!("\n=== Struct with Mixed Types ==="); + + use vortex::arrays::BoolArray; + + let employees = StructArray::from_fields(&[ + ("id", buffer![1001u64, 1002, 1003].into_array()), + ( + "name", + VarBinArray::from(vec!["Alice", "Bob", "Charlie"]).into_array(), + ), + ("salary", buffer![75000.0f64, 82000.0, 95000.0].into_array()), + ( + "active", + BoolArray::from_iter([true, true, false]).into_array(), + ), + ]) + .expect("Failed to create employees struct array") + .into_array(); + + println!("Employees:"); + #[cfg(feature = "pretty")] + println!("{}", employees.display_table()); + + #[cfg(not(feature = "pretty"))] + println!("{}", employees.display_values()); + // [struct-with-mixed-types] + + // [struct-properties] + // Inspect struct properties + println!("\n=== Struct Properties ==="); + + let sample_struct = StructArray::from_fields(&[ + ("a", buffer![1i32, 2].into_array()), + ("b", buffer![3i32, 4].into_array()), + ]) + .expect("Failed to create sample struct array"); + + println!("Number of fields: {}", sample_struct.names().len()); + println!("Length: {}", sample_struct.len()); + println!("Field names: {:?}", sample_struct.names()); + // [struct-properties] + + // [empty-struct] + // Empty struct (struct with no fields) + println!("\n=== Empty Struct ==="); + + use vortex::dtype::FieldNames; + + let empty_struct = StructArray::try_new( + FieldNames::empty(), + vec![], + 3, // 3 rows, but no fields + Validity::NonNullable, + ) + .expect("Failed to create empty struct array") + .into_array(); + + println!("Empty 
struct (3 rows): {}", empty_struct.display_values()); + // [empty-struct] +}
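
The `[array-vs-view]` comment in `string_arrays.rs` describes the offset-based layout as `[offsets: 0,5,18,19] [data: "shortmedium lengthx"]`. As a rough illustration of how such a layout can be built and read back, here is a minimal sketch that uses only the Rust standard library. The `OffsetStrings` type and its methods are hypothetical names invented for this sketch. They are not part of the Vortex API, and the real `VarBinArray` may differ in details such as offset width and validity handling.

```rust
/// Hypothetical illustration type (not a Vortex API): strings stored as one
/// contiguous byte buffer plus `n + 1` offsets marking element boundaries.
struct OffsetStrings {
    offsets: Vec<usize>,
    data: Vec<u8>,
}

impl OffsetStrings {
    /// Concatenate all values into `data`, recording the end of each value.
    fn from_strs(values: &[&str]) -> Self {
        let mut offsets = Vec::with_capacity(values.len() + 1);
        let mut data = Vec::new();
        offsets.push(0);
        for v in values {
            data.extend_from_slice(v.as_bytes());
            offsets.push(data.len());
        }
        Self { offsets, data }
    }

    /// Number of stored strings (one fewer than the number of offsets).
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Element `i` is the byte range `offsets[i]..offsets[i + 1]`.
    /// Returns `None` when `i` is out of bounds.
    fn get(&self, i: usize) -> Option<&str> {
        let start = *self.offsets.get(i)?;
        let end = *self.offsets.get(i + 1)?;
        std::str::from_utf8(&self.data[start..end]).ok()
    }
}

fn main() {
    let arr = OffsetStrings::from_strs(&["short", "medium length", "x"]);
    println!("offsets: {:?}", arr.offsets);
    println!("data: {:?}", String::from_utf8_lossy(&arr.data));
    for i in 0..arr.len() {
        println!("{i}: {:?}", arr.get(i));
    }
}
```

For the three strings used in the example, the sketch produces the offsets `[0, 5, 18, 19]` quoted in the comment. Reading one element costs two offset lookups and one slice of the shared buffer, which is consistent with the example's note that this layout suits sequential access.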