Parquet filter pushdown v4 #7850
Conversation
```rust
#[derive(Clone)]
pub struct CacheOptions<'a> {
    pub projection_mask: &'a ProjectionMask,
    pub cache: Arc<Mutex<RowGroupCache>>,
```
Practically there's no contention because there's no parallelism in decoding one row group; we add a Mutex here only because we need shared ownership via Arc.
```rust
let row_group_cache = Arc::new(Mutex::new(RowGroupCache::new(
    batch_size,
    // None,
    Some(1024 * 1024 * 100),
```
This is currently hard-coded; making it configurable through user settings is left as future work.
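For context, a sketch of how the call site above could take a user-supplied budget instead of the constant (the `budget_bytes` variable is illustrative; `RowGroupCache::new` is the constructor already used in this PR):

```rust
// Sketch: thread a user-configurable budget through to the cache instead of
// the hard-coded 100MB. `budget_bytes` would come from reader options.
let budget_bytes: Option<usize> = Some(100 * 1024 * 1024); // None = unlimited
let row_group_cache = Arc::new(Mutex::new(RowGroupCache::new(
    batch_size,
    budget_bytes, // replaces the hard-coded Some(1024 * 1024 * 100)
)));
```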
```
@@ -613,8 +623,18 @@ where
        .fetch(&mut self.input, predicate.projection(), selection)
        .await?;

    let mut cache_projection = predicate.projection().clone();
    cache_projection.intersect(&projection);
```
A column is cached if and only if it appears in both the output projection and the filter projection.
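As a worked example of that rule (a sketch reusing the `intersect` call from the diff above; the column names are illustrative):

```rust
// Suppose the leaf columns are [a, b, c]:
//   - the final output projection selects {a, b}
//   - the filter (predicate) projection selects {b, c}
let mut cache_projection = predicate.projection().clone(); // {b, c}
cache_projection.intersect(&projection);                   // {b}
// Only `b` is cached: it is decoded once while evaluating the filter and
// reused for the output. `c` is filter-only and `a` is output-only, so
// caching them would not save any decoding work.
```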
So one thing I didn't understand after reading this PR in detail was how the relative row positions are updated after applying a filter.
For example, if we are applying multiple filters, the first may reduce the original RowSelection down to [100->200], and when the second filter runs it is only evaluated on those 100->200 rows, not the original selection.
In other words, I think there needs to be some sort of function equivalent to RowSelection::and_then that applies to the cache:

```rust
// Narrow the cache so that it only retains the results of evaluating the predicate
let row_group_cache = row_group_cache.and_then(resulting_selection);
```
Maybe this is the root cause of https://github.com/apache/datafusion/actions/runs/16302299778/job/46039904381?pr=16711
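For reference, here is a sketch of the `and_then` composition being described, using the public RowSelection / RowSelector types (the numbers follow the 100->200 example above):

```rust
use parquet::arrow::arrow_reader::{RowSelection, RowSelector};

// First filter: of the original rows, keep rows 100..200.
let first = RowSelection::from(vec![
    RowSelector::skip(100),
    RowSelector::select(100),
]);

// The second filter is evaluated only on those 100 surviving rows:
// keep the first 20 of them (i.e. original rows 100..120).
let second = RowSelection::from(vec![
    RowSelector::select(20),
    RowSelector::skip(80),
]);

// `and_then` maps `second` back into the original coordinate space.
let combined = first.and_then(&second);
assert_eq!(
    combined,
    RowSelection::from(vec![
        RowSelector::skip(100),
        RowSelector::select(20),
        RowSelector::skip(80),
    ])
);
```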
```rust
}

fn get_def_levels(&self) -> Option<&[i16]> {
    None // we don't allow nullable parent for now.
```
Nested columns are not supported yet.
😮 -- My brain is likely too fried at the moment to review this properly but it is on my list for first thing tomorrow
Thank you @XiangpengHao for the amazing work, I will try to review and test this PR!
TLDR is I think this is really clever - very nice @XiangpengHao. I left some structural comments / suggestions but nothing major.
I will run some more benchmarks, but it was showing very nice improvements for Q21 locally for me (129ms --> 90ms)
If that looks good I'll wire it up in DataFusion and run those benchmarks
Some thoughts:
- I would be happy to wire in the buffering limit / API
- As you say, there are many more improvements possible -- specifically I suspect the RowSelector representation is going to cause us pain and suffering for filters that have many short selections, when bitmaps would be a better choice
Buffering
I think buffering the intermediate filter results is unavoidable if we want to preserve the current behavior of minimizing the size of IO requests.
If we want to reduce buffering I think we can only really do it by increasing the number of IO requests (so we can incrementally produce the final output). I think we should proceed with buffering and then tune if/when needed
```rust
CacheOptions {
    projection_mask: &cache_projection,
    cache: row_group_cache.clone(),
    role: crate::arrow::array_reader::CacheRole::Producer,
},
```
Structurally, both here and below, it might help to move the creation of the CacheOptions into the cache itself, so a reader of this code doesn't have to understand the innards of the cache.
Suggested change:

```diff
- CacheOptions {
-     projection_mask: &cache_projection,
-     cache: row_group_cache.clone(),
-     role: crate::arrow::array_reader::CacheRole::Producer,
- },
+ row_group_cache.producer_options(projection, predicate.projection())
```
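A possible shape for that helper (a sketch; `producer_options` is the reviewer's hypothetical name, and this version assumes the caller has already computed the intersected mask):

```rust
// Hypothetical sketch of the suggested refactor: the cache builds its own
// CacheOptions so call sites don't need to know about roles or Arc cloning.
fn producer_options<'a>(
    cache: &Arc<Mutex<RowGroupCache>>,
    cache_projection: &'a ProjectionMask,
) -> CacheOptions<'a> {
    CacheOptions {
        projection_mask: cache_projection,
        cache: Arc::clone(cache),
        role: crate::arrow::array_reader::CacheRole::Producer,
    }
}
```

One wrinkle: since CacheOptions borrows the projection mask, a method that computes the intersection internally would need to return an owned mask or store it somewhere that outlives the options.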
```rust
let reader = ParquetRecordBatchReader::new(array_reader, plan);

Ok((self, Some(reader)))
}

fn compute_cache_projection(&self, projection: &ProjectionMask) -> Option<ProjectionMask> {
```
Suggested change (add a doc comment):

```diff
+ /// Compute which columns are used in filters and the final (output) projection
  fn compute_cache_projection(&self, projection: &ProjectionMask) -> Option<ProjectionMask> {
```
```rust
let start_position = self.outer_position - row_count;

let selection_buffer = row_selection_to_boolean_buffer(row_count, self.selections.iter());
```
this is clever -- though it will likely suffer from the same "RowSelection is a crappy representation for small selection runs" problem
Yes, this is to alleviate that problem. If we have multiple small selection runs on the same cached batch, we first combine them into a boolean buffer and then do the boolean selection once.
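A sketch of that flattening step (assuming the public RowSelector type; the helper mirrors what the PR's `row_selection_to_boolean_buffer` does):

```rust
use arrow_array::BooleanArray;
use arrow_buffer::builder::BooleanBufferBuilder;
use parquet::arrow::arrow_reader::RowSelector;

/// Flatten a run of skip/select selectors into one boolean mask so the
/// cached batch is filtered once instead of sliced per selector.
fn selection_to_mask(selectors: &[RowSelector]) -> BooleanArray {
    let total: usize = selectors.iter().map(|s| s.row_count).sum();
    let mut builder = BooleanBufferBuilder::new(total);
    for s in selectors {
        // `true` for selected rows, `false` for skipped rows.
        builder.append_n(s.row_count, !s.skip);
    }
    BooleanArray::new(builder.finish(), None)
}
```

The resulting mask is then applied with arrow's filter kernel in a single pass, as the cached-read code above does.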
```rust
pub struct CacheKey {
    /// Column index in the row group
    pub column_idx: usize,
    /// Starting row ID for this batch
```
I think it would help here to clarify what these Row ids are relative to
I THINK they are the row ids relative to the underlying column reader (which might already have a RowSelection applied)
If so it would be good to clarify they are not absolute row ids from the (unfiltered) Row Group, for example
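One way to record that answer in the source, if it's confirmed (a sketch of the suggested doc wording; the `row_id` field name is a stand-in since the snippet above is truncated):

```rust
pub struct CacheKey {
    /// Column index in the row group
    pub column_idx: usize,
    /// Starting row ID for this batch, relative to the stream of rows the
    /// underlying column reader produces (which may already have a
    /// RowSelection applied) -- NOT an absolute row index into the
    /// unfiltered row group.
    pub row_id: usize,
}
```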
.expect("data must be already cached in the read_records call, this is a bug"); | ||
let cached = cached.slice(overlap_start - batch_start, selection_length); | ||
let filtered = arrow_select::filter::filter(&cached, &mask_array)?; | ||
selected_arrays.push(filtered); |
You can probably use the new BatchCoalescer here instead: https://docs.rs/arrow/latest/arrow/compute/struct.BatchCoalescer.html
It is definitely faster for primitive arrays and will save intermediate memory usage.
It might have some trouble with StringView as it also tries to gc internally -- we may need to optimize the output to avoid gc'ing if we see the same buffer from call to call.
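A sketch of what that could look like, assuming the documented BatchCoalescer API (`new`, `push_batch_with_filter`, `next_completed_batch`, `finish_buffered_batch`); the function and its inputs are illustrative, not code from this PR:

```rust
use arrow::array::{BooleanArray, RecordBatch};
use arrow::compute::BatchCoalescer;
use arrow::datatypes::SchemaRef;
use arrow::error::ArrowError;

/// Push (batch, filter-mask) pairs into a coalescer that emits batches of
/// the target size, avoiding a per-slice `filter` plus a final `concat`.
fn filter_and_coalesce(
    schema: SchemaRef,
    batch_size: usize,
    inputs: impl IntoIterator<Item = (RecordBatch, BooleanArray)>,
) -> Result<Vec<RecordBatch>, ArrowError> {
    let mut coalescer = BatchCoalescer::new(schema, batch_size);
    let mut out = Vec::new();
    for (batch, mask) in inputs {
        // Filters and buffers rows, compacting incrementally.
        coalescer.push_batch_with_filter(batch, &mask)?;
        while let Some(completed) = coalescer.next_completed_batch() {
            out.push(completed);
        }
    }
    // Flush any partially filled final batch.
    coalescer.finish_buffered_batch()?;
    while let Some(completed) = coalescer.next_completed_batch() {
        out.push(completed);
    }
    Ok(out)
}
```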
🤖: Benchmark completed
😎 -- very nice
Great result! I am curious about the performance compared with the no-filter-pushdown case, because the previous attempt also improved performance on this benchmark, yet still showed some regression compared to no filter pushdown.
I will try and run this experiment later today
Thank you @alamb! If there is no regression, I believe this PR will also resolve the adaptive selection cases; if there is a regression, we can further combine adaptive selection as a final optimization.
🤖: Benchmark completed
Thank you for reviewing this @alamb! This is the place where we merge the filter selection back into the original selection: https://github.com/XiangpengHao/arrow-rs/blob/5537bcb0870ba21549e72b58b65237ba823eec50/parquet/src/arrow/arrow_reader/read_plan.rs#L119 In other words, this PR does not change how we represent selections; we still use the existing implementation. The only difference is the added transparent cache layer; the rest of the code is unchanged.
I found another bug in DataFusion testing here: apache/datafusion#16711 (comment)
I am now working on additional review / proposed improvements to this PR -- basically to structure the caching more into the PlanBuilder and make it easier to test
Here is a proposal: (I think the CI is having issues due to https://www.githubstatus.com/incidents/k20s3qvr28zw)
🤖: Benchmark completed
Here is another proposed addition
Summary so far (I now need to go work on some other things for the rest of the day): I made two proposed changes
My plan for tomorrow will be to try and write some tests:
Thank you for the review @alamb. I plan to take a look at this in the next few days and also think about further optimizations. Maybe it's just me, but I can't reproduce some of the regressions reported in the DataFusion integration; I'll get a "cloud" machine and try again.
If we can't reproduce them I think we should just ignore them
Simplify projection caching
Move cache options construction to ArrayReaderBuilder, add builders
Summary for new updates:
Thank you -- I will get back to this tomorrow or Monday
I am beginning to look into this -- my planned contribution is to
I started writing some tests but it got somewhat more complicated than I expected. Here is the WIP PR. Once that is in place, I hope to use the same pattern to verify the cache operations. I will continue tomorrow.
This is my latest attempt to make pushdown faster. Prior art: #6921
cc @alamb @zhuqi-lucas
Problems of #6921
This PR takes a different approach: it does not change the decoding pipeline, so we avoid problem 1. It also caches the arrow record batch, avoiding problem 2.
But this means we need to use more memory to cache data.
How it works?
We wrap the array_readers with a transparent cached_array_reader. The cached reader first consults the RowGroupCache to look for a batch, and only reads from the underlying reader on a cache miss; in that case the cached_array_reader falls back to reading and decoding from Parquet. In a concurrent setup, not all readers may reach the peak point at the same time, so the peak system memory usage might be lower.
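A minimal sketch of that wrapper pattern (the trait and cache here are simplified stand-ins for the crate-internal ArrayReader and RowGroupCache; error handling and eviction are elided):

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use arrow_array::ArrayRef;

/// Simplified stand-in for the parquet crate's internal ArrayReader trait.
trait ArrayReader {
    fn read_records(&mut self, num_records: usize) -> ArrayRef;
}

/// Stand-in cache keyed by (column index, starting row id).
#[derive(Default)]
struct RowGroupCache {
    batches: HashMap<(usize, usize), ArrayRef>,
}

/// Transparent wrapper: serve from the cache, decode only on a miss.
struct CachedArrayReader<R: ArrayReader> {
    inner: R,
    cache: Arc<Mutex<RowGroupCache>>,
    column_idx: usize,
    next_row_id: usize,
}

impl<R: ArrayReader> ArrayReader for CachedArrayReader<R> {
    fn read_records(&mut self, num_records: usize) -> ArrayRef {
        let key = (self.column_idx, self.next_row_id);
        self.next_row_id += num_records;
        if let Some(hit) = self.cache.lock().unwrap().batches.get(&key) {
            return hit.clone(); // cache hit: no Parquet decoding
        }
        let decoded = self.inner.read_records(num_records);
        self.cache
            .lock()
            .unwrap()
            .batches
            .insert(key, decoded.clone());
        decoded
    }
}
```

Other benefits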
How does it perform?
My criterion somehow won't produce a result from --save-baseline, so I asked an LLM to generate a table from this benchmark. Baseline is the implementation on the current main branch. New Unlimited is the new pushdown with an unlimited memory budget. New 100MB is the new pushdown with a 100MB memory budget for row group caching.

Limitations
Next steps?
This PR is largely a proof of concept; I want to collect some feedback before sending a multi-thousand-line PR :)
Some items I can think of: