Parquet filter pushdown v4 #7850

Open: wants to merge 33 commits into base: main

@XiangpengHao XiangpengHao commented Jul 2, 2025

This is my latest attempt to make pushdown faster. Prior art: #6921

cc @alamb @zhuqi-lucas

Problems of #6921

  1. It proactively loads the entire row group into memory, rather than loading only the pages that pass the filter predicate.
  2. It only caches decompressed pages, so it still pays the decoding cost twice.

This PR takes a different approach: it does not change the decoding pipeline, which avoids problem 1, and it caches the decoded Arrow record batches, which avoids problem 2.

The trade-off is that we need more memory to cache the data.

How does it work?

  1. It wraps the array_readers in a transparent cached_array_reader.
  2. The cache layer first consults the RowGroupCache for a batch, and reads from the underlying reader only on a cache miss.
  3. There is a cache producer and a cache consumer. The producer runs when we build filters and inserts Arrow arrays into the cache; the consumer runs when we build outputs and removes Arrow arrays from the cache. So the memory usage should look like this:
    ▲
    │     ╭─╮
    │    ╱   ╲
    │   ╱     ╲
    │  ╱       ╲
    │ ╱         ╲
    │╱           ╲
    └─────────────╲──────► Time
    │      │      │
    Filter  Peak  Consume
    Phase (Built) (Decrease)

In a concurrent setup, not all readers reach their peak at the same time, so the peak system memory usage might be lower than the sum of the per-reader peaks.

  4. It has a max_cache_size knob, which is a per-row-group setting. Once a row group has used up its budget, the cache stops accepting new data and the cached_array_reader falls back to reading and decoding from Parquet.
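
The producer/consumer flow and the max_cache_size fallback described above can be sketched roughly as follows. This is a minimal std-only illustration, not the PR's code: the toy RowGroupCache stores Vec<i64> instead of Arrow arrays, and the CacheKey fields and the insert, remove, memory_used, and read_batch names are assumptions made for this example.

```rust
use std::collections::HashMap;

/// Hypothetical cache key: which column, and the first row of the batch.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub struct CacheKey {
    pub column_idx: usize,
    pub row_start: usize,
}

/// Toy stand-in for the PR's RowGroupCache; a "batch" is just a Vec<i64>.
pub struct RowGroupCache {
    max_cache_size: Option<usize>, // budget in bytes; None means unlimited
    used: usize,
    batches: HashMap<CacheKey, Vec<i64>>,
}

impl RowGroupCache {
    pub fn new(max_cache_size: Option<usize>) -> Self {
        Self { max_cache_size, used: 0, batches: HashMap::new() }
    }

    fn size_of(batch: &[i64]) -> usize {
        batch.len() * std::mem::size_of::<i64>()
    }

    /// Producer side (filter phase). Returns false once the budget is used up,
    /// in which case the caller simply does not cache the batch.
    pub fn insert(&mut self, key: CacheKey, batch: Vec<i64>) -> bool {
        if self.batches.contains_key(&key) {
            return true; // already cached
        }
        let size = Self::size_of(&batch);
        if let Some(max) = self.max_cache_size {
            if self.used + size > max {
                return false;
            }
        }
        self.used += size;
        self.batches.insert(key, batch);
        true
    }

    /// Consumer side (output phase): removing a batch frees its budget.
    pub fn remove(&mut self, key: &CacheKey) -> Option<Vec<i64>> {
        let batch = self.batches.remove(key)?;
        self.used -= Self::size_of(&batch);
        Some(batch)
    }

    pub fn memory_used(&self) -> usize {
        self.used
    }
}

/// Consumer read path: use the cached batch on a hit, otherwise fall back to
/// decoding again. The bool reports whether the cache was hit.
pub fn read_batch(
    cache: &mut RowGroupCache,
    key: CacheKey,
    decode: impl FnOnce() -> Vec<i64>,
) -> (Vec<i64>, bool) {
    match cache.remove(&key) {
        Some(batch) => (batch, true),
        None => (decode(), false),
    }
}
```

Because the consumer removes entries as it reads them, memory_used rises during the filter phase and falls back toward zero during the output phase, which matches the shape of the diagram above.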

Other benefits

  1. This architecture can support nested columns (not implemented in this PR), i.e., it is future-proof.
  2. Many further performance optimizations are possible, but even in its current state it has no regressions.

How does it perform?

For some reason my Criterion run won't produce a result from --save-baseline, so I asked an LLM to generate a table from this benchmark:

cargo bench --bench arrow_reader_clickbench --features "arrow async" "async"

Baseline is the implementation on the current main branch.
New Unlimited is the new pushdown with an unlimited memory budget.
New 100MB is the new pushdown with a 100MB cache budget per row group.

Query | Baseline (ms) | New Unlimited (ms) | Diff (ms) | New 100MB (ms) | Diff (ms)
------+---------------+--------------------+-----------+----------------+----------
Q1    | 0.847         | 0.803              | -0.044    | 0.812          | -0.035
Q10   | 4.060         | 6.273              | +2.213    | 6.216          | +2.156
Q11   | 5.088         | 7.152              | +2.064    | 7.193          | +2.105
Q12   | 18.485        | 14.937             | -3.548    | 14.904         | -3.581
Q13   | 24.859        | 21.908             | -2.951    | 21.705         | -3.154
Q14   | 23.994        | 20.691             | -3.303    | 20.467         | -3.527
Q19   | 1.894         | 1.980              | +0.086    | 1.996          | +0.102
Q20   | 90.325        | 64.689             | -25.636   | 74.478         | -15.847
Q21   | 106.610       | 74.766             | -31.844   | 99.557         | -7.053
Q22   | 232.730       | 101.660            | -131.070  | 204.800        | -27.930
Q23   | 222.800       | 186.320            | -36.480   | 186.590        | -36.210
Q24   | 24.840        | 19.762             | -5.078    | 19.908         | -4.932
Q27   | 80.463        | 47.118             | -33.345   | 49.597         | -30.866
Q28   | 78.999        | 47.583             | -31.416   | 51.432         | -27.567
Q30   | 28.587        | 28.710             | +0.123    | 28.926         | +0.339
Q36   | 80.157        | 57.954             | -22.203   | 58.012         | -22.145
Q37   | 46.962        | 45.901             | -1.061    | 45.386         | -1.576
Q38   | 16.324        | 16.492             | +0.168    | 16.522         | +0.198
Q39   | 20.754        | 20.734             | -0.020    | 20.648         | -0.106
Q40   | 22.554        | 21.707             | -0.847    | 21.995         | -0.559
Q41   | 16.430        | 16.391             | -0.039    | 16.581         | +0.151
Q42   | 6.045         | 6.157              | +0.112    | 6.120          | +0.075

  1. If we treat differences within 5ms as noise, then we are never worse than the current implementation.
  2. We see significant improvements for string-heavy queries: string columns are large, so they take a long time to decompress and decode.
  3. The 100MB cache budget seems to have a small performance impact on most queries; Q20, Q21, and Q22 are the visible exceptions.

Limitations

  1. It only works for async readers, because sync readers do not follow the same row-group-by-row-group structure.
  2. It is memory hungry compared to "Experimental parquet decoder with first-class selection pushdown support" (#6921). But changing the decoding pipeline without eagerly loading the entire row group would require significant changes to the current decoding infrastructure, e.g., we would need to make the page iterator an async function.
  3. It currently doesn't support nested columns; more specifically, it doesn't support nested columns with nullable parents. Supporting them should be straightforward and need no big changes.
  4. The current memory accounting is not accurate. It overestimates memory usage, especially when reading string view arrays, where multiple string views may share the same underlying buffer and that buffer's size is counted twice. Either way, we never exceed the user-configured memory limit.
  5. If even one row passes the filter, the entire batch is cached. We can probably optimize this, though.
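
To illustrate limitation 4, here is a small std-only sketch of why shared buffers inflate the estimate and how counting each buffer once by pointer identity would tighten it. ViewArray, naive_size, and deduplicated_size are hypothetical names for this example, not types or functions from the PR or from arrow-rs.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for a string view array: it holds references to
/// data buffers that may be shared with other arrays.
pub struct ViewArray {
    pub buffers: Vec<Arc<Vec<u8>>>,
}

/// Naive accounting: every array pays for every buffer it references, so a
/// buffer shared by two arrays is counted twice.
pub fn naive_size(arrays: &[ViewArray]) -> usize {
    arrays
        .iter()
        .flat_map(|a| a.buffers.iter())
        .map(|b| b.len())
        .sum()
}

/// Deduplicated accounting: each distinct buffer is counted once, using the
/// address of the shared allocation as its identity.
pub fn deduplicated_size(arrays: &[ViewArray]) -> usize {
    let mut seen = HashSet::new();
    arrays
        .iter()
        .flat_map(|a| a.buffers.iter())
        .filter(|b| seen.insert(Arc::as_ptr(*b)))
        .map(|b| b.len())
        .sum()
}
```

The naive figure is always at least as large as the deduplicated one, which is why overestimating stays on the safe side of the configured budget.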

Next steps?

This PR is largely a proof of concept; I want to collect some feedback before sending a multi-thousand-line PR :)

Some items I can think of:

  1. Design an interface for users to specify the cache size limit; currently it's hard-coded.
  2. Don't instrument the nested array reader if the Parquet file has a nullable parent; currently it will panic.
  3. More testing, plus integration tests/benchmarks with DataFusion.

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jul 2, 2025
#[derive(Clone)]
pub struct CacheOptions<'a> {
    pub projection_mask: &'a ProjectionMask,
    pub cache: Arc<Mutex<RowGroupCache>>,

Contributor Author

In practice there is no contention, because there is no parallelism in decoding a single row group. We add the Mutex only because the cache is shared through an Arc, which needs interior mutability to be mutated.

let row_group_cache = Arc::new(Mutex::new(RowGroupCache::new(
    batch_size,
    // None,
    Some(1024 * 1024 * 100),

Contributor Author

This is currently hard-coded; making it configurable through user settings is left as future work.

@XiangpengHao XiangpengHao changed the title Pushdown v4 Parquet filter pushdown v4 Jul 2, 2025
@@ -613,8 +623,18 @@ where
    .fetch(&mut self.input, predicate.projection(), selection)
    .await?;

let mut cache_projection = predicate.projection().clone();
cache_projection.intersect(&projection);

Contributor Author

A column is cached if and only if it appears in both the output projection and the filter projection.
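
As a rough illustration of that rule, the sketch below intersects two per-column masks. Mask and cache_projection are hypothetical stand-ins written for this example (the real code uses ProjectionMask), and returning None for an empty intersection is an assumption.

```rust
/// Hypothetical stand-in for a projection mask: one flag per leaf column.
#[derive(Clone, Debug, PartialEq)]
pub struct Mask(pub Vec<bool>);

impl Mask {
    /// Keep only the columns selected by both masks.
    pub fn intersect(&mut self, other: &Mask) {
        assert_eq!(self.0.len(), other.0.len(), "masks must cover the same columns");
        for (a, b) in self.0.iter_mut().zip(other.0.iter()) {
            *a = *a && *b;
        }
    }

    pub fn is_empty(&self) -> bool {
        !self.0.iter().any(|&selected| selected)
    }
}

/// Columns worth caching: those read by the filter AND needed in the output.
/// Returns None when no column qualifies (assumed behavior for this sketch).
pub fn cache_projection(filter: &Mask, output: &Mask) -> Option<Mask> {
    let mut mask = filter.clone();
    mask.intersect(output);
    if mask.is_empty() {
        None
    } else {
        Some(mask)
    }
}
```

A column used only by the filter never needs to be read again, and a column used only by the output was never decoded during filtering, so caching either would only cost memory.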

@alamb alamb Jul 15, 2025

One thing I didn't understand after reading this PR in detail was how the relative row positions are updated after applying a filter.

For example, if we are applying multiple filters, the first may reduce the original RowSelection down to [100->200], and when the second filter runs it is evaluated only on rows 100->200, not on the original selection.

In other words, I think there needs to be some sort of function equivalent to RowSelection::and_then that applies to the cache:

// Narrow the cache so that it only retains the results of evaluating the predicate
let row_group_cache = row_group_cache.and_then(resulting_selection);

Maybe this is the root cause of https://github.com/apache/datafusion/actions/runs/16302299778/job/46039904381?pr=16711
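
To make the concern concrete, here is a minimal sketch of and_then-style composition using plain boolean masks instead of RowSelection. The and_then and narrow functions below are hypothetical helpers written for this example; they only illustrate the semantics being asked for and are not an API of the PR.

```rust
/// Compose two selections. `first` is over the original rows; `second` has one
/// entry per row that `first` kept. The result is again over the original rows.
pub fn and_then(first: &[bool], second: &[bool]) -> Vec<bool> {
    let selected = first.iter().filter(|&&keep| keep).count();
    assert_eq!(selected, second.len(), "second must cover exactly the rows kept by first");
    let mut inner = second.iter();
    first
        .iter()
        // `inner` only advances for rows that `first` kept.
        .map(|&keep| keep && *inner.next().expect("length checked above"))
        .collect()
}

/// Narrow cached values (already restricted to the rows `first` kept) so that
/// only the rows passing `second` remain.
pub fn narrow<T: Clone>(cached: &[T], second: &[bool]) -> Vec<T> {
    assert_eq!(cached.len(), second.len());
    cached
        .iter()
        .zip(second)
        .filter(|(_, keep)| **keep)
        .map(|(value, _)| value.clone())
        .collect()
}
```

The point of the sketch is that positions in `second` are relative to the output of `first`, so anything keyed by row position has to be re-based after each predicate.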

}

fn get_def_levels(&self) -> Option<&[i16]> {
    None // we don't allow nullable parent for now.

Contributor Author

Nested columns are not supported yet.

alamb commented Jul 2, 2025

😮 -- My brain is likely too fried at the moment to review this properly but it is on my list for first thing tomorrow

@zhuqi-lucas

Thank you @XiangpengHao for the amazing work, I will try to review and test this PR!

@alamb alamb left a comment

TL;DR: I think this is really clever, very nice @XiangpengHao. I left some structural comments / suggestions but nothing major.

I will run some more benchmarks, but it was showing very nice improvements for Q21 locally for me (129ms --> 90ms)

If that looks good I'll wire it up in DataFusion and run those benchmarks

Some thoughts:

  1. I would be happy to wire in the buffering limit / API
  2. As you say, there are many more improvements possible -- specifically I suspect the RowSelector representation is going to cause us pain and suffering for filters that have many short selections when bitmaps would be a better choice

Buffering

I think buffering the intermediate filter results is unavoidable if we want to preserve the current behavior of minimizing the size of IO requests.

If we want to reduce buffering I think we can only really do it by increasing the number of IO requests (so we can incrementally produce the final output). I think we should proceed with buffering and then tune if/when needed

Comment on lines 632 to 636
CacheOptions {
    projection_mask: &cache_projection,
    cache: row_group_cache.clone(),
    role: crate::arrow::array_reader::CacheRole::Producer,
},

Contributor

Structurally, both here and below, it might help to move the creation of the CacheOptions into the cache itself, so a reader of this code doesn't have to understand the innards of the cache.

Suggested change
- CacheOptions {
-     projection_mask: &cache_projection,
-     cache: row_group_cache.clone(),
-     role: crate::arrow::array_reader::CacheRole::Producer,
- },
+ row_group_cache.producer_options(projection, predicate.projection())


    let reader = ParquetRecordBatchReader::new(array_reader, plan);

    Ok((self, Some(reader)))
}

fn compute_cache_projection(&self, projection: &ProjectionMask) -> Option<ProjectionMask> {

Contributor

Suggested change
- fn compute_cache_projection(&self, projection: &ProjectionMask) -> Option<ProjectionMask> {
+ /// Compute which columns are used in filters and the final (output) projection
+ fn compute_cache_projection(&self, projection: &ProjectionMask) -> Option<ProjectionMask> {


let start_position = self.outer_position - row_count;

let selection_buffer = row_selection_to_boolean_buffer(row_count, self.selections.iter());

Contributor

This is clever, though it will likely suffer from the same "RowSelection is a crappy representation for small selection runs" problem.

Contributor Author

Yes, this is meant to alleviate that problem. If we have multiple small selection runs on the same cached batch, we first combine them into a boolean buffer and then do the boolean selection once.
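
The idea of collapsing many short runs into one mask and filtering once can be sketched as follows. Run, runs_to_mask, and filter_once are hypothetical names for this std-only example; the PR itself works on RowSelector values and Arrow boolean buffers.

```rust
/// Hypothetical stand-in for a selection run: select or skip `row_count` rows.
#[derive(Clone, Copy, Debug)]
pub struct Run {
    pub row_count: usize,
    pub skip: bool,
}

/// Expand a sequence of runs into a single boolean mask of `total_rows` entries.
/// Rows not covered by any run are treated as skipped (an assumption here).
pub fn runs_to_mask(total_rows: usize, runs: &[Run]) -> Vec<bool> {
    let mut mask = Vec::with_capacity(total_rows);
    for run in runs {
        mask.extend(std::iter::repeat(!run.skip).take(run.row_count));
    }
    mask.resize(total_rows, false);
    mask
}

/// Apply the mask in one pass instead of slicing the batch once per run.
pub fn filter_once<T: Clone>(values: &[T], mask: &[bool]) -> Vec<T> {
    values
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(value, _)| value.clone())
        .collect()
}
```

With many short runs, one pass over a mask avoids producing and concatenating one small slice per run.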

pub struct CacheKey {
    /// Column index in the row group
    pub column_idx: usize,
    /// Starting row ID for this batch

Contributor

I think it would help here to clarify what these row IDs are relative to.

I THINK they are the row IDs relative to the underlying column reader (which might already have a RowSelection applied).

If so, it would be good to clarify that they are not absolute row IDs from the (unfiltered) row group, for example.

    .expect("data must be already cached in the read_records call, this is a bug");
let cached = cached.slice(overlap_start - batch_start, selection_length);
let filtered = arrow_select::filter::filter(&cached, &mask_array)?;
selected_arrays.push(filtered);

Contributor

You can probably use the new BatchCoalescer here instead: https://docs.rs/arrow/latest/arrow/compute/struct.BatchCoalescer.html

It is definitely faster for primitive arrays and will save intermediate memory usage.

It might have some trouble with StringView, as it also tries to gc internally. We may need to optimize the output to avoid gc'ing if we see the same buffer from call to call.

alamb commented Jul 3, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing pushdown-v4 (1851f0b) to af8564f diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=pushdown-v4
Results will be posted here when complete

alamb commented Jul 3, 2025

🤖: Benchmark completed

Details

group                                main                                   pushdown-v4
-----                                ----                                   -----------
arrow_reader_clickbench/async/Q1     1.00      2.4±0.02ms        ? ?/sec    1.00      2.4±0.12ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     10.4±0.11ms        ? ?/sec    1.10     11.5±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     12.4±0.14ms        ? ?/sec    1.09     13.5±0.18ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.34     34.4±0.29ms        ? ?/sec    1.00     25.7±0.22ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.23     48.6±0.32ms        ? ?/sec    1.00     39.5±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.24     46.3±0.35ms        ? ?/sec    1.00     37.2±0.27ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.00      5.2±0.05ms        ? ?/sec    1.08      5.6±0.07ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.32    161.7±0.73ms        ? ?/sec    1.00    122.3±0.50ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.30    207.7±0.83ms        ? ?/sec    1.00    159.6±0.65ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.06    479.2±2.17ms        ? ?/sec    1.00    450.6±8.27ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.13   492.5±12.42ms        ? ?/sec    1.00   436.3±14.78ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.21     53.8±0.69ms        ? ?/sec    1.00     44.3±0.41ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.52    163.9±0.89ms        ? ?/sec    1.00    107.7±0.60ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.45    160.0±0.86ms        ? ?/sec    1.00    110.3±0.47ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.00     61.5±0.37ms        ? ?/sec    1.00     61.6±0.37ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.33    169.1±0.95ms        ? ?/sec    1.00    127.2±0.54ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.01    100.1±0.47ms        ? ?/sec    1.00     98.7±0.39ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.00     39.6±0.23ms        ? ?/sec    1.00     39.5±0.25ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.02     49.6±0.20ms        ? ?/sec    1.00     48.9±0.43ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.05     54.1±0.36ms        ? ?/sec    1.00     51.7±0.44ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.00     40.8±0.26ms        ? ?/sec    1.01     41.1±0.34ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.00     14.5±0.12ms        ? ?/sec    1.00     14.5±0.10ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00      2.2±0.00ms        ? ?/sec    1.00      2.2±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.00      9.2±0.09ms        ? ?/sec    1.01      9.3±0.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     11.1±0.07ms        ? ?/sec    1.01     11.2±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.00     36.4±0.28ms        ? ?/sec    1.00     36.4±0.29ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.00     49.9±0.41ms        ? ?/sec    1.00     49.9±0.38ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.00     47.9±0.28ms        ? ?/sec    1.01     48.2±0.38ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.02      4.3±0.02ms        ? ?/sec    1.00      4.2±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.01    178.1±0.90ms        ? ?/sec    1.00    176.8±0.72ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.00    233.1±2.45ms        ? ?/sec    1.00    233.5±0.83ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.01    479.4±2.39ms        ? ?/sec    1.00    476.4±2.19ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.02   443.9±12.86ms        ? ?/sec    1.00   435.5±16.08ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     51.0±0.52ms        ? ?/sec    1.01     51.7±0.65ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    153.9±0.61ms        ? ?/sec    1.00    153.3±0.68ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.01    150.4±0.65ms        ? ?/sec    1.00    149.2±0.86ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.01     59.3±0.40ms        ? ?/sec    1.00     58.9±0.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.01    158.6±1.04ms        ? ?/sec    1.00    157.7±0.94ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.01     93.2±0.44ms        ? ?/sec    1.00     92.5±0.42ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     31.9±0.20ms        ? ?/sec    1.01     32.2±0.29ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.01     34.7±0.41ms        ? ?/sec    1.00     34.3±0.29ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     50.4±0.47ms        ? ?/sec    1.00     50.5±0.48ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.00     37.7±0.37ms        ? ?/sec    1.01     38.0±0.30ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.00     13.6±0.07ms        ? ?/sec    1.01     13.7±0.09ms        ? ?/sec

alamb commented Jul 3, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing pushdown-v4 (1851f0b) to af8564f diff
BENCH_NAME=arrow_reader
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader
BENCH_FILTER=
BENCH_BRANCH_NAME=pushdown-v4
Results will be posted here when complete

alamb commented Jul 3, 2025

🤖: Benchmark completed

😎 -- very nice

@zhuqi-lucas

> 🤖: Benchmark completed
>
> 😎 -- very nice

Great result!

I am curious how the performance compares with the no-filter-pushdown case, because the previous attempt also improved performance on this benchmark but showed some regressions compared to no filter pushdown.

alamb commented Jul 3, 2025

> I am curious how the performance compares with the no-filter-pushdown case, because the previous attempt also improved performance on this benchmark but showed some regressions compared to no filter pushdown.

I will try and run this experiment later today

@zhuqi-lucas

> > I am curious how the performance compares with the no-filter-pushdown case, because the previous attempt also improved performance on this benchmark but showed some regressions compared to no filter pushdown.
>
> I will try and run this experiment later today

Thank you @alamb. If it has no regression, I believe this PR will also resolve the adaptive selection cases; if it does have a regression, we can further combine it with adaptive selection as a final optimization.

@XiangpengHao

This comment was marked as resolved.

alamb commented Jul 3, 2025

🤖: Benchmark completed

Details

group                                                                                                      main                                   pushdown-v4
-----                                                                                                      ----                                   -----------
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                           1.06   1356.3±2.84µs        ? ?/sec    1.00   1277.4±2.92µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                          1.02   1352.0±2.48µs        ? ?/sec    1.00   1323.1±3.61µs        ? ?/sec
arrow_array_reader/BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                            1.06   1361.7±3.15µs        ? ?/sec    1.00   1283.6±2.09µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, mandatory, no NULLs                                     1.00    484.4±6.57µs        ? ?/sec    1.06    512.0±4.35µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, half NULLs                                    1.00    662.9±2.03µs        ? ?/sec    1.05    694.0±2.13µs        ? ?/sec
arrow_array_reader/BinaryArray/dictionary encoded, optional, no NULLs                                      1.00    485.8±3.76µs        ? ?/sec    1.05    509.5±4.37µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, mandatory, no NULLs                                          1.09    626.7±3.48µs        ? ?/sec    1.00    577.1±3.17µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, half NULLs                                         1.01    772.8±2.90µs        ? ?/sec    1.00    763.2±2.98µs        ? ?/sec
arrow_array_reader/BinaryArray/plain encoded, optional, no NULLs                                           1.07    632.7±2.73µs        ? ?/sec    1.00    590.5±4.25µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, mandatory, no NULLs                                 1.03    258.8±3.21µs        ? ?/sec    1.00    251.7±2.83µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, half NULLs                                1.17    269.3±0.80µs        ? ?/sec    1.00    230.1±0.60µs        ? ?/sec
arrow_array_reader/BinaryViewArray/dictionary encoded, optional, no NULLs                                  1.00    257.7±2.56µs        ? ?/sec    1.00    258.5±3.28µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs                                      1.00    309.6±1.51µs        ? ?/sec    1.00    311.1±2.30µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, mandatory, no NULLs, short string                        1.00    301.0±0.54µs        ? ?/sec    1.07    321.4±0.61µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, half NULLs                                     1.13    306.2±1.12µs        ? ?/sec    1.00    269.9±1.09µs        ? ?/sec
arrow_array_reader/BinaryViewArray/plain encoded, optional, no NULLs                                       1.00    317.2±1.37µs        ? ?/sec    1.00    318.4±1.88µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs     1.01   1077.6±2.48µs        ? ?/sec    1.00   1066.7±1.91µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, half NULLs    1.05    951.0±2.12µs        ? ?/sec    1.00    902.7±2.82µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/byte_stream_split encoded, optional, no NULLs      1.01   1083.5±2.79µs        ? ?/sec    1.00   1074.1±4.83µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, mandatory, no NULLs                 1.04    448.4±3.42µs        ? ?/sec    1.00    432.8±4.39µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, half NULLs                1.11    630.6±1.87µs        ? ?/sec    1.00    567.9±4.22µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Decimal128Array/plain encoded, optional, no NULLs                  1.04    457.8±4.89µs        ? ?/sec    1.00    438.3±3.40µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, mandatory, no NULLs        1.00    153.1±0.31µs        ? ?/sec    1.05    160.6±0.29µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, half NULLs       1.19    297.8±0.69µs        ? ?/sec    1.00    249.8±0.82µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/byte_stream_split encoded, optional, no NULLs         1.00    158.7±0.36µs        ? ?/sec    1.05    166.4±1.13µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, mandatory, no NULLs                    1.00     77.3±0.22µs        ? ?/sec    1.00     77.2±0.19µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, half NULLs                   1.25    257.7±0.48µs        ? ?/sec    1.00    206.9±0.37µs        ? ?/sec
arrow_array_reader/FIXED_LEN_BYTE_ARRAY/Float16Array/plain encoded, optional, no NULLs                     1.02     83.5±0.22µs        ? ?/sec    1.00     82.0±3.11µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, mandatory, no NULLs                    1.00    686.9±1.54µs        ? ?/sec    1.08    740.3±4.00µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, half NULLs                   1.02    561.3±1.29µs        ? ?/sec    1.00    550.5±1.88µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/byte_stream_split encoded, optional, no NULLs                     1.00    693.1±1.30µs        ? ?/sec    1.08    747.3±2.10µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, mandatory, no NULLs                                1.00     65.1±4.91µs        ? ?/sec    1.07     69.3±4.01µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, half NULLs                               1.19    254.1±3.38µs        ? ?/sec    1.00    214.4±1.60µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(16)/plain encoded, optional, no NULLs                                 1.00     71.5±3.59µs        ? ?/sec    1.07     76.4±4.51µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, mandatory, no NULLs                     1.00     86.3±0.17µs        ? ?/sec    1.09     94.4±0.72µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, half NULLs                    1.26    228.6±0.89µs        ? ?/sec    1.00    181.1±0.37µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/byte_stream_split encoded, optional, no NULLs                      1.00     91.0±0.29µs        ? ?/sec    1.09     99.2±0.27µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, mandatory, no NULLs                                 1.00      9.3±0.11µs        ? ?/sec    1.02      9.5±0.23µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, half NULLs                                1.37    190.3±0.85µs        ? ?/sec    1.00    138.5±0.26µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(2)/plain encoded, optional, no NULLs                                  1.00     14.6±0.24µs        ? ?/sec    1.02     14.9±0.39µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, mandatory, no NULLs                     1.00    170.2±0.42µs        ? ?/sec    1.08    184.4±0.56µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, half NULLs                    1.27    349.1±0.82µs        ? ?/sec    1.00    275.7±0.70µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/byte_stream_split encoded, optional, no NULLs                      1.00    175.8±0.44µs        ? ?/sec    1.08    189.6±0.51µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, mandatory, no NULLs                                 1.00     12.9±0.26µs        ? ?/sec    1.14     14.7±0.42µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, half NULLs                                1.41    267.4±0.67µs        ? ?/sec    1.00    190.2±0.58µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(4)/plain encoded, optional, no NULLs                                  1.00     20.0±0.74µs        ? ?/sec    1.00     20.0±0.36µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, mandatory, no NULLs                     1.00    340.8±0.84µs        ? ?/sec    1.07    365.3±0.82µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, half NULLs                    1.08    376.1±1.45µs        ? ?/sec    1.00    348.3±0.85µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/byte_stream_split encoded, optional, no NULLs                      1.00    347.6±1.68µs        ? ?/sec    1.07    371.8±0.92µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, mandatory, no NULLs                                 1.00     26.0±0.54µs        ? ?/sec    1.17     30.3±1.95µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, half NULLs                                1.22    219.8±0.58µs        ? ?/sec    1.00    179.7±0.58µs        ? ?/sec
arrow_array_reader/FixedLenByteArray(8)/plain encoded, optional, no NULLs                                  1.00     32.6±0.53µs        ? ?/sec    1.09     35.5±1.36µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, mandatory, no NULLs                           1.00    120.2±0.20µs        ? ?/sec    1.01    121.8±0.18µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, half NULLs                          1.00    135.7±0.53µs        ? ?/sec    1.02    138.6±0.32µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed skip, optional, no NULLs                            1.00    123.1±0.19µs        ? ?/sec    1.02    126.1±0.26µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, mandatory, no NULLs                                1.01    174.1±0.60µs        ? ?/sec    1.00    171.8±0.28µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, half NULLs                               1.00    230.2±0.68µs        ? ?/sec    1.01    232.8±0.70µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/binary packed, optional, no NULLs                                 1.01    179.4±0.43µs        ? ?/sec    1.00    177.0±0.46µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs                    1.00     77.2±0.20µs        ? ?/sec    1.01     78.0±0.68µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, half NULLs                   1.00    178.9±0.83µs        ? ?/sec    1.01    181.2±1.04µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/byte_stream_split encoded, optional, no NULLs                     1.01     82.3±0.31µs        ? ?/sec    1.00     81.8±0.26µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, mandatory, no NULLs                           1.00    138.4±0.42µs        ? ?/sec    1.06    147.0±0.36µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, half NULLs                          1.00    213.4±0.55µs        ? ?/sec    1.03    219.8±0.91µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/dictionary encoded, optional, no NULLs                            1.00    143.6±0.28µs        ? ?/sec    1.06    152.8±0.29µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, mandatory, no NULLs                                1.00     74.6±0.44µs        ? ?/sec    1.00     74.6±0.30µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, half NULLs                               1.00    177.5±0.71µs        ? ?/sec    1.01    179.7±0.46µs        ? ?/sec
arrow_array_reader/INT32/Decimal128Array/plain encoded, optional, no NULLs                                 1.00     78.4±0.22µs        ? ?/sec    1.01     79.5±0.26µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, mandatory, no NULLs                           1.00    113.8±0.15µs        ? ?/sec    1.01    114.9±0.18µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, half NULLs                          1.00    140.0±0.32µs        ? ?/sec    1.03    144.7±0.64µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed skip, optional, no NULLs                            1.00    116.7±0.13µs        ? ?/sec    1.02    119.6±0.57µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, mandatory, no NULLs                                1.00    171.7±0.63µs        ? ?/sec    1.02    175.7±0.48µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, half NULLs                               1.00    249.4±0.59µs        ? ?/sec    1.02    253.6±0.63µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/binary packed, optional, no NULLs                                 1.00    176.6±0.51µs        ? ?/sec    1.03    181.6±0.73µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, mandatory, no NULLs                    1.00    202.6±0.43µs        ? ?/sec    1.00    203.3±0.29µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, half NULLs                   1.00    263.1±0.57µs        ? ?/sec    1.00    263.6±0.81µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/byte_stream_split encoded, optional, no NULLs                     1.00    209.1±0.51µs        ? ?/sec    1.01    210.2±0.56µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, mandatory, no NULLs                           1.00    145.9±0.34µs        ? ?/sec    1.07    156.7±0.30µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, half NULLs                          1.00    230.6±0.61µs        ? ?/sec    1.03    236.8±0.62µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/dictionary encoded, optional, no NULLs                            1.00    151.3±0.34µs        ? ?/sec    1.06    159.9±0.96µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, mandatory, no NULLs                                1.00     97.6±0.97µs        ? ?/sec    1.11    108.3±0.72µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, half NULLs                               1.00    208.6±1.32µs        ? ?/sec    1.03    214.8±0.91µs        ? ?/sec
arrow_array_reader/INT64/Decimal128Array/plain encoded, optional, no NULLs                                 1.00    107.3±2.25µs        ? ?/sec    1.15    123.3±1.20µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, mandatory, no NULLs                                      1.00     95.6±0.12µs        ? ?/sec    1.04     99.4±0.25µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, half NULLs                                     1.00    113.9±0.18µs        ? ?/sec    1.02    116.2±0.46µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed skip, optional, no NULLs                                       1.00     98.6±0.22µs        ? ?/sec    1.04    102.3±0.33µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, mandatory, no NULLs                                           1.00    130.9±0.37µs        ? ?/sec    1.05    138.0±0.77µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, half NULLs                                          1.00    189.6±0.46µs        ? ?/sec    1.03    194.5±0.29µs        ? ?/sec
arrow_array_reader/Int16Array/binary packed, optional, no NULLs                                            1.00    135.5±0.33µs        ? ?/sec    1.06    143.0±0.58µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, mandatory, no NULLs                               1.00     44.4±0.11µs        ? ?/sec    1.01     44.9±0.11µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, optional, half NULLs                              1.00    143.4±0.29µs        ? ?/sec    1.01    144.4±1.68µs        ? ?/sec
arrow_array_reader/Int16Array/byte_stream_split encoded, optional, no NULLs                                1.00     48.6±0.12µs        ? ?/sec    1.01     49.2±0.17µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, mandatory, no NULLs                                      1.00    104.6±0.17µs        ? ?/sec    1.09    114.4±0.27µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, optional, half NULLs                                     1.00    177.8±0.47µs        ? ?/sec    1.03    182.6±2.84µs        ? ?/sec
arrow_array_reader/Int16Array/dictionary encoded, optional, no NULLs                                       1.00    109.4±0.22µs        ? ?/sec    1.09    119.6±3.64µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, mandatory, no NULLs                                           1.00     38.9±0.14µs        ? ?/sec    1.00     38.8±0.08µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, optional, half NULLs                                          1.00    141.4±0.38µs        ? ?/sec    1.00    140.8±1.42µs        ? ?/sec
arrow_array_reader/Int16Array/plain encoded, optional, no NULLs                                            1.01     43.8±0.19µs        ? ?/sec    1.00     43.5±0.22µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, mandatory, no NULLs                                      1.00     94.6±0.20µs        ? ?/sec    1.02     96.1±0.21µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, half NULLs                                     1.00    108.9±0.32µs        ? ?/sec    1.02    110.9±0.92µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed skip, optional, no NULLs                                       1.00     98.2±0.32µs        ? ?/sec    1.01     98.7±0.23µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, mandatory, no NULLs                                           1.00    121.4±0.27µs        ? ?/sec    1.00    121.0±0.36µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, half NULLs                                          1.00    174.6±0.69µs        ? ?/sec    1.02    177.9±0.35µs        ? ?/sec
arrow_array_reader/Int32Array/binary packed, optional, no NULLs                                            1.00    125.8±0.44µs        ? ?/sec    1.00    126.0±0.39µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, mandatory, no NULLs                               1.11     26.3±0.21µs        ? ?/sec    1.00     23.7±0.06µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, half NULLs                              1.00    126.0±0.27µs        ? ?/sec    1.01    127.5±0.31µs        ? ?/sec
arrow_array_reader/Int32Array/byte_stream_split encoded, optional, no NULLs                                1.00     30.1±0.25µs        ? ?/sec    1.03     31.1±0.19µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, mandatory, no NULLs                                      1.00     87.2±0.26µs        ? ?/sec    1.11     96.5±0.28µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, half NULLs                                     1.00    157.0±0.39µs        ? ?/sec    1.04    163.6±0.36µs        ? ?/sec
arrow_array_reader/Int32Array/dictionary encoded, optional, no NULLs                                       1.00     91.1±0.36µs        ? ?/sec    1.12    101.7±0.40µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, mandatory, no NULLs                                           1.00     18.2±0.22µs        ? ?/sec    1.01     18.4±0.39µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, half NULLs                                          1.00    122.0±0.34µs        ? ?/sec    1.01    123.1±0.45µs        ? ?/sec
arrow_array_reader/Int32Array/plain encoded, optional, no NULLs                                            1.01     24.9±0.49µs        ? ?/sec    1.00     24.8±0.43µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, mandatory, no NULLs                                      1.00     87.0±0.43µs        ? ?/sec    1.02     88.4±0.67µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, half NULLs                                     1.00    112.3±0.35µs        ? ?/sec    1.00    111.9±0.36µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed skip, optional, no NULLs                                       1.00     89.3±0.27µs        ? ?/sec    1.01     90.6±0.31µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, mandatory, no NULLs                                           1.00    117.9±0.65µs        ? ?/sec    1.04    122.6±0.58µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, half NULLs                                          1.00    186.8±0.63µs        ? ?/sec    1.03    193.3±0.82µs        ? ?/sec
arrow_array_reader/Int64Array/binary packed, optional, no NULLs                                            1.00    120.7±0.60µs        ? ?/sec    1.05    127.3±3.66µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, mandatory, no NULLs                               1.01    151.7±0.32µs        ? ?/sec    1.00    149.8±0.46µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, half NULLs                              1.01    209.7±0.70µs        ? ?/sec    1.00    207.1±1.72µs        ? ?/sec
arrow_array_reader/Int64Array/byte_stream_split encoded, optional, no NULLs                                1.01    156.7±0.39µs        ? ?/sec    1.00    154.5±0.26µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, mandatory, no NULLs                                      1.00     93.1±0.46µs        ? ?/sec    1.09    101.7±0.58µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, half NULLs                                     1.02    182.8±0.54µs        ? ?/sec    1.00    179.1±0.71µs        ? ?/sec
arrow_array_reader/Int64Array/dictionary encoded, optional, no NULLs                                       1.00     97.7±0.52µs        ? ?/sec    1.10    107.5±2.91µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, mandatory, no NULLs                                           1.00     42.5±0.65µs        ? ?/sec    1.12     47.7±1.88µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, half NULLs                                          1.00    150.0±0.71µs        ? ?/sec    1.00    150.5±1.19µs        ? ?/sec
arrow_array_reader/Int64Array/plain encoded, optional, no NULLs                                            1.00     47.0±0.68µs        ? ?/sec    1.14     53.7±1.88µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, mandatory, no NULLs                                       1.00     92.3±0.17µs        ? ?/sec    1.01     93.3±0.21µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, half NULLs                                      1.00    110.0±0.61µs        ? ?/sec    1.01    111.2±0.24µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed skip, optional, no NULLs                                        1.00     95.1±0.17µs        ? ?/sec    1.01     96.3±0.24µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, mandatory, no NULLs                                            1.01    123.0±0.28µs        ? ?/sec    1.00    122.4±0.61µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, half NULLs                                           1.00    182.0±1.07µs        ? ?/sec    1.00    182.3±0.35µs        ? ?/sec
arrow_array_reader/Int8Array/binary packed, optional, no NULLs                                             1.00    127.3±0.44µs        ? ?/sec    1.00    126.9±1.12µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, mandatory, no NULLs                                1.00     36.9±0.12µs        ? ?/sec    1.00     37.0±0.07µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, optional, half NULLs                               1.01    136.8±0.48µs        ? ?/sec    1.00    135.7±0.34µs        ? ?/sec
arrow_array_reader/Int8Array/byte_stream_split encoded, optional, no NULLs                                 1.00     41.0±0.32µs        ? ?/sec    1.01     41.4±0.10µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, mandatory, no NULLs                                       1.00     96.6±0.20µs        ? ?/sec    1.11    106.9±0.24µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, optional, half NULLs                                      1.00    170.4±0.44µs        ? ?/sec    1.03    175.1±1.72µs        ? ?/sec
arrow_array_reader/Int8Array/dictionary encoded, optional, no NULLs                                        1.00    101.3±0.25µs        ? ?/sec    1.10    111.6±0.80µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, mandatory, no NULLs                                            1.00     31.2±0.12µs        ? ?/sec    1.00     31.1±0.07µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, optional, half NULLs                                           1.00    133.7±0.58µs        ? ?/sec    1.00    133.3±0.23µs        ? ?/sec
arrow_array_reader/Int8Array/plain encoded, optional, no NULLs                                             1.00     35.5±0.20µs        ? ?/sec    1.01     36.0±0.11µs        ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings half NULLs                                     1.01      7.2±0.04ms        ? ?/sec    1.00      7.1±0.04ms        ? ?/sec
arrow_array_reader/ListArray/plain encoded optional strings no NULLs                                       1.01     13.3±0.11ms        ? ?/sec    1.00     13.2±0.16ms        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, mandatory, no NULLs                                     1.00    495.7±3.68µs        ? ?/sec    1.04    513.4±2.64µs        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, half NULLs                                    1.00    665.7±5.16µs        ? ?/sec    1.04    694.8±1.99µs        ? ?/sec
arrow_array_reader/StringArray/dictionary encoded, optional, no NULLs                                      1.00    498.7±3.42µs        ? ?/sec    1.02    510.0±3.08µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, mandatory, no NULLs                                          1.20    726.9±3.72µs        ? ?/sec    1.00    607.3±3.10µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, half NULLs                                         1.03    817.4±3.99µs        ? ?/sec    1.00    796.7±7.55µs        ? ?/sec
arrow_array_reader/StringArray/plain encoded, optional, no NULLs                                           1.19    732.6±2.67µs        ? ?/sec    1.00    615.7±3.29µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, mandatory, no NULLs                                1.01    322.0±1.12µs        ? ?/sec    1.00    320.3±1.68µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, half NULLs                               1.00    401.2±1.26µs        ? ?/sec    1.08    432.0±2.30µs        ? ?/sec
arrow_array_reader/StringDictionary/dictionary encoded, optional, no NULLs                                 1.01    328.1±1.32µs        ? ?/sec    1.00    326.5±1.63µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, mandatory, no NULLs                                 1.02    259.4±2.77µs        ? ?/sec    1.00    255.2±2.32µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, half NULLs                                1.15    277.3±0.65µs        ? ?/sec    1.00    240.4±0.67µs        ? ?/sec
arrow_array_reader/StringViewArray/dictionary encoded, optional, no NULLs                                  1.00    265.7±2.52µs        ? ?/sec    1.01    269.6±2.35µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, mandatory, no NULLs                                      1.03    383.8±1.92µs        ? ?/sec    1.00    372.4±1.34µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, half NULLs                                     1.13    339.4±1.33µs        ? ?/sec    1.00    301.3±1.63µs        ? ?/sec
arrow_array_reader/StringViewArray/plain encoded, optional, no NULLs                                       1.03    395.0±6.05µs        ? ?/sec    1.00    385.3±2.38µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, mandatory, no NULLs                                     1.00    102.2±0.19µs        ? ?/sec    1.00    101.8±0.23µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, half NULLs                                    1.00    118.1±0.29µs        ? ?/sec    1.00    117.6±1.43µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed skip, optional, no NULLs                                      1.01    105.0±0.34µs        ? ?/sec    1.00    104.1±0.24µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, mandatory, no NULLs                                          1.01    139.8±0.27µs        ? ?/sec    1.00    139.0±0.19µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, half NULLs                                         1.00    195.4±0.41µs        ? ?/sec    1.00    194.8±0.63µs        ? ?/sec
arrow_array_reader/UInt16Array/binary packed, optional, no NULLs                                           1.01    144.3±0.30µs        ? ?/sec    1.00    143.5±0.95µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, mandatory, no NULLs                              1.04     44.6±0.12µs        ? ?/sec    1.00     43.0±0.10µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, optional, half NULLs                             1.01    144.2±1.16µs        ? ?/sec    1.00    143.2±1.37µs        ? ?/sec
arrow_array_reader/UInt16Array/byte_stream_split encoded, optional, no NULLs                               1.03     49.0±0.13µs        ? ?/sec    1.00     47.6±0.15µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, mandatory, no NULLs                                     1.00    104.5±0.28µs        ? ?/sec    1.10    114.6±0.48µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, optional, half NULLs                                    1.00    178.3±1.76µs        ? ?/sec    1.02    182.6±1.21µs        ? ?/sec
arrow_array_reader/UInt16Array/dictionary encoded, optional, no NULLs                                      1.00    109.3±0.70µs        ? ?/sec    1.09    119.2±0.46µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, mandatory, no NULLs                                          1.01     39.2±0.31µs        ? ?/sec    1.00     38.9±0.09µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, optional, half NULLs                                         1.01    142.2±3.14µs        ? ?/sec    1.00    140.9±0.58µs        ? ?/sec
arrow_array_reader/UInt16Array/plain encoded, optional, no NULLs                                           1.00     43.1±0.09µs        ? ?/sec    1.01     43.7±0.13µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, mandatory, no NULLs                                     1.00     94.5±0.13µs        ? ?/sec    1.02     96.4±1.15µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, half NULLs                                    1.00    109.5±0.24µs        ? ?/sec    1.01    110.2±0.31µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed skip, optional, no NULLs                                      1.00     97.3±0.22µs        ? ?/sec    1.01     98.7±0.21µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, mandatory, no NULLs                                          1.02    123.7±0.55µs        ? ?/sec    1.00    121.2±0.31µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, half NULLs                                         1.00    177.5±0.37µs        ? ?/sec    1.00    177.0±0.41µs        ? ?/sec
arrow_array_reader/UInt32Array/binary packed, optional, no NULLs                                           1.02    128.0±0.69µs        ? ?/sec    1.00    125.8±0.41µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, mandatory, no NULLs                              1.00     27.1±0.35µs        ? ?/sec    1.00     27.0±0.22µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, optional, half NULLs                             1.01    128.1±1.29µs        ? ?/sec    1.00    126.5±0.36µs        ? ?/sec
arrow_array_reader/UInt32Array/byte_stream_split encoded, optional, no NULLs                               1.00     31.5±0.33µs        ? ?/sec    1.00     31.5±0.44µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, mandatory, no NULLs                                     1.00     87.1±0.42µs        ? ?/sec    1.11     96.8±0.35µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, optional, half NULLs                                    1.00    161.0±0.23µs        ? ?/sec    1.02    164.3±0.48µs        ? ?/sec
arrow_array_reader/UInt32Array/dictionary encoded, optional, no NULLs                                      1.00     91.8±0.28µs        ? ?/sec    1.10    101.3±0.70µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, mandatory, no NULLs                                          1.00     21.5±0.40µs        ? ?/sec    1.02     21.8±0.57µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, optional, half NULLs                                         1.00    123.9±0.57µs        ? ?/sec    1.00    124.2±0.41µs        ? ?/sec
arrow_array_reader/UInt32Array/plain encoded, optional, no NULLs                                           1.00     26.3±0.38µs        ? ?/sec    1.01     26.6±0.37µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, mandatory, no NULLs                                     1.00     87.1±0.25µs        ? ?/sec    1.03     89.4±0.36µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, half NULLs                                    1.00    112.4±0.44µs        ? ?/sec    1.01    113.0±1.44µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed skip, optional, no NULLs                                      1.00     89.3±0.27µs        ? ?/sec    1.03     92.2±0.34µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, mandatory, no NULLs                                          1.00    118.0±0.64µs        ? ?/sec    1.03    121.6±0.59µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, half NULLs                                         1.00    186.2±0.50µs        ? ?/sec    1.05    195.5±0.49µs        ? ?/sec
arrow_array_reader/UInt64Array/binary packed, optional, no NULLs                                           1.00    120.6±0.44µs        ? ?/sec    1.04    125.3±0.44µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, mandatory, no NULLs                              1.01    151.9±0.56µs        ? ?/sec    1.00    150.7±0.38µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, optional, half NULLs                             1.01    207.6±1.85µs        ? ?/sec    1.00    205.1±0.74µs        ? ?/sec
arrow_array_reader/UInt64Array/byte_stream_split encoded, optional, no NULLs                               1.01    156.8±0.44µs        ? ?/sec    1.00    155.8±1.35µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, mandatory, no NULLs                                     1.00     93.6±0.75µs        ? ?/sec    1.09    102.2±0.67µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, optional, half NULLs                                    1.02    182.2±0.36µs        ? ?/sec    1.00    178.6±0.45µs        ? ?/sec
arrow_array_reader/UInt64Array/dictionary encoded, optional, no NULLs                                      1.00     97.3±0.73µs        ? ?/sec    1.11    107.6±0.74µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, mandatory, no NULLs                                          1.00     43.9±0.70µs        ? ?/sec    1.05     46.3±2.01µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, optional, half NULLs                                         1.01    150.1±1.21µs        ? ?/sec    1.00    149.2±0.87µs        ? ?/sec
arrow_array_reader/UInt64Array/plain encoded, optional, no NULLs                                           1.00     50.0±1.04µs        ? ?/sec    1.06     53.0±2.24µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, mandatory, no NULLs                                      1.01    100.9±0.19µs        ? ?/sec    1.00    100.0±0.22µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, half NULLs                                     1.00    114.5±0.43µs        ? ?/sec    1.00    114.9±0.29µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed skip, optional, no NULLs                                       1.01    103.5±0.33µs        ? ?/sec    1.00    102.4±0.46µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, mandatory, no NULLs                                           1.02    132.4±0.16µs        ? ?/sec    1.00    130.2±0.28µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, half NULLs                                          1.00    186.6±0.43µs        ? ?/sec    1.00    187.0±0.46µs        ? ?/sec
arrow_array_reader/UInt8Array/binary packed, optional, no NULLs                                            1.01    137.0±0.70µs        ? ?/sec    1.00    135.1±0.99µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, mandatory, no NULLs                               1.00     35.1±0.08µs        ? ?/sec    1.03     36.2±0.16µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, optional, half NULLs                              1.01    136.9±0.39µs        ? ?/sec    1.00    136.2±1.12µs        ? ?/sec
arrow_array_reader/UInt8Array/byte_stream_split encoded, optional, no NULLs                                1.00     39.7±0.12µs        ? ?/sec    1.04     41.2±0.20µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, mandatory, no NULLs                                      1.00     97.1±0.52µs        ? ?/sec    1.10    106.7±0.19µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, optional, half NULLs                                     1.00    170.1±1.99µs        ? ?/sec    1.03    174.9±0.37µs        ? ?/sec
arrow_array_reader/UInt8Array/dictionary encoded, optional, no NULLs                                       1.00    101.6±0.18µs        ? ?/sec    1.10    111.5±0.34µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, mandatory, no NULLs                                           1.00     30.5±0.27µs        ? ?/sec    1.02     31.0±0.16µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, half NULLs                                          1.01    133.3±0.83µs        ? ?/sec    1.00    132.6±0.40µs        ? ?/sec
arrow_array_reader/UInt8Array/plain encoded, optional, no NULLs                                            1.00     35.1±0.25µs        ? ?/sec    1.01     35.6±0.37µs        ? ?/sec


@XiangpengHao
Contributor Author

> So one thing I didn't understand after reading this PR in detail was how the relative row positions are updated after applying a filter.

Thank you for reviewing this @alamb !

This is the place where we merge the filter selection back into the original selection: https://github.com/XiangpengHao/arrow-rs/blob/5537bcb0870ba21549e72b58b65237ba823eec50/parquet/src/arrow/arrow_reader/read_plan.rs#L119

That is, this PR does not change how we represent selections; we still use the existing implementation. The only difference is that we added a transparent cache layer. The rest of the code should be unchanged.
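To make the merge step concrete, here is a minimal, self-contained sketch of folding a filter result back into the original selection. It is not the code at the link above: `Run`, `push_run`, and `merge_selection` are hypothetical names used only for illustration, and the run-length representation is an assumption modelled loosely on how row selections are expressed. As far as I know the existing `RowSelection::and_then` in the parquet crate plays this role, but please check the linked line rather than rely on this sketch.

```rust
/// A run of consecutive rows that is either read (`skip == false`) or skipped.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Run {
    pub row_count: usize,
    pub skip: bool,
}

/// Append a run to `out`, merging it with the previous run when both have the
/// same `skip` flag. Zero-length runs are dropped.
fn push_run(out: &mut Vec<Run>, row_count: usize, skip: bool) {
    if row_count == 0 {
        return;
    }
    match out.last_mut() {
        Some(last) if last.skip == skip => last.row_count += row_count,
        _ => out.push(Run { row_count, skip }),
    }
}

/// Merge a filter result back into the original selection.
///
/// `original` covers every row of the row group. `filter` covers only the rows
/// that `original` selects, in the same order. The result again covers every
/// row of the row group, expressed in original row positions.
pub fn merge_selection(original: &[Run], filter: &[Run]) -> Vec<Run> {
    let mut out = Vec::new();
    let mut filter_iter = filter.iter().copied();
    // The not-yet-consumed part of the current filter run.
    let mut current = filter_iter.next();

    for run in original {
        if run.skip {
            // Rows skipped by the original selection stay skipped.
            push_run(&mut out, run.row_count, true);
            continue;
        }
        // Rows selected by the original selection take the filter's verdict.
        let mut remaining = run.row_count;
        while remaining > 0 {
            let Some(f) = current.as_mut() else {
                panic!("filter must cover every row selected by the original selection");
            };
            let take = remaining.min(f.row_count);
            let skip = f.skip;
            f.row_count -= take;
            let exhausted = f.row_count == 0;
            push_run(&mut out, take, skip);
            remaining -= take;
            if exhausted {
                current = filter_iter.next();
            }
        }
    }
    out
}
```

For example, with an original selection of "skip 2, read 5, skip 3" and a filter over those 5 rows of "read 1, skip 3, read 1", the merged selection is "skip 2, read 1, skip 3, read 1, skip 3". The filter only ever speaks about relative positions within the already-selected rows, and the merge translates those back to positions in the row group.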

@alamb
Contributor

alamb commented Jul 16, 2025

I found another bug in DataFusion testing here apache/datafusion#16711 (comment)

@alamb
Contributor

alamb commented Jul 16, 2025

I am now working on additional review and proposed improvements to this PR, basically to move more of the caching structure into the PlanBuilder and make it easier to test.

@alamb
Contributor

alamb commented Jul 16, 2025

Here is a proposal:

(I think the CI is having issues due to https://www.githubstatus.com/incidents/k20s3qvr28zw)

@alamb
Contributor

alamb commented Jul 16, 2025

🤖 ./gh_compare_arrow.sh Benchmark Script Running
Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubuntu SMP Wed May 28 02:40:52 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing pushdown-v4 (b835163) to c40830e diff
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=pushdown-v4
Results will be posted here when complete

@alamb
Contributor

alamb commented Jul 16, 2025

🤖: Benchmark completed


group                                main                                   pushdown-v4
-----                                ----                                   -----------
arrow_reader_clickbench/async/Q1     1.02      2.4±0.01ms        ? ?/sec    1.00      2.3±0.01ms        ? ?/sec
arrow_reader_clickbench/async/Q10    1.00     11.3±0.13ms        ? ?/sec    1.07     12.0±0.24ms        ? ?/sec
arrow_reader_clickbench/async/Q11    1.00     13.3±0.20ms        ? ?/sec    1.03     13.7±0.20ms        ? ?/sec
arrow_reader_clickbench/async/Q12    1.39     35.6±0.32ms        ? ?/sec    1.00     25.6±0.31ms        ? ?/sec
arrow_reader_clickbench/async/Q13    1.26     49.7±0.47ms        ? ?/sec    1.00     39.6±0.26ms        ? ?/sec
arrow_reader_clickbench/async/Q14    1.27     47.0±0.30ms        ? ?/sec    1.00     36.9±0.27ms        ? ?/sec
arrow_reader_clickbench/async/Q19    1.00      5.2±0.07ms        ? ?/sec    1.11      5.8±0.15ms        ? ?/sec
arrow_reader_clickbench/async/Q20    1.33    161.3±1.22ms        ? ?/sec    1.00    121.6±0.61ms        ? ?/sec
arrow_reader_clickbench/async/Q21    1.29    207.6±1.10ms        ? ?/sec    1.00    160.3±1.21ms        ? ?/sec
arrow_reader_clickbench/async/Q22    1.00    405.6±2.49ms        ? ?/sec    1.00    406.4±5.58ms        ? ?/sec
arrow_reader_clickbench/async/Q23    1.18    496.8±5.89ms        ? ?/sec    1.00   422.8±12.00ms        ? ?/sec
arrow_reader_clickbench/async/Q24    1.24     55.4±0.66ms        ? ?/sec    1.00     44.6±0.61ms        ? ?/sec
arrow_reader_clickbench/async/Q27    1.53    165.6±1.38ms        ? ?/sec    1.00    108.0±0.55ms        ? ?/sec
arrow_reader_clickbench/async/Q28    1.45    162.6±1.54ms        ? ?/sec    1.00    112.5±0.85ms        ? ?/sec
arrow_reader_clickbench/async/Q30    1.00     62.3±0.73ms        ? ?/sec    1.00     62.7±0.50ms        ? ?/sec
arrow_reader_clickbench/async/Q36    1.32    169.0±1.77ms        ? ?/sec    1.00    128.1±0.56ms        ? ?/sec
arrow_reader_clickbench/async/Q37    1.00    100.5±0.54ms        ? ?/sec    1.00    100.8±0.75ms        ? ?/sec
arrow_reader_clickbench/async/Q38    1.00     39.8±0.28ms        ? ?/sec    1.01     40.2±0.23ms        ? ?/sec
arrow_reader_clickbench/async/Q39    1.01     50.0±0.43ms        ? ?/sec    1.00     49.5±0.34ms        ? ?/sec
arrow_reader_clickbench/async/Q40    1.03     53.8±0.48ms        ? ?/sec    1.00     52.3±0.44ms        ? ?/sec
arrow_reader_clickbench/async/Q41    1.00     41.2±0.40ms        ? ?/sec    1.01     41.5±0.33ms        ? ?/sec
arrow_reader_clickbench/async/Q42    1.00     14.4±0.14ms        ? ?/sec    1.03     14.8±0.17ms        ? ?/sec
arrow_reader_clickbench/sync/Q1      1.00      2.2±0.02ms        ? ?/sec    1.01      2.2±0.01ms        ? ?/sec
arrow_reader_clickbench/sync/Q10     1.03      9.8±0.06ms        ? ?/sec    1.00      9.5±0.07ms        ? ?/sec
arrow_reader_clickbench/sync/Q11     1.00     11.5±0.07ms        ? ?/sec    1.00     11.5±0.11ms        ? ?/sec
arrow_reader_clickbench/sync/Q12     1.02     37.9±0.32ms        ? ?/sec    1.00     37.2±0.35ms        ? ?/sec
arrow_reader_clickbench/sync/Q13     1.00     50.8±0.45ms        ? ?/sec    1.01     51.2±0.52ms        ? ?/sec
arrow_reader_clickbench/sync/Q14     1.00     48.6±0.31ms        ? ?/sec    1.02     49.3±0.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q19     1.03      4.4±0.03ms        ? ?/sec    1.00      4.3±0.02ms        ? ?/sec
arrow_reader_clickbench/sync/Q20     1.00    176.7±1.29ms        ? ?/sec    1.01    178.1±1.35ms        ? ?/sec
arrow_reader_clickbench/sync/Q21     1.00    234.8±1.84ms        ? ?/sec    1.01    236.9±1.80ms        ? ?/sec
arrow_reader_clickbench/sync/Q22     1.00    479.6±4.95ms        ? ?/sec    1.01    482.4±3.57ms        ? ?/sec
arrow_reader_clickbench/sync/Q23     1.04   443.1±17.45ms        ? ?/sec    1.00    426.0±5.14ms        ? ?/sec
arrow_reader_clickbench/sync/Q24     1.00     53.0±0.74ms        ? ?/sec    1.01     53.3±0.96ms        ? ?/sec
arrow_reader_clickbench/sync/Q27     1.00    154.6±1.23ms        ? ?/sec    1.00    154.2±0.95ms        ? ?/sec
arrow_reader_clickbench/sync/Q28     1.00    153.2±1.41ms        ? ?/sec    1.01    154.5±1.53ms        ? ?/sec
arrow_reader_clickbench/sync/Q30     1.00     59.7±0.41ms        ? ?/sec    1.01     60.1±0.50ms        ? ?/sec
arrow_reader_clickbench/sync/Q36     1.00    158.8±1.01ms        ? ?/sec    1.01    160.1±1.63ms        ? ?/sec
arrow_reader_clickbench/sync/Q37     1.01     94.3±0.74ms        ? ?/sec    1.00     93.8±0.53ms        ? ?/sec
arrow_reader_clickbench/sync/Q38     1.00     32.0±0.21ms        ? ?/sec    1.01     32.3±0.22ms        ? ?/sec
arrow_reader_clickbench/sync/Q39     1.00     34.9±0.48ms        ? ?/sec    1.01     35.3±0.45ms        ? ?/sec
arrow_reader_clickbench/sync/Q40     1.00     50.6±0.50ms        ? ?/sec    1.01     51.0±0.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q41     1.00     38.1±0.38ms        ? ?/sec    1.01     38.7±0.39ms        ? ?/sec
arrow_reader_clickbench/sync/Q42     1.00     13.7±0.13ms        ? ?/sec    1.01     13.8±0.12ms        ? ?/sec
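For readers unfamiliar with this table layout: the number in front of each timing appears to be that side's time divided by the faster side's time, so 1.00 marks the faster branch. The sketch below recomputes two rows under that assumption; `relative` is a hypothetical helper, not part of the benchmark script. The reported ratios are presumably derived from unrounded measurements, so recomputing from the printed times can differ in the last digit.

```rust
/// Relative cost of each side versus the faster of the two (1.00 = faster side).
fn relative(main_ms: f64, branch_ms: f64) -> (f64, f64) {
    let fastest = main_ms.min(branch_ms);
    (main_ms / fastest, branch_ms / fastest)
}

fn main() {
    // async/Q12: 35.6 ms on main, 25.6 ms on pushdown-v4.
    let (main_ratio, branch_ratio) = relative(35.6, 25.6);
    assert!((main_ratio - 1.39).abs() < 0.005);
    assert!((branch_ratio - 1.0).abs() < 1e-12);

    // async/Q27: 165.6 ms on main, 108.0 ms on pushdown-v4.
    let (main_ratio, _) = relative(165.6, 108.0);
    assert!((main_ratio - 1.53).abs() < 0.005);

    println!("async/Q27 main is {:.2}x the time of pushdown-v4", main_ratio);
}
```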

@alamb
Contributor

alamb commented Jul 16, 2025

Summary so far (I now need to go work on some other things for the rest of the day):

I made two proposed changes

My plan for tomorrow will be to try and write some tests:

  1. Reproduce the bug / error in an arrow-rs only test
  2. Write some sort of integration test that shows the cache working (in preparation for wiring in the memory limit)

@XiangpengHao
Contributor Author

Summary so far (I now need to go work on some other things for the rest of the day):

I made two proposed changes

My plan for tomorrow will be to try and write some tests:

  1. Reproduce the bug / error in an arrow-rs only test
  2. Write some sort of integration test that shows the cache working (in preparation for wiring in the memory limit)

Thank you for the review @alamb, I plan to take a look at this in the next few days, and also think about further optimizations.

Maybe it's just me, but I can't reproduce some of the regressions reported in the DataFusion integration. I'll get a "cloud" machine and try again.

@alamb
Contributor

alamb commented Jul 17, 2025

Maybe it's just me, but I can't reproduce some of the regressions reported in the DataFusion integration. I'll get a "cloud" machine and try again.

If we can't reproduce them, I think we should just ignore them

@XiangpengHao
Contributor Author

Summary for new updates:

  1. Incorporated the changes from @alamb
  2. Added a test case to reproduce the error from "POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4)" datafusion#16711 (comment)
  3. Fixed the above bug
  4. Added slightly more accurate memory accounting for string view arrays
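As background for item 4, here is a rough sketch of why string view arrays need their own accounting. In the Arrow view layout each value has a 16-byte view, and only values longer than 12 bytes also occupy space in a data buffer. The function below is a hypothetical estimate under those two facts; it ignores the validity bitmap and any unused or shared buffer space, and it is not the accounting used in this PR.

```rust
/// Size of one view entry in the Arrow string view layout.
const VIEW_BYTES: usize = 16;
/// Values up to this length are stored inline in the view itself.
const INLINE_LIMIT: usize = 12;

/// Hypothetical lower-bound estimate of the memory held by a string view array.
fn estimate_string_view_bytes(values: &[&str]) -> usize {
    let views = values.len() * VIEW_BYTES;
    // Only values that do not fit inline need bytes in a data buffer.
    let out_of_line: usize = values
        .iter()
        .map(|v| v.len())
        .filter(|&len| len > INLINE_LIMIT)
        .sum();
    views + out_of_line
}

fn main() {
    let long = "this string is longer than twelve bytes";
    let values = ["short", "exactly12byt", long];
    let estimate = estimate_string_view_bytes(&values);
    // Three views, plus the bytes of the single out-of-line value.
    assert_eq!(estimate, 3 * VIEW_BYTES + long.len());
    println!("estimated bytes: {}", estimate);
}
```

An estimate that only counted the views, or only the data buffers, would misjudge how quickly a cache budget fills up for columns with many long strings.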

@alamb
Contributor

alamb commented Jul 19, 2025

Thank you -- I will get back to this tomorrow or Monday

@alamb
Contributor

alamb commented Jul 21, 2025

I am beginning to look into this -- my planned contribution is to

  1. Make a setting for max cache size (which we will need as an escape valve to turn this off)
  2. Tests for cache memory size
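A minimal sketch of what such a setting could look like, following the PR description (once the per-row-group budget is used up, the cache stops taking new data and the reader falls back to decoding from Parquet). `BudgetedCache`, its fields, and the `Vec<u8>` payload are hypothetical stand-ins, not the types in this PR; a budget of 0 acts as the off switch.

```rust
use std::collections::HashMap;

/// Hypothetical per-row-group cache with a byte budget.
struct BudgetedCache {
    max_bytes: usize,
    used_bytes: usize,
    entries: HashMap<usize, Vec<u8>>, // batch id -> cached bytes
}

impl BudgetedCache {
    fn new(max_bytes: usize) -> Self {
        Self { max_bytes, used_bytes: 0, entries: HashMap::new() }
    }

    /// Producer side (filter phase). Returns `false` when the entry does not
    /// fit in the remaining budget, so the caller must decode again later.
    fn insert(&mut self, batch_id: usize, data: Vec<u8>) -> bool {
        if self.entries.contains_key(&batch_id) {
            return true; // already cached
        }
        if self.used_bytes + data.len() > self.max_bytes {
            return false;
        }
        self.used_bytes += data.len();
        self.entries.insert(batch_id, data);
        true
    }

    /// Consumer side (output phase). Removing the entry frees its budget.
    fn take(&mut self, batch_id: usize) -> Option<Vec<u8>> {
        let data = self.entries.remove(&batch_id)?;
        self.used_bytes -= data.len();
        Some(data)
    }
}

fn main() {
    let mut cache = BudgetedCache::new(8);
    assert!(cache.insert(0, vec![0; 5]));
    assert!(!cache.insert(1, vec![0; 5])); // 10 bytes would exceed the budget
    assert_eq!(cache.take(0).map(|d| d.len()), Some(5));
    assert!(cache.insert(1, vec![0; 5])); // budget was freed by the consumer

    // A budget of zero disables caching entirely.
    let mut disabled = BudgetedCache::new(0);
    assert!(!disabled.insert(0, vec![0; 1]));
}
```

A test for cache memory size could then assert on `used_bytes` after each producer and consumer step.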

@alamb
Contributor

alamb commented Jul 21, 2025

I am beginning to look into this -- my planned contribution is to

  1. Make a setting for max cache size (which we will need as an escape valve to turn this off)
  2. Tests for cache memory size

I started writing some tests but it got somewhat more complicated than I expected. Here is the WIP PR

Once that is in place then I hope to use the same pattern to verify the cache operations. I will continue tomorrow

Labels
parquet Changes to the parquet crate
4 participants