@alamb alamb commented Nov 19, 2025

Which issue does this PR close?

Rationale for this change

Get latest and greatest code from arrow

What changes are included in this PR?

  1. Update to Arrow 57.1.0
  2. Update for API changes (comments inline)

Are these changes tested?

Yes, by CI

Are there any user-facing changes?

No

| alltypes_plain.parquet | 1851 | 6957 | 2 | page_index=false |
| alltypes_tiny_pages.parquet | 454233 | 267014 | 2 | page_index=true |
| lz4_raw_compressed_larger.parquet | 380836 | 996 | 2 | page_index=false |
| alltypes_plain.parquet | 1851 | 8882 | 2 | page_index=false |
Contributor Author

The metadata didn't actually get bigger; we just now include the encryption information (better reporting).

Contributor Author

Actually I looked into it more and I think the size growth is a bug. See

Contributor Author

Update: the size is correct. As @etseidl says "the truth hurts"

@alamb
Contributor Author

alamb commented Nov 20, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/upgrade_arrow_57.1.0 (840487e) to 6d9ab45 diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb alamb mentioned this pull request Nov 20, 2025
13 tasks
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Nov 20, 2025
@alamb
Contributor Author

alamb commented Nov 20, 2025

🤖: Benchmark completed

Details

Comparing HEAD and alamb_upgrade_arrow_57.1.0
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_upgrade_arrow_57.1 ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 0     │  2669.04 ms │               2708.77 ms │    no change │
│ QQuery 1     │  1235.58 ms │               1311.33 ms │ 1.06x slower │
│ QQuery 2     │  2388.97 ms │               2475.47 ms │    no change │
│ QQuery 3     │  1206.27 ms │               1200.33 ms │    no change │
│ QQuery 4     │  2326.86 ms │               2244.02 ms │    no change │
│ QQuery 5     │ 28556.87 ms │              28558.86 ms │    no change │
│ QQuery 6     │  4095.23 ms │               3958.01 ms │    no change │
│ QQuery 7     │  3903.82 ms │               3868.19 ms │    no change │
└──────────────┴─────────────┴──────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                       ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 46382.65ms │
│ Total Time (alamb_upgrade_arrow_57.1)   │ 46324.97ms │
│ Average Time (HEAD)                     │  5797.83ms │
│ Average Time (alamb_upgrade_arrow_57.1) │  5790.62ms │
│ Queries Faster                          │          0 │
│ Queries Slower                          │          1 │
│ Queries with No Change                  │          7 │
│ Queries with Failure                    │          0 │
└─────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_upgrade_arrow_57.1 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.11 ms │                  2.51 ms │  1.19x slower │
│ QQuery 1     │    49.63 ms │                 49.65 ms │     no change │
│ QQuery 2     │   137.94 ms │                134.20 ms │     no change │
│ QQuery 3     │   163.23 ms │                154.87 ms │ +1.05x faster │
│ QQuery 4     │  1087.29 ms │               1111.76 ms │     no change │
│ QQuery 5     │  1490.99 ms │               1535.73 ms │     no change │
│ QQuery 6     │     2.22 ms │                  2.19 ms │     no change │
│ QQuery 7     │    54.53 ms │                 54.27 ms │     no change │
│ QQuery 8     │  1489.20 ms │               1499.13 ms │     no change │
│ QQuery 9     │  1864.60 ms │               1878.47 ms │     no change │
│ QQuery 10    │   375.25 ms │                386.99 ms │     no change │
│ QQuery 11    │   428.15 ms │                440.85 ms │     no change │
│ QQuery 12    │  1369.91 ms │               1379.43 ms │     no change │
│ QQuery 13    │  2122.32 ms │               2132.63 ms │     no change │
│ QQuery 14    │  1291.97 ms │               1313.81 ms │     no change │
│ QQuery 15    │  1261.86 ms │               1267.39 ms │     no change │
│ QQuery 16    │  2719.83 ms │               2737.71 ms │     no change │
│ QQuery 17    │  2710.92 ms │               2742.06 ms │     no change │
│ QQuery 18    │  5919.25 ms │               5077.25 ms │ +1.17x faster │
│ QQuery 19    │   126.87 ms │                120.96 ms │     no change │
│ QQuery 20    │  2104.85 ms │               1933.69 ms │ +1.09x faster │
│ QQuery 21    │  2406.78 ms │               2211.99 ms │ +1.09x faster │
│ QQuery 22    │  4076.54 ms │               3818.94 ms │ +1.07x faster │
│ QQuery 23    │ 12929.23 ms │              12607.37 ms │     no change │
│ QQuery 24    │   211.40 ms │                207.20 ms │     no change │
│ QQuery 25    │   484.28 ms │                477.07 ms │     no change │
│ QQuery 26    │   222.10 ms │                205.52 ms │ +1.08x faster │
│ QQuery 27    │  2839.44 ms │               2746.85 ms │     no change │
│ QQuery 28    │ 23650.44 ms │              23486.62 ms │     no change │
│ QQuery 29    │   970.57 ms │                986.64 ms │     no change │
│ QQuery 30    │  1358.29 ms │               1355.65 ms │     no change │
│ QQuery 31    │  1399.94 ms │               1375.33 ms │     no change │
│ QQuery 32    │  5469.96 ms │               4955.15 ms │ +1.10x faster │
│ QQuery 33    │  6330.13 ms │               5881.95 ms │ +1.08x faster │
│ QQuery 34    │  6616.67 ms │               6409.89 ms │     no change │
│ QQuery 35    │  2074.80 ms │               2052.74 ms │     no change │
│ QQuery 36    │   119.25 ms │                116.62 ms │     no change │
│ QQuery 37    │    51.77 ms │                 51.71 ms │     no change │
│ QQuery 38    │   119.58 ms │                115.50 ms │     no change │
│ QQuery 39    │   199.66 ms │                189.24 ms │ +1.06x faster │
│ QQuery 40    │    44.56 ms │                 42.01 ms │ +1.06x faster │
│ QQuery 41    │    38.38 ms │                 38.30 ms │     no change │
│ QQuery 42    │    31.94 ms │                 31.93 ms │     no change │
└──────────────┴─────────────┴──────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                       ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 98418.61ms │
│ Total Time (alamb_upgrade_arrow_57.1)   │ 95319.78ms │
│ Average Time (HEAD)                     │  2288.80ms │
│ Average Time (alamb_upgrade_arrow_57.1) │  2216.74ms │
│ Queries Faster                          │         10 │
│ Queries Slower                          │          1 │
│ Queries with No Change                  │         32 │
│ Queries with Failure                    │          0 │
└─────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃      HEAD ┃ alamb_upgrade_arrow_57.1 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │ 139.21 ms │                130.93 ms │ +1.06x faster │
│ QQuery 2     │  29.26 ms │                 29.13 ms │     no change │
│ QQuery 3     │  38.71 ms │                 34.79 ms │ +1.11x faster │
│ QQuery 4     │  29.41 ms │                 28.65 ms │     no change │
│ QQuery 5     │  87.42 ms │                 88.35 ms │     no change │
│ QQuery 6     │  19.55 ms │                 20.18 ms │     no change │
│ QQuery 7     │ 228.31 ms │                227.00 ms │     no change │
│ QQuery 8     │  34.20 ms │                 32.54 ms │     no change │
│ QQuery 9     │  97.79 ms │                110.99 ms │  1.14x slower │
│ QQuery 10    │  64.27 ms │                 63.16 ms │     no change │
│ QQuery 11    │  17.23 ms │                 17.26 ms │     no change │
│ QQuery 12    │  52.89 ms │                 51.90 ms │     no change │
│ QQuery 13    │  46.75 ms │                 46.20 ms │     no change │
│ QQuery 14    │  14.19 ms │                 13.74 ms │     no change │
│ QQuery 15    │  25.06 ms │                 24.65 ms │     no change │
│ QQuery 16    │  25.08 ms │                 25.22 ms │     no change │
│ QQuery 17    │ 147.82 ms │                153.69 ms │     no change │
│ QQuery 18    │ 307.83 ms │                284.87 ms │ +1.08x faster │
│ QQuery 19    │  37.51 ms │                 38.95 ms │     no change │
│ QQuery 20    │  49.58 ms │                 49.73 ms │     no change │
│ QQuery 21    │ 334.74 ms │                321.67 ms │     no change │
│ QQuery 22    │  20.67 ms │                 20.62 ms │     no change │
└──────────────┴───────────┴──────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                       ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 1847.47ms │
│ Total Time (alamb_upgrade_arrow_57.1)   │ 1814.23ms │
│ Average Time (HEAD)                     │   83.98ms │
│ Average Time (alamb_upgrade_arrow_57.1) │   82.46ms │
│ Queries Faster                          │         3 │
│ Queries Slower                          │         1 │
│ Queries with No Change                  │        18 │
│ Queries with Failure                    │         0 │
└─────────────────────────────────────────┴───────────┘
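The Change column in these tables can be reproduced from the two timing columns. A minimal sketch, assuming a 5% noise band around a ratio of 1.0 (this threshold is inferred from the numbers reported here, not taken from the benchmark script; `Change`, `classify` and `label` are hypothetical names):

```rust
/// Outcome of comparing one query's timing against the baseline.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Change {
    /// Candidate is faster; holds baseline / candidate.
    Faster(f64),
    /// Candidate is slower; holds candidate / baseline.
    Slower(f64),
    /// Ratio falls inside the noise band.
    NoChange,
}

/// Classify a timing pair. `noise` is the assumed band (0.05 for 5%).
fn classify(baseline_ms: f64, candidate_ms: f64, noise: f64) -> Change {
    let ratio = candidate_ms / baseline_ms;
    if ratio > 1.0 + noise {
        Change::Slower(ratio)
    } else if ratio < 1.0 - noise {
        Change::Faster(1.0 / ratio)
    } else {
        Change::NoChange
    }
}

/// Render the classification in the same style as the tables.
fn label(change: Change) -> String {
    match change {
        Change::Faster(r) => format!("+{:.2}x faster", r),
        Change::Slower(r) => format!("{:.2}x slower", r),
        Change::NoChange => "no change".to_string(),
    }
}
```

For example, `label(classify(2.11, 2.51, 0.05))` gives `1.19x slower`, which matches QQuery 0 of clickbench_partitioned, and `classify(34.20, 32.54, 0.05)` stays inside the band, which matches the "no change" reported for QQuery 8 of tpch_mem_sf1.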

@alamb
Contributor Author

alamb commented Nov 20, 2025

🤖 ./gh_compare_branch.sh Benchmark Script Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/upgrade_arrow_57.1.0 (fafb102) to 7d8b860 diff using: clickbench_pushdown
Results will be posted here when complete

query TTT
select arrow_typeof(column1), arrow_typeof(column2), arrow_typeof(column3) from arrays;
----
List(nullable List(nullable Int64)) List(nullable Float64) List(nullable Utf8)
Contributor Author

Previously, the DataType parsing code did not handle this syntax (it only supported List(Float64)). We have now made the display and parsing consistent; see apache/arrow-rs#8649 (comment) for background and details.

@alamb
Contributor Author

alamb commented Nov 20, 2025

🤖: Benchmark completed

Details

Comparing HEAD and alamb_upgrade_arrow_57.1.0
--------------------
Benchmark clickbench_pushdown.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃        HEAD ┃ alamb_upgrade_arrow_57.1 ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0     │     2.19 ms │                  2.67 ms │  1.22x slower │
│ QQuery 1     │    53.32 ms │                 51.49 ms │     no change │
│ QQuery 2     │   141.71 ms │                134.25 ms │ +1.06x faster │
│ QQuery 3     │   167.28 ms │                156.44 ms │ +1.07x faster │
│ QQuery 4     │  1084.35 ms │               1106.65 ms │     no change │
│ QQuery 5     │  1556.95 ms │               1490.85 ms │     no change │
│ QQuery 6     │     2.16 ms │                  2.32 ms │  1.07x slower │
│ QQuery 7     │    74.61 ms │                 66.99 ms │ +1.11x faster │
│ QQuery 8     │  1486.06 ms │               1416.15 ms │     no change │
│ QQuery 9     │  1893.82 ms │               1877.58 ms │     no change │
│ QQuery 10    │   480.16 ms │                496.91 ms │     no change │
│ QQuery 11    │   557.20 ms │                549.42 ms │     no change │
│ QQuery 12    │  1608.14 ms │               1537.86 ms │     no change │
│ QQuery 13    │  2579.34 ms │               2322.88 ms │ +1.11x faster │
│ QQuery 14    │  1693.59 ms │               1457.10 ms │ +1.16x faster │
│ QQuery 15    │  1298.92 ms │               1255.58 ms │     no change │
│ QQuery 16    │  2729.63 ms │               2662.03 ms │     no change │
│ QQuery 17    │  2739.17 ms │               2653.09 ms │     no change │
│ QQuery 18    │  5301.07 ms │               4998.57 ms │ +1.06x faster │
│ QQuery 19    │   149.56 ms │                139.48 ms │ +1.07x faster │
│ QQuery 20    │  2047.96 ms │               1894.37 ms │ +1.08x faster │
│ QQuery 21    │  2453.34 ms │               2307.63 ms │ +1.06x faster │
│ QQuery 22    │  4124.78 ms │               3992.67 ms │     no change │
│ QQuery 23    │  1144.82 ms │               1083.87 ms │ +1.06x faster │
│ QQuery 24    │   258.15 ms │                248.11 ms │     no change │
│ QQuery 25    │   676.36 ms │                648.55 ms │     no change │
│ QQuery 26    │   358.14 ms │                343.48 ms │     no change │
│ QQuery 27    │  3125.24 ms │               3006.62 ms │     no change │
│ QQuery 28    │ 23975.91 ms │              23762.45 ms │     no change │
│ QQuery 29    │   961.14 ms │                989.91 ms │     no change │
│ QQuery 30    │  2163.07 ms │               1380.73 ms │ +1.57x faster │
│ QQuery 31    │  2089.33 ms │               1351.52 ms │ +1.55x faster │
│ QQuery 32    │  4853.46 ms │               4935.63 ms │     no change │
│ QQuery 33    │  6051.64 ms │               5677.56 ms │ +1.07x faster │
│ QQuery 34    │  6305.89 ms │               5969.09 ms │ +1.06x faster │
│ QQuery 35    │  1936.33 ms │               1863.02 ms │     no change │
│ QQuery 36    │    26.21 ms │                 26.35 ms │     no change │
│ QQuery 37    │    26.08 ms │                 25.69 ms │     no change │
│ QQuery 38    │    25.20 ms │                 25.25 ms │     no change │
│ QQuery 39    │    25.50 ms │                 26.15 ms │     no change │
│ QQuery 40    │    26.74 ms │                 27.07 ms │     no change │
│ QQuery 41    │    25.65 ms │                 26.27 ms │     no change │
│ QQuery 42    │    25.35 ms │                 25.87 ms │     no change │
└──────────────┴─────────────┴──────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                       ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                       │ 88305.52ms │
│ Total Time (alamb_upgrade_arrow_57.1)   │ 84016.17ms │
│ Average Time (HEAD)                     │  2053.62ms │
│ Average Time (alamb_upgrade_arrow_57.1) │  1953.86ms │
│ Queries Faster                          │         14 │
│ Queries Slower                          │          2 │
│ Queries with No Change                  │         27 │
│ Queries with Failure                    │          0 │
└─────────────────────────────────────────┴────────────┘

@alamb alamb force-pushed the alamb/upgrade_arrow_57.1.0 branch from fafb102 to 191db07 Compare November 21, 2025 13:43
@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate common Related to common crate proto Related to proto crate datasource Changes to the datasource crate labels Nov 21, 2025
@alamb alamb force-pushed the alamb/upgrade_arrow_57.1.0 branch from 191db07 to 5a91551 Compare November 21, 2025 14:33
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Nov 21, 2025
@alamb alamb force-pushed the alamb/upgrade_arrow_57.1.0 branch from 5a91551 to 9eab2c3 Compare November 21, 2025 14:36

/// the filters are applied in the same order as written in the query
pub reorder_filters: bool, default = false

/// (reading) Force the use of RowSelections for filter results, when
Contributor Author

This is an escape valve if we find some issue when using the new adaptive filter from @hhhizzz
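For readers unfamiliar with the two representations of a filter result being discussed, here is a minimal sketch of converting a per-row boolean mask into run-length "select / skip" entries and back. `Run`, `mask_to_runs` and `runs_to_mask` are hypothetical names used only for illustration; this is not the arrow-rs implementation.

```rust
/// Hypothetical run-length entry: `row_count` consecutive rows that are
/// either all skipped (`skip == true`) or all selected (`skip == false`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Convert a per-row filter result (true = keep the row) into runs.
fn mask_to_runs(mask: &[bool]) -> Vec<Run> {
    let mut runs: Vec<Run> = Vec::new();
    for &selected in mask {
        let skip = !selected;
        // Extend the previous run when the row has the same outcome.
        if let Some(last) = runs.last_mut() {
            if last.skip == skip {
                last.row_count += 1;
                continue;
            }
        }
        // Otherwise start a new run of length one.
        runs.push(Run { row_count: 1, skip });
    }
    runs
}

/// Expand runs back into a per-row mask (the inverse of `mask_to_runs`).
fn runs_to_mask(runs: &[Run]) -> Vec<bool> {
    runs.iter()
        .flat_map(|r| std::iter::repeat(!r.skip).take(r.row_count))
        .collect()
}
```

A mask such as `[true, true, false, false, false, true]` becomes three runs (select 2, skip 3, select 1). Long runs compress well in this form, while a filter that alternates row by row produces one entry per row, which is presumably the case where a mask is the better choice.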

assert_contains!(
&e,
r#"Error during planning: Can not find compatible types to compare Boolean with [Struct("foo": Boolean), Utf8]"#
r#"Error during planning: Can not find compatible types to compare Boolean with [Struct("foo": non-null Boolean), Utf8]"#
Contributor Author

These are due to the changes from apache/arrow-rs#8648 to clean up DataType display. It is a nice improvement in my mind.

// The cache is on by default, and used when filter pushdown is enabled
PredicateCacheTest {
expected_inner_records: 8,
expected_records: 7, // reads more than necessary from the cache as then another bitmap is applied
Contributor Author

This behavior changed due to adaptive filtering. I added a new test that turns off adaptive filtering to show that doing so restores the old behavior.

@Dandandan
Contributor

It seems to be quite a bit faster even without filter pushdown 🚀

@alamb
Contributor Author

alamb commented Nov 22, 2025

> It seems to be quite a bit faster even without filter pushdown 🚀

It is like someone has been optimizing low-level filter kernels 😆 (but seriously, I think major credit is due to you and @rluvaton)

@rluvaton
Member

Thank you, I have some more up my sleeve.

@alamb alamb force-pushed the alamb/upgrade_arrow_57.1.0 branch from 9eab2c3 to a81c0d3 Compare November 24, 2025 22:50
@alamb alamb force-pushed the alamb/upgrade_arrow_57.1.0 branch from a81c0d3 to eda1b53 Compare November 24, 2025 22:52
@alamb alamb marked this pull request as ready for review November 24, 2025 22:52
@alamb alamb changed the title [WIP] Update to arrow, parquet 57.1.0 Update to arrow, parquet 57.1.0 Nov 24, 2025
@alamb alamb changed the title Update to arrow, parquet 57.1.0 Update to arrow, parquet to 57.1.0 Nov 24, 2025
[1, 2, 3, 4, 5] [h, e, l, l, o]

# TODO: Enable once arrow_cast supports ListView types.
# TODO: Enable once array_slice supports LargeListView types.
Member

@rluvaton rluvaton Nov 26, 2025

It seems like you copy-pasted this from below, but this tests ListView, not LargeListView.

Suggested change
# TODO: Enable once array_slice supports LargeListView types.
# TODO: Enable once array_slice supports ListView types.

# ----
# [1, 2, 3, 4, 5] [h, e, l, l, o]
query error DataFusion error: Execution error: Unsupported type 'ListView\(Int64\)'. Must be a supported arrow type name such as 'Int32' or 'Timestamp\(ns\)'. Error unknown token: ListView
query error Failed to coerce arguments to satisfy a call to 'array_slice' function:
Member

@rluvaton rluvaton Nov 26, 2025

This might not be related to this PR, but with this change I can no longer tell which arguments are the problematic ones.

Before, I could see it was ListView, but now I have no idea which argument is invalid or which type is unsupported.

Comment on lines +92 to +93
/// Should we force the reader to use RowSelections for filtering
pub force_filter_selections: bool,
Member

What do you think of making it an enum instead, to allow for future additions without breaking changes?

(That enum should also be non-exhaustive so that adding a variant is not a breaking change.)

I also see that with_row_selection_policy already accepts an enum.

Making it an enum would also allow forcing the mask or configuring the threshold in the auto policy. This is also useful for testing, to force a specific path when creating a reproduction test for a bug.
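A rough sketch of what the enum suggestion could look like. All names here (`FilterSelectionPolicy`, `Strategy`, `from_force_flag`, `resolve`) and the placeholder threshold of 32 are illustrative assumptions, not DataFusion or arrow-rs API, and the average-run-length heuristic is only a guess at how an auto policy might decide.

```rust
/// Placeholder threshold for the auto policy (an assumption for this sketch).
const ASSUMED_DEFAULT_THRESHOLD: usize = 32;

/// Hypothetical config-level policy replacing the `force_filter_selections`
/// bool. `#[non_exhaustive]` means downstream crates must keep a wildcard
/// arm when matching, so adding a variant later is not a breaking change.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterSelectionPolicy {
    /// Pick a representation per filter result, using `threshold`.
    Auto { threshold: usize },
    /// Always use run-length row selections (the old reader behavior).
    Selectors,
    /// Always use a bitmap mask.
    Mask,
}

/// The concrete representation chosen for one filter result.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    Selectors,
    Mask,
}

impl FilterSelectionPolicy {
    /// Map the existing boolean flag onto the enum.
    pub fn from_force_flag(force_filter_selections: bool) -> Self {
        if force_filter_selections {
            Self::Selectors
        } else {
            Self::Auto { threshold: ASSUMED_DEFAULT_THRESHOLD }
        }
    }

    /// Resolve the policy for a filter result covering `total_rows` rows
    /// split into `run_count` select/skip runs.
    pub fn resolve(self, total_rows: usize, run_count: usize) -> Strategy {
        match self {
            Self::Selectors => Strategy::Selectors,
            Self::Mask => Strategy::Mask,
            Self::Auto { threshold } => {
                // Assumed heuristic: short average runs favor a mask.
                let avg_run = if run_count == 0 { total_rows } else { total_rows / run_count };
                if avg_run < threshold { Strategy::Mask } else { Strategy::Selectors }
            }
        }
    }
}
```

With this shape, a reproduction test could pin `FilterSelectionPolicy::Mask` or `FilterSelectionPolicy::Selectors` directly instead of relying on whatever the auto policy picks for a given input.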

Contributor Author

This is a good idea -- I was secretly hoping no one would use this flag and added it as an "escape" valve to go back to the arrow 57.0.0 reader behavior

Member

@rluvaton rluvaton left a comment

left some comments, other than that LGTM

what do you think of splitting to 2 PRs
one for the actual upgrade and one for the new flag in parquet reader

not because the PR are large but because they are not required to be in the same PR for the upgrade to be made.

If you decide not to, please update the title and the description so the commit message will include that change making it easier to find later

@alamb
Contributor Author

alamb commented Nov 26, 2025

> what do you think of splitting to 2 PRs one for the actual upgrade and one for the new flag in parquet reader
>
> not because the PR are large but because they are not required to be in the same PR for the upgrade to be made.

I will do this.

In my mind the config flag is required in order to allow people to opt out of the new behavior

@rluvaton
Member

> In my mind the config flag is required in order to allow people to opt out of the new behavior

so if the behavior changed, we want by default to opt out of it for this release, no? or only for this pr,

@rluvaton
Member

If you split the PR, it would also be easier for others to create PRs with the new arrow version while we discuss this.

@alamb
Contributor Author

alamb commented Nov 26, 2025

> > In my mind the config flag is required in order to allow people to opt out of the new behavior
>
> so if the behavior changed, we want by default to opt out of it for this release, no? or only for this pr,

I am not sure

The default behavior of the parquet reader has changed in arrow-rs (in theory it will always be better).

The only use case I have is adding an "escape valve" so that if someone hits an issue with the new code, there is a way to turn it off without requiring a fork.

I don't (yet) have any reason to believe the new behavior isn't always better, nor any use case for tuning the row selector policy from DataFusion.

Member

@rluvaton rluvaton left a comment

LGTM. I'm eager to create a PR that uses the new version, so I'm approving in case you decide to merge without the enum change.

Labels

common Related to common crate core Core DataFusion crate datasource Changes to the datasource crate documentation Improvements or additions to documentation optimizer Optimizer rules proto Related to proto crate sqllogictest SQL Logic Tests (.slt)
