Error projecting statistics in DataSourceExec #14905

Closed
Tracked by #14123
alamb opened this issue Feb 26, 2025 · 2 comments · Fixed by #14685
Assignees
Labels: bug (Something isn't working), regression (Something that used to work no longer does)

Comments

@alamb
Contributor

alamb commented Feb 26, 2025

Describe the bug

While upgrading Delta.rs to DataFusion 46, I am getting the following error:

index out of bounds: the len is 2 but the index is 2

To Reproduce

I am still working on a minimal reproducer and don't have one yet.

Here is one from

Here is the error

/Users/andrewlamb/.cargo/bin/cargo test --color=always --lib operations::merge::tests::test_empty_table_with_schema_merge --profile test --no-fail-fast --config env.RUSTC_BOOTSTRAP=\"1\" --manifest-path /Users/andrewlamb/Software/delta-rs/crates/core/Cargo.toml -- --format=json --exact -Z unstable-options --show-output
Testing started at 9:59 AM ...
warning: profiles for the non root package will be ignored, specify profiles at the workspace root:
package:   /Users/andrewlamb/Software/delta-rs/python/Cargo.toml
workspace: /Users/andrewlamb/Software/delta-rs/Cargo.toml
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.23s
     Running unittests src/lib.rs (target/debug/deps/deltalake_core-b6362ef613c3cec4)

index out of bounds: the len is 2 but the index is 2
thread 'operations::merge::tests::test_empty_table_with_schema_merge' panicked at /Users/andrewlamb/Software/datafusion2/datafusion/physical-plan/src/projection.rs:291:36:
index out of bounds: the len is 2 but the index is 2
stack backtrace:
   0: rust_begin_unwind
             at /rustc/e71f9a9a98b0faf423844bf0ba7438f29dc27d58/library/std/src/panicking.rs:665:5
   1: core::panicking::panic_fmt
             at /rustc/e71f9a9a98b0faf423844bf0ba7438f29dc27d58/library/core/src/panicking.rs:76:14
   2: core::panicking::panic_bounds_check
             at /rustc/e71f9a9a98b0faf423844bf0ba7438f29dc27d58/library/core/src/panicking.rs:281:5
   3: index<datafusion_common::stats::ColumnStatistics>
             at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/slice/index.rs:274:10
   4: index<datafusion_common::stats::ColumnStatistics, usize>
             at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/slice/index.rs:16:9
   5: index<datafusion_common::stats::ColumnStatistics, usize, alloc::alloc::Global>
             at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/alloc/src/vec/mod.rs:3346:9
   6: stats_projection<core::iter::adapters::map::Map<core::slice::iter::Iter<(alloc::sync::Arc<dyn datafusion_physical_expr_common::physical_expr::PhysicalExpr, alloc::alloc::Global>, alloc::string::String)>, datafusion_physical_plan::projection::{impl#2}::statistics::{closure_env#0}>>
             at /Users/andrewlamb/Software/datafusion2/datafusion/physical-plan/src/projection.rs:291:36
   7: statistics
             at /Users/andrewlamb/Software/datafusion2/datafusion/physical-plan/src/projection.rs:234:12
   8: should_swap_join_order
             at /Users/andrewlamb/Software/datafusion2/datafusion/physical-optimizer/src/join_selection.rs:69:23
   9: statistical_join_selection_subrule
             at /Users/andrewlamb/Software/datafusion2/datafusion/physical-optimizer/src/join_selection.rs:329:28
  10: {closure#1}
             at /Users/andrewlamb/Software/datafusion2/datafusion/physical-optimizer/src/join_selection.rs:187:17
  11: call_once<(alloc::sync::Arc<dyn datafusion_physical_plan::execution_plan::ExecutionPlan, alloc::alloc::Global>), datafusion_physical_optimizer::join_selection::{impl#1}::optimize::{closure_env#1}>
             at /Users/andrewlamb/.rustup/toolchains/stable-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/ops/function.rs:305:13
  12: transform_parent<alloc::sync::Arc<dyn datafusion_physical_plan::execution_plan::ExecutionPlan, alloc::alloc::Global>, &mut datafusion_physical_optimizer::join_selection::{impl#1}::optimize::{closure_env#1}>
             at /Users/andrewlamb/Software/datafusion2/datafusion/common/src/tree_node.rs:763:44
  13: {closure#0}<alloc::sync::Arc<dyn datafusion_physical_plan::execution_plan::ExecutionPlan, alloc::alloc::Global>, datafusion_physical_optimizer::join_selection::{impl#1}::optimize::{closure_env#1}>
             at /Users/andrewlamb/Software/datafusion2/datafusion/common/src/tree_node.rs:264:13
 
error: test failed, to rerun pass `--lib`
error: 1 target failed:
    `--lib`

The full trace is here:

full_trace.txt
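
Frames 3 to 6 of the trace show a `Vec<ColumnStatistics>` being indexed with plain `[]` inside `stats_projection`. As a minimal sketch of that failure mode, and not the actual DataFusion code, the following uses hypothetical stand-ins (`ColStats`, `project_stats_unchecked`) to show how a projection that refers to column index 2 panics when statistics exist for only 2 columns:

```rust
use std::panic;

/// Hypothetical stand-in for per-column statistics (not DataFusion's type).
#[derive(Clone, Debug, PartialEq)]
struct ColStats {
    null_count: Option<usize>,
}

/// Looks up statistics for each projected column index with plain `[]`
/// indexing, which panics when an index is >= `input.len()`.
fn project_stats_unchecked(input: &[ColStats], projected: &[usize]) -> Vec<ColStats> {
    projected.iter().map(|&i| input[i].clone()).collect()
}

fn main() {
    // Only two columns have statistics...
    let input = vec![
        ColStats { null_count: Some(0) },
        ColStats { null_count: Some(3) },
    ];
    // ...but the projection refers to a third column (index 2).
    let result = panic::catch_unwind(|| project_stats_unchecked(&input, &[0, 2]));
    assert!(result.is_err());
}
```

Running this prints the same kind of bounds-check panic to stderr as the trace above, which suggests the column statistics handed to the projection had fewer entries than the columns the projection referenced.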

Expected behavior

The panic should not happen; computing statistics for a projection should not index out of bounds.
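
One way to picture the expected behavior is a bounds-checked lookup that degrades to "unknown" statistics instead of panicking. This is only an illustrative sketch with hypothetical names (`ColStats`, `project_stats_checked`); it is not the fix referenced in this issue and makes no claim about how DataFusion resolves the problem:

```rust
/// Hypothetical stand-in for per-column statistics (not DataFusion's type).
#[derive(Clone, Debug, Default, PartialEq)]
struct ColStats {
    /// `None` means the value is unknown.
    null_count: Option<usize>,
}

/// Bounds-checked lookup: an index with no matching entry yields
/// "unknown" statistics (`ColStats::default()`) instead of panicking.
fn project_stats_checked(input: &[ColStats], projected: &[usize]) -> Vec<ColStats> {
    projected
        .iter()
        .map(|&i| input.get(i).cloned().unwrap_or_default())
        .collect()
}

fn main() {
    let input = vec![
        ColStats { null_count: Some(0) },
        ColStats { null_count: Some(3) },
    ];
    // Index 2 has no statistics, so it falls back to "unknown".
    let out = project_stats_checked(&input, &[0, 2]);
    assert_eq!(out[0], ColStats { null_count: Some(0) });
    assert_eq!(out[1], ColStats { null_count: None });
}
```

A fallback like this avoids the panic but can hide a mismatch between the schema and the statistics, so fixing whatever produces the short statistics vector is likely the better long-term answer.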

Additional context

Likely introduced as part of

@alamb alamb added the bug Something isn't working label Feb 26, 2025
@alamb alamb added the regression Something that used to work no longer does label Feb 26, 2025
@alamb alamb self-assigned this Feb 26, 2025
@alamb
Contributor Author

alamb commented Feb 27, 2025

I tried @blaginin's fix in the following PR and it also fixes this issue

@berkaysynnada
Contributor

I tried @blaginin's fix in the following PR and it also fixes this issue

That's exactly what I was going to suggest 😅
