Skip to content

Conversation

findepi
Copy link
Member

@findepi findepi commented Sep 18, 2025

Sometimes we display qualifiers in plan's Display, e.g. for Column, but sometimes not. This adds qualifiers to output for SubqueryAlias and in LogicalPlan::display_indent_schema. Qualifiers are sometimes necessary to understand the plan semantics, especially when dealing with duplicate names, e.g. in joins.

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Sep 18, 2025
@findepi findepi force-pushed the findepi/explain-qualifiers branch from 74ed9ba to 0abbd5a Compare September 18, 2025 18:58
@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Sep 18, 2025
Sometimes we display qualifiers in plan's Display, e.g. for `Column`,
but sometimes not. This adds qualifiers to output for `Alias`,
`SubqueryAlias` and in `LogicalPlan::display_indent_schema`. Qualifiers
are sometimes necessary to understand the plan semantics, especially
when dealing with duplicate names, e.g. in joins.
@findepi findepi force-pushed the findepi/explain-qualifiers branch from 0abbd5a to ace656a Compare September 18, 2025 19:23
@github-actions github-actions bot added the substrait Changes to the substrait crate label Sep 18, 2025
Aggregate: groupBy=[[test.b]], aggr=[[count(Int64(1)) AS count(*)]] [b:UInt32, count(*):Int64]
TableScan: test [a:UInt32, b:UInt32, c:UInt32]
Aggregate: groupBy=[[test.b]], aggr=[[count(Int64(1)) AS count(*)]] [test.b:UInt32, count(*):Int64]
TableScan: test [test.a:UInt32, test.b:UInt32, test.c:UInt32]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the test already tell us the qualifier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For TableScan, however. the schema printing code is same for every plan node and for many it's not much less clear. Without this change, the plan printout is incomplete and insufficient to understand the plan.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can special case the schema printing code to have a version to skip the qualifiers in cases where it is always the same 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could that be confusing? If some qualifiers are printed but some not, the projections without qualifiers will look as if they did not have any, which is a different state from the one when they all have the same qualifier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was more thinking how redundant this line is now

It goes from

-          TableScan: test [a:UInt32, b:UInt32, c:UInt32]
+          TableScan: test [test.a:UInt32, test.b:UInt32, test.c:UInt32]

That is the qualifier test is now repeated 4 times. It will be even worse when there are

  1. long qualifiers "my_really_obxiously_long_table_name"
  2. Multiple columns selected as each column gets the same name

For a TableScan, there can be, by definition, only a single relation, so appending the relation name to all expressions just makes the plans harder to read

More generally, when there is only one relation in the query, as is the case in many queries, adding a qualifier to all expressions I think makes the plans harder to read, not better



Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More generally, when there is only one relation in the query, as is the case in many queries, adding a qualifier to all expressions I think makes the plans harder to read, not better

Agreed.
But also, single-table queries are not the ones we should optimize EXPLAIN output for.
These represent a subset of all queries which naturally is simpler than all queries, without source table count limit.

@findepi findepi requested review from xudong963 and alamb and removed request for alamb and xudong963 September 22, 2025 16:47
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @findepi -- I think we should try and avoid adding in unecessary qualifiers (I highlighted places where they aren't necessary) but adding additional qualificiation in where they are ambiguous is a great idea

df_renamed.logical_plan(),
@r"
Projection: t1.c1 AS AAA, t1.c2, t1.c3, t2.c1, t2.c2, t2.c3
Projection: t1.c1 AS t1.AAA, t1.c2, t1.c3, t2.c1, t2.c2, t2.c3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right to me -- the alias shouldn't have a qualifier on it, should it? AAA doesn't come from the t1 relation, it is created in the outer query

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I honestly have no idea where t1. comes from, and what should be here.

TableScan: t2 projection=[a, b, c] [a:UInt32, b:Utf8, c:Int32]
"###
@r"
Projection: t1.a, t2.a, t1.b, t1.c, t2.b, t2.c [t1.a:UInt32, t2.a:UInt32, t1.b:Utf8, t1.c:Int32, t2.b:Utf8, t2.c:Int32]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is an improvement for the Projection and Inner Join here to have the qualifiers on them -- that makes them less ambiguous when there are potentially multiple relations

Filter: aggregate_test_100.c2 > Int64(10) [c1:Utf8View, c2:Int8, c3:Int16, c4:Int16, c5:Int32, c6:Int64, c7:Int16, c8:Int32, c9:UInt32, c10:UInt64, c11:Float32, c12:Float64, c13:Utf8View]
TableScan: aggregate_test_100 [c1:Utf8View, c2:Int8, c3:Int16, c4:Int16, c5:Int32, c6:Int64, c7:Int16, c8:Int32, c9:UInt32, c10:UInt64, c11:Float32, c12:Float64, c13:Utf8View]
Projection: aggregate_test_100.c1 [aggregate_test_100.c1:Utf8View]
Filter: aggregate_test_100.c2 > Int64(10) [aggregate_test_100.c1:Utf8View, aggregate_test_100.c2:Int8, aggregate_test_100.c3:Int16, aggregate_test_100.c4:Int16, aggregate_test_100.c5:Int32, aggregate_test_100.c6:Int64, aggregate_test_100.c7:Int16, aggregate_test_100.c8:Int32, aggregate_test_100.c9:UInt32, aggregate_test_100.c10:UInt64, aggregate_test_100.c11:Float32, aggregate_test_100.c12:Float64, aggregate_test_100.c13:Utf8View]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a good example of a plan which is much less readable after this change in my mind

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this because all fields are qualified and all have the same qualifier?

@findepi
Copy link
Member Author

findepi commented Sep 26, 2025

I see how it's controversial. Maybe it could go behind a session property.

@findepi findepi closed this Sep 26, 2025
@findepi findepi deleted the findepi/explain-qualifiers branch September 26, 2025 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) substrait Changes to the substrait crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants