-
Notifications
You must be signed in to change notification settings - Fork 1.7k
Display qualifiers in EXPLAIN #17645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
74ed9ba
to
0abbd5a
Compare
Sometimes we display qualifiers in plan's Display, e.g. for `Column`, but sometimes not. This adds qualifiers to output for `Alias`, `SubqueryAlias` and in `LogicalPlan::display_indent_schema`. Qualifiers are sometimes necessary to understand the plan semantics, especially when dealing with duplicate names, e.g. in joins.
0abbd5a
to
ace656a
Compare
Aggregate: groupBy=[[test.b]], aggr=[[count(Int64(1)) AS count(*)]] [b:UInt32, count(*):Int64] | ||
TableScan: test [a:UInt32, b:UInt32, c:UInt32] | ||
Aggregate: groupBy=[[test.b]], aggr=[[count(Int64(1)) AS count(*)]] [test.b:UInt32, count(*):Int64] | ||
TableScan: test [test.a:UInt32, test.b:UInt32, test.c:UInt32] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the test
already tell us the qualifier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For TableScan, however. the schema printing code is same for every plan node and for many it's not much less clear. Without this change, the plan printout is incomplete and insufficient to understand the plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can special case the schema printing code to have a version to skip the qualifiers in cases where it is always the same 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could that be confusing? If some qualifiers are printed but some not, the projections without qualifiers will look as if they did not have any, which is a different state from the one when they all have the same qualifier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was more thinking how redundant this line is now
It goes from
- TableScan: test [a:UInt32, b:UInt32, c:UInt32]
+ TableScan: test [test.a:UInt32, test.b:UInt32, test.c:UInt32]
That is the qualifier test
is now repeated 4 times. It will be even worse when there are
- long qualifiers "my_really_obxiously_long_table_name"
- Multiple columns selected as each column gets the same name
For a TableScan, there can be, by definition, only a single relation, so appending the relation name to all expressions just makes the plans harder to read
More generally, when there is only one relation in the query, as is the case in many queries, adding a qualifier to all expressions I think makes the plans harder to read, not better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More generally, when there is only one relation in the query, as is the case in many queries, adding a qualifier to all expressions I think makes the plans harder to read, not better
Agreed.
But also, single-table queries are not the ones we should optimize EXPLAIN output for.
These represent a subset of all queries which naturally is simpler than all queries, without source table count limit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @findepi -- I think we should try and avoid adding in unecessary qualifiers (I highlighted places where they aren't necessary) but adding additional qualificiation in where they are ambiguous is a great idea
df_renamed.logical_plan(), | ||
@r" | ||
Projection: t1.c1 AS AAA, t1.c2, t1.c3, t2.c1, t2.c2, t2.c3 | ||
Projection: t1.c1 AS t1.AAA, t1.c2, t1.c3, t2.c1, t2.c2, t2.c3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem right to me -- the alias shouldn't have a qualifier on it, should it? AAA
doesn't come from the t1
relation, it is created in the outer query
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I honestly have no idea where t1.
comes from, and what should be here.
TableScan: t2 projection=[a, b, c] [a:UInt32, b:Utf8, c:Int32] | ||
"### | ||
@r" | ||
Projection: t1.a, t2.a, t1.b, t1.c, t2.b, t2.c [t1.a:UInt32, t2.a:UInt32, t1.b:Utf8, t1.c:Int32, t2.b:Utf8, t2.c:Int32] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is an improvement for the Projection
and Inner Join
here to have the qualifiers on them -- that makes them less ambiguous when there are potentially multiple relations
Filter: aggregate_test_100.c2 > Int64(10) [c1:Utf8View, c2:Int8, c3:Int16, c4:Int16, c5:Int32, c6:Int64, c7:Int16, c8:Int32, c9:UInt32, c10:UInt64, c11:Float32, c12:Float64, c13:Utf8View] | ||
TableScan: aggregate_test_100 [c1:Utf8View, c2:Int8, c3:Int16, c4:Int16, c5:Int32, c6:Int64, c7:Int16, c8:Int32, c9:UInt32, c10:UInt64, c11:Float32, c12:Float64, c13:Utf8View] | ||
Projection: aggregate_test_100.c1 [aggregate_test_100.c1:Utf8View] | ||
Filter: aggregate_test_100.c2 > Int64(10) [aggregate_test_100.c1:Utf8View, aggregate_test_100.c2:Int8, aggregate_test_100.c3:Int16, aggregate_test_100.c4:Int16, aggregate_test_100.c5:Int32, aggregate_test_100.c6:Int64, aggregate_test_100.c7:Int16, aggregate_test_100.c8:Int32, aggregate_test_100.c9:UInt32, aggregate_test_100.c10:UInt64, aggregate_test_100.c11:Float32, aggregate_test_100.c12:Float64, aggregate_test_100.c13:Utf8View] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a good example of a plan which is much less readable after this change in my mind
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this because all fields are qualified and all have the same qualifier?
I see how it's controversial. Maybe it could go behind a session property. |
Sometimes we display qualifiers in plan's Display, e.g. for
Column
, but sometimes not. This adds qualifiers to output forSubqueryAlias
and inLogicalPlan::display_indent_schema
. Qualifiers are sometimes necessary to understand the plan semantics, especially when dealing with duplicate names, e.g. in joins.