[feature](datalake) Add BucketShuffleJoin support for bucketed hive tables #27784
Hi @Nitin-Kashyap, thanks for your contribution.
BTW, does this only work for Spark-created Hive bucketed tables?
@morningman Please find the sample test I used for this case:
CREATE TABLE parquet_test (
    user_id INT,
    key VARCHAR(20),
    part VARCHAR(10)
)
USING parquet
PARTITIONED BY (part)
CLUSTERED BY (user_id) INTO 3 BUCKETS;
INSERT INTO parquet_test VALUES (31, 'U31', 'IN'), (11, 'U11', 'IN'), (21, 'U21', 'IN');
@morningman Yes, in the current scope it only understands Spark-created bucketed tables; it identifies them by the table properties that Spark defines for the bucket specification. I plan to add support for Hive and Hudi as well at some point (hopefully in the next PR). For this I have left a placeholder THashType [HIVE_MOD: Hive and Hudi use the same hash method]; however, Hudi needs some more changes on the FE side to identify the bucket id from the file path.
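For readers following along, here is a minimal sketch of how a Spark-created bucketed table can be recognised from its metastore table parameters. Spark records the bucket spec under `spark.sql.sources.schema.numBuckets`, `spark.sql.sources.schema.numBucketCols` and `spark.sql.sources.schema.bucketCol.<i>`. The function name `spark_bucket_spec` and the plain-dict input are assumptions made for illustration; this is not the FE code from this PR.

```python
# Illustrative sketch only: the helper name and dict-based input are
# hypothetical and do not mirror the HMSExternalTable implementation.
SPARK_PREFIX = "spark.sql.sources.schema."


def spark_bucket_spec(table_params):
    """Return (num_buckets, bucket_columns) if the table parameters carry a
    Spark bucket specification, otherwise None."""
    num_buckets = table_params.get(SPARK_PREFIX + "numBuckets")
    num_cols = table_params.get(SPARK_PREFIX + "numBucketCols")
    if num_buckets is None or num_cols is None:
        return None  # not a Spark-bucketed table

    columns = []
    for i in range(int(num_cols)):
        col = table_params.get(f"{SPARK_PREFIX}bucketCol.{i}")
        if col is None:
            return None  # incomplete spec, treat the table as unbucketed
        columns.append(col)
    return int(num_buckets), columns


# Parameters as they might look for the parquet_test sample table.
params = {
    "spark.sql.sources.provider": "parquet",
    "spark.sql.sources.schema.numBuckets": "3",
    "spark.sql.sources.schema.numBucketCols": "1",
    "spark.sql.sources.schema.bucketCol.0": "user_id",
}
print(spark_bucket_spec(params))
```

A table created by Hive itself would not carry these keys, so the sketch returns `None` for it, which matches the "Spark only for now" scope described above.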
… generated by Spark. (27783)
1. Original planner updated to consider BucketShuffle for bucketed Hive tables.
2. Nereids planner updated for BucketShuffle join on Hive tables.
3. Added Spark-style hash calculation in BE for the shuffle on one side.
4. Added shuffle hash selection based on the left (non-shuffling) side.
You should support the |
Add BucketShuffleJoin support for bucketed hive tables generated by Spark. (27783)
Proposed changes
Issue Number: close #27783
### Sample Output: