Commit 8dce716

Modify hash join explain analyze description (#20523)
1 parent 8c3ce41 commit 8dce716

1 file changed, +44 -0 lines changed

Diff for: sql-statements/sql-statement-explain-analyze.md

@@ -201,6 +201,10 @@ inner:{total:4.429220003s, concurrency:5, task:17, construct:96.207725ms, fetch:

### HashJoin

The `HashJoin` operator has two versions: HashJoinV1 and HashJoinV2. You can specify which version to use with the [`tidb_hash_join_version`](/system-variables.md#tidb_hash_join_version-new-in-v840) system variable. The following sections describe the execution process of each version.
206+
#### HashJoinV1
The `HashJoin` operator has an inner worker, an outer worker, and N join workers. The detailed execution process is as follows:

1. The inner worker reads inner table rows and constructs a hash table.
@@ -226,6 +230,46 @@ build_hash_table:{total:146.071334ms, fetch:110.338509ms, build:35.732825ms}, pr
- `probe`: The total time spent joining outer table rows with the hash table.
- `fetch`: The total time that the join workers spend waiting to read outer table rows.
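To make the HashJoinV1 worker layout and the `probe` and `fetch` metrics more concrete, here is a minimal Python sketch. It is not TiDB's Go implementation, and several things are assumptions:

- Rows are dicts joined on a single equality column, `key`.
- The outer worker hands chunks to the join workers through a bounded queue. This diff shows only step 1 of the V1 process, so this hand-off is assumed.
- `hash_join_v1`, `outer_worker`, and `join_worker` are hypothetical names.

Time a join worker spends blocked on the queue is added to `fetch`. Time spent matching a chunk against the hash table is added to `probe`.

```python
import queue
import threading
import time
from collections import defaultdict


def hash_join_v1(inner_rows, outer_chunks, key, n_join_workers=3):
    """Simplified model of HashJoinV1: one inner worker, one outer worker, N join workers."""
    # Inner worker's job (done inline here): read inner rows and build the hash table.
    hash_table = defaultdict(list)
    for row in inner_rows:
        hash_table[row[key]].append(row)

    outer_q = queue.Queue(maxsize=2)  # bounded, so join workers may have to wait for data
    results, lock = [], threading.Lock()
    stats = {"fetch": 0.0, "probe": 0.0}  # summed over all join workers

    def outer_worker():
        # Read outer chunks and hand them to the join workers.
        for chunk in outer_chunks:
            outer_q.put(chunk)
        for _ in range(n_join_workers):
            outer_q.put(None)  # one stop marker per join worker

    def join_worker():
        fetch = probe = 0.0
        local = []
        while True:
            t0 = time.perf_counter()
            chunk = outer_q.get()  # time blocked here is counted as "fetch"
            t1 = time.perf_counter()
            fetch += t1 - t0
            if chunk is None:
                break
            for outer in chunk:  # time spent matching is counted as "probe"
                for inner in hash_table.get(outer[key], []):
                    local.append((outer, inner))
            probe += time.perf_counter() - t1
        with lock:
            results.extend(local)
            stats["fetch"] += fetch
            stats["probe"] += probe

    threads = [threading.Thread(target=outer_worker)]
    threads += [threading.Thread(target=join_worker) for _ in range(n_join_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, stats


if __name__ == "__main__":
    rows, stats = hash_join_v1(
        [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}],
        [[{"id": 1}, {"id": 3}], [{"id": 2}, {"id": 1}]],
        "id",
        n_join_workers=2,
    )
    print(len(rows), stats)
```

Because each join worker accumulates its own timings and merges them once at the end, the lock is taken only N times rather than once per chunk.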

#### HashJoinV2
235+
On the build side, the `HashJoin` operator has one fetcher, N row table builders, and N hash table builders. On the probe side, it has one fetcher and N workers. The detailed execution process is as follows:
237+
1. The fetcher on the build side reads data from the downstream executor and dispatches it to the row table builders.
2. Each row table builder receives data chunks, splits them into several partitions, and builds row tables.
3. The operator waits until all row tables are built.
4. The hash table builders build hash tables from the row tables.
5. The fetcher on the probe side reads data from the downstream executor and dispatches it to the workers.
6. After receiving data, the workers look up the hash tables, build the final results, and send the results to the result channel.
7. The main thread of `HashJoin` retrieves the join results from the result channel.
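The seven HashJoinV2 steps can be modeled in Python as follows. This is a simplified sketch under stated assumptions, not TiDB's implementation:

- The number of partitions equals N, so hash table builder `p` owns partition `p`.
- Rows are dicts joined on a single equality column, `key`.
- `queue.Queue` objects stand in for the channels.
- `hash_join_v2` and its inner helpers are hypothetical names.

```python
import queue
import threading
from collections import defaultdict


def hash_join_v2(build_chunks, probe_chunks, key, n=4):
    # Assumption: one partition per hash table builder (the real partition count may differ).
    num_partitions = n

    def partition_of(value):
        return hash(value) % num_partitions

    def run(threads):
        for t in threads:
            t.start()
        return threads

    # Steps 1-2: the build-side fetcher dispatches chunks; row table builder i
    # splits rows by partition into its own row tables (no shared lists, no locks).
    build_q = queue.Queue()
    row_tables = [[[] for _ in range(num_partitions)] for _ in range(n)]

    def row_table_builder(i):
        while (chunk := build_q.get()) is not None:
            for row in chunk:
                row_tables[i][partition_of(row[key])].append(row)

    builders = run([threading.Thread(target=row_table_builder, args=(i,)) for i in range(n)])
    for chunk in build_chunks:  # build-side fetcher
        build_q.put(chunk)
    for _ in range(n):
        build_q.put(None)  # one stop marker per row table builder
    # Step 3: wait until all row tables are built.
    for t in builders:
        t.join()

    # Step 4: hash table builder p builds partition p from every builder's row table.
    hash_tables = [defaultdict(list) for _ in range(num_partitions)]

    def hash_table_builder(p):
        for i in range(n):
            for row in row_tables[i][p]:
                hash_tables[p][row[key]].append(row)

    for t in run([threading.Thread(target=hash_table_builder, args=(p,)) for p in range(num_partitions)]):
        t.join()

    # Steps 5-6: the probe-side fetcher dispatches chunks; each worker looks up the
    # matching partition's hash table and sends joined rows to the result channel.
    probe_q, result_ch = queue.Queue(), queue.Queue()

    def probe_worker():
        while (chunk := probe_q.get()) is not None:
            out = []
            for row in chunk:
                table = hash_tables[partition_of(row[key])]
                out.extend((row, match) for match in table.get(row[key], []))
            result_ch.put(out)
        result_ch.put(None)  # tells the main thread this worker has finished

    workers = run([threading.Thread(target=probe_worker) for _ in range(n)])
    for chunk in probe_chunks:  # probe-side fetcher
        probe_q.put(chunk)
    for _ in range(n):
        probe_q.put(None)

    # Step 7: the main thread drains the result channel until every worker is done.
    results, finished = [], 0
    while finished < n:
        batch = result_ch.get()
        if batch is None:
            finished += 1
        else:
            results.extend(batch)
    for t in workers:
        t.join()
    return results
```

In this sketch, each row table builder writes only to its own per-partition lists, and each hash table builder reads a single partition, so neither build phase needs locks. The `join()` calls provide the wait described in step 3.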
The `HashJoin` operator contains the following execution information:
247+
```
build_hash_table:{concurrency:5, time:2.25s, fetch:1.06s, max_partition:1.06s, total_partition:5.27s, max_build:124ms, total_build:439.5ms}, probe:{concurrency:5, time:13s, fetch_and_wait:3.03s, max_worker_time:13s, total_worker_time:1m4.5s, max_probe:9.93s, total_probe:49.4s, probe_collision:59818971}, spill:{round:1, spilled_partition_num_per_round:[5/8], total_spill_GiB_per_round:[1.64], build_spill_row_table_GiB_per_round:[0.50], build_spill_hash_table_per_round:[0.12]}
```
251+
- `build_hash_table`: The execution information of reading data from the downstream operator and building hash tables.
    - `time`: The total time spent building hash tables.
    - `fetch`: The total time spent reading data from the downstream operator.
    - `max_partition`: The longest execution time among all row table builders.
    - `total_partition`: The total execution time of all row table builders.
    - `max_build`: The longest execution time among all hash table builders.
    - `total_build`: The total execution time of all hash table builders.
- `probe`: The execution information of reading data from the downstream operator and performing probe operations.
    - `time`: The total time spent probing.
    - `fetch_and_wait`: The total time spent reading data from the downstream operator and waiting for the upstream operator to receive the data.
    - `max_worker_time`: The longest execution time among all workers, including reading data from the downstream operator, executing probe operations, and waiting for the upstream operator to receive the data.
    - `total_worker_time`: The total execution time of all workers.
    - `max_probe`: The longest probe time among all workers.
    - `total_probe`: The total probe time of all workers.
    - `probe_collision`: The number of hash collisions encountered during probing.
- `spill`: The execution information of spilling data to disk.
    - `round`: The number of spill rounds.
    - `spilled_partition_num_per_round`: The number of spilled partitions in each round, in the format `x/y`, where `x` is the number of spilled partitions and `y` is the total number of partitions.
    - `total_spill_GiB_per_round`: The total size (in GiB) of data written to disk in each spill round.
    - `build_spill_row_table_GiB_per_round`: The total size (in GiB) of build-side row table data written to disk in each spill round.
    - `build_spill_hash_table_per_round`: The total size of build-side hash table data written to disk in each spill round.
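As a worked reading of the example output, the following Python sketch does three things:

- It parses the `name:{key:value, ...}` execution information.
- It converts durations such as `124ms` or `1m4.5s` to seconds.
- It compares each `max_*` value with the per-builder or per-worker mean implied by the matching `total_*` value.

Several parts are assumptions. The list above does not describe `concurrency`, so treating it as the number of builders or workers is an assumption. Multi-round list values are assumed to be comma-separated. The function names are hypothetical.

With the example numbers, `total_worker_time` of 1m4.5s over 5 workers averages 12.9s, against a `max_worker_time` of 13s. That suggests the probe work was spread evenly; a much larger gap between the maximum and the mean would point to skew.

```python
import re

# Duration units as they appear in the example output, e.g. "124ms", "2.25s", "1m4.5s".
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6, "µs": 1e-6, "ns": 1e-9}
_PART = r"(\d+(?:\.\d+)?)(h|ms|us|µs|ns|m|s)"  # "ms" must be tried before "m" and "s"


def parse_duration(text):
    """Convert a duration string such as '1m4.5s' to seconds."""
    if not re.fullmatch(f"(?:{_PART})+", text):
        raise ValueError(f"not a duration: {text!r}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in re.findall(_PART, text))


def parse_execution_info(info):
    """Split 'name:{k:v, ...}, name:{...}' into {name: {k: v}}; values stay strings."""
    sections = {}
    for name, body in re.findall(r"(\w+):\{([^{}]*)\}", info):
        sections[name] = dict(re.findall(r"(\w+):(\[[^\]]*\]|[^,]+)", body))
    return sections


def summarize(info):
    s = parse_execution_info(info)
    build, probe = s["build_hash_table"], s["probe"]
    # Assumption: concurrency == number of row table builders / probe workers.
    b_conc, p_conc = int(build["concurrency"]), int(probe["concurrency"])
    summary = {
        "avg_partition_s": parse_duration(build["total_partition"]) / b_conc,
        "max_partition_s": parse_duration(build["max_partition"]),
        "avg_worker_s": parse_duration(probe["total_worker_time"]) / p_conc,
        "max_worker_s": parse_duration(probe["max_worker_time"]),
    }
    if "spill" in s:
        pairs = re.findall(r"(\d+)/(\d+)", s["spill"]["spilled_partition_num_per_round"])
        summary["spilled_partitions_per_round"] = [(int(x), int(y)) for x, y in pairs]
    return summary


EXAMPLE = (
    "build_hash_table:{concurrency:5, time:2.25s, fetch:1.06s, max_partition:1.06s, "
    "total_partition:5.27s, max_build:124ms, total_build:439.5ms}, "
    "probe:{concurrency:5, time:13s, fetch_and_wait:3.03s, max_worker_time:13s, "
    "total_worker_time:1m4.5s, max_probe:9.93s, total_probe:49.4s, probe_collision:59818971}, "
    "spill:{round:1, spilled_partition_num_per_round:[5/8], total_spill_GiB_per_round:[1.64], "
    "build_spill_row_table_GiB_per_round:[0.50], build_spill_hash_table_per_round:[0.12]}"
)

if __name__ == "__main__":
    print(summarize(EXAMPLE))
```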
### TableFullScan (TiFlash)

The `TableFullScan` operator executed on a TiFlash node contains the following execution information:
