From f66ca0247ed60cb8578a2e531245061f0bc8f7a4 Mon Sep 17 00:00:00 2001
From: hongtaozhang
Date: Thu, 21 Nov 2024 16:44:24 -0800
Subject: [PATCH] Fix typo.

---
 docs/user-tutorial/benchmarks/micro-benchmarks.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/user-tutorial/benchmarks/micro-benchmarks.md b/docs/user-tutorial/benchmarks/micro-benchmarks.md
index c8408a041..9445d47b3 100644
--- a/docs/user-tutorial/benchmarks/micro-benchmarks.md
+++ b/docs/user-tutorial/benchmarks/micro-benchmarks.md
@@ -412,11 +412,11 @@ Measures bandwidth and latency for various memcpy patterns across different link
 | device_to_device_bidirectional_memcpy_write_ce_sum_bw | GB/s | Sum of the output matrix |
 | all_to_host_memcpy_ce_cpu[0-9]_gpu[0-9]_bw | GB/s | Measures bandwidth of cuMemcpyAsync between a single device and the host while simultaneously running copies from all other devices to the host. |
 | all_to_host_memcpy_ce_sum_bw | GB/s | Sum of the output matrix |
-| all_to_host_bidirectional_memcpy_ce_cpu[0-9]_gpu[0-9]_bw | GB/s | A device to host copy is measured while a host to device copy is run simultaneously. Only the device to host copy bandwidth is reported. All other devices generate simultaneous host to device and device to host interferring traffic. |
+| all_to_host_bidirectional_memcpy_ce_cpu[0-9]_gpu[0-9]_bw | GB/s | A device to host copy is measured while a host to device copy is run simultaneously. Only the device to host copy bandwidth is reported. All other devices generate simultaneous host to device and device to host interfering traffic. |
 | all_to_host_bidirectional_memcpy_ce_sum_bw | GB/s | Sum of the output matrix |
 | host_to_all_memcpy_ce_cpu[0-9]_gpu[0-9]_bw | GB/s | Measures bandwidth of cuMemcpyAsync between the host to a single device while simultaneously running copies from the host to all other devices. |
 | host_to_all_memcpy_ce_sum_bw | GB/s | Sum of the output matrix |
-| host_to_all_bidirectional_memcpy_ce_cpu[0-9]_gpu[0-9]_bw | GB/s | A host to device copy is measured while a device to host copy is run simultaneously. Only the host to device copy bandwidth is reported. All other devices generate simultaneous host to device and device to host interferring traffic. |
+| host_to_all_bidirectional_memcpy_ce_cpu[0-9]_gpu[0-9]_bw | GB/s | A host to device copy is measured while a device to host copy is run simultaneously. Only the host to device copy bandwidth is reported. All other devices generate simultaneous host to device and device to host interfering traffic. |
 | host_to_all_bidirectional_memcpy_ce_sum_bw | GB/s | Sum of the output matrix |
 | all_to_one_write_ce_gpu[0-9]_gpu[0-9]_bw | GB/s | Measures the total bandwidth of copies from all accessible peers to a single device, for each device. Bandwidth is reported as the total inbound bandwidth for each device. Write tests launch a copy from the target device to the peer using the target's context. |
 | all_to_one_write_ce_sum_bw | GB/s | Sum of the output matrix |
@@ -440,11 +440,11 @@ Measures bandwidth and latency for various memcpy patterns across different link
 | device_to_device_bidirectional_memcpy_write_sm_sum_bw | GB/s | Sum of the output matrix |
 | all_to_host_memcpy_sm_cpu[0-9]_gpu[0-9]_bw | GB/s | Measures bandwidth of a copy kernel between a single device and the host while simultaneously running copies from all other devices to the host. |
 | all_to_host_memcpy_sm_sum_bw | GB/s | Sum of the output matrix |
-| all_to_host_bidirectional_memcpy_sm_cpu[0-9]_gpu[0-9]_bw | GB/s | A device to host bandwidth of a copy kernel is measured while a host to device copy is run simultaneously. Only the device to host copy bandwidth is reported. All other devices generate simultaneous host to device and device to host interferring traffic using copy kernels. |
+| all_to_host_bidirectional_memcpy_sm_cpu[0-9]_gpu[0-9]_bw | GB/s | A device to host bandwidth of a copy kernel is measured while a host to device copy is run simultaneously. Only the device to host copy bandwidth is reported. All other devices generate simultaneous host to device and device to host interfering traffic using copy kernels. |
 | all_to_host_bidirectional_memcpy_sm_sum_bw | GB/s | Sum of the output matrix |
 | host_to_all_memcpy_sm_cpu[0-9]_gpu[0-9]_bw | GB/s | Measures bandwidth of a copy kernel between the host to a single device while simultaneously running copies from the host to all other devices. |
 | host_to_all_memcpy_sm_sum_bw | GB/s | Sum of the output matrix |
-| host_to_all_bidirectional_memcpy_sm_cpu[0-9]_gpu[0-9]_bw | GB/s | A host to device bandwidth of a copy kernel is measured while a device to host copy is run simultaneously. Only the host to device copy bandwidth is reported. All other devices generate simultaneous host to device and device to host interferring traffic using copy kernels. |
+| host_to_all_bidirectional_memcpy_sm_cpu[0-9]_gpu[0-9]_bw | GB/s | A host to device bandwidth of a copy kernel is measured while a device to host copy is run simultaneously. Only the host to device copy bandwidth is reported. All other devices generate simultaneous host to device and device to host interfering traffic using copy kernels. |
 | host_to_all_bidirectional_memcpy_sm_sum_bw | GB/s | Sum of the output matrix |
 | all_to_one_write_sm_gpu[0-9]_gpu[0-9]_bw | GB/s | Measures the total bandwidth of copies from all accessible peers to a single device, for each device. Bandwidth is reported as the total inbound bandwidth for each device. Write tests launch a copy from the target device to the peer using the target's context. |
 | all_to_one_write_sm_sum_bw | GB/s | Sum of the output matrix |