
Commit c2cd9cc

Author: harry
Message: first commit
1 parent c2bb939


42 files changed: +4650 -2 lines

Dockerfile

Lines changed: 14 additions & 0 deletions
```dockerfile
FROM nvcr.io/nvidia/pytorch:23.11-py3

WORKDIR /

# Change the apt download source; comment this line out if you are outside China
COPY sources.list /etc/apt/sources.list
RUN apt-get update && \
    apt-get install -y openssh-server vim curl inetutils-ping net-tools telnet lsof

COPY start.sh /start.sh
COPY sshd_config /etc/ssh/sshd_config
COPY nccl-tests /nccl-tests

CMD ["/bin/bash", "start.sh"]
```
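The `start.sh` entrypoint copied above is not included in this diff view. Based on the `PORT` and `PASS` variables documented in the README, a plausible sketch looks like the following (the variable names come from the README; everything else here is an assumption, not the repo's actual script):

```shell
#!/bin/bash
# Hypothetical entrypoint sketch (assumed, not the repo's actual start.sh).
# Apply the documented defaults, set the root password, point sshd at the
# requested port, and keep the container alive.
PORT="${PORT:-12345}"
PASS="${PASS:-12345}"

echo "root:${PASS}" | chpasswd                             # root login password
sed -i "s/^#\?Port .*/Port ${PORT}/" /etc/ssh/sshd_config  # sshd listen port
service ssh start
sleep infinity
```

This is a configuration sketch only; the real script may differ.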

README.md

Lines changed: 73 additions & 2 deletions
# Build-NCCL-Tests-With-PyTorch

![license](https://img.shields.io/hexpm/l/plug.svg)
[![docker](https://img.shields.io/docker/pulls/mayooot/nccl-tests-with-pytorch.svg)](https://hub.docker.com/r/mayooot/nccl-tests-with-pytorch)

# Overview

Build [NCCL-Tests](https://github.com/NVIDIA/nccl-tests) and configure SSHD in a PyTorch container to help you test NCCL faster!

PyTorch version: 23.11

# Quick Start

~~~shell
docker pull mayooot/nccl-tests-with-pytorch:v0.0.1
~~~

# Build From Source

~~~shell
git clone https://github.com/mayooot/build-nccl-tests-with-pytorch
cd build-nccl-tests-with-pytorch

docker build -t nccl-tests-with-pytorch:latest .
~~~

# Usage

The default value for both `PORT` and `PASS` is 12345; you can override them with `-e`.

In addition, you need to mount the host's `id_rsa` and `id_rsa.pub` into the container.

~~~shell
docker run --name foo \
    -d -it \
    --network=host \
    -e PORT=1998 -e PASS=P@88w0rd \
    -v /tmp/id_rsa:/root/.ssh/id_rsa \
    -v /tmp/id_rsa.pub:/root/.ssh/id_rsa.pub \
    --gpus all --shm-size=1g \
    --cap-add=IPC_LOCK --device=/dev/infiniband \
    mayooot/nccl-tests-with-pytorch:v0.0.1
~~~

The code and executables for NCCL-Tests are located in `/nccl-tests`. The following shows how to use them, taking `all_reduce_perf` as an example.

Before running `all_reduce_perf`, you need to configure passwordless SSH between the nodes:

~~~shell
ssh-copy-id -p 1998 root@all_cluster_ip
~~~
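Key distribution has to be done once per node. A small loop over the node addresses keeps this manageable (the IPs below are hypothetical placeholders; the `echo` prints each command instead of running it, so remove it to actually copy the key):

```shell
# Print the ssh-copy-id command for each node (hypothetical IPs;
# remove 'echo' to actually distribute the key).
PORT=1998
NODES="10.0.0.1 10.0.0.2 10.0.0.3"
for ip in $NODES; do
  echo ssh-copy-id -p "$PORT" "root@$ip"
done
```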
54+
Replace `--host cluster_ip1,cluster_ip2,...` with the real cluster IP addresses.

~~~shell
docker exec -it foo bash

cd /nccl-tests

mpirun --allow-run-as-root \
    -mca plm_rsh_args "-p 1998" \
    -x NCCL_DEBUG=INFO \
    -x NCCL_IB_HCA=mlx5_10,mlx5_11,mlx5_12,mlx5_13,mlx5_14,mlx5_15,mlx5_16,mlx5_17 \
    --host cluster_ip1,cluster_ip2,... \
    ./build/all_reduce_perf \
    -b 1G -e 4G -f 2 -g 8
~~~
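In the command above, `-b 1G -e 4G -f 2` scans message sizes from 1 GiB to 4 GiB, doubling at each step, and `-g 8` uses 8 GPUs per process. The scan therefore covers exactly three sizes; a quick sketch of the progression:

```shell
# Enumerate the sizes scanned by -b 1G -e 4G -f 2: start at minbytes
# and multiply by stepfactor until maxbytes is exceeded.
min=$((1 << 30)); max=$((4 << 30)); factor=2
sizes=""; s=$min
while [ "$s" -le "$max" ]; do
  sizes="$sizes $s"
  s=$((s * factor))
done
echo "sizes (bytes):$sizes"
```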
# Contribute

Feel free to open issues and pull requests. Any feedback is highly appreciated!

nccl-tests/LICENSE.txt

Lines changed: 27 additions & 0 deletions
```text
Copyright (c) 2016-2017, NVIDIA CORPORATION. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:
 * Redistributions of source code must retain the above copyright
   notice, this list of conditions and the following disclaimer.
 * Redistributions in binary form must reproduce the above copyright
   notice, this list of conditions and the following disclaimer in the
   documentation and/or other materials provided with the distribution.
 * Neither the name of NVIDIA CORPORATION, nor the names of their
   contributors may be used to endorse or promote products derived
   from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
```

nccl-tests/Makefile

Lines changed: 23 additions & 0 deletions
```makefile
#
# Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
#
# See LICENCE.txt for license information
#

BUILDDIR ?= build
override BUILDDIR := $(abspath $(BUILDDIR))

.PHONY: all clean

default: src.build

TARGETS=src

all: ${TARGETS:%=%.build}
clean: ${TARGETS:%=%.clean}

%.build:
	${MAKE} -C $* build BUILDDIR=${BUILDDIR}

%.clean:
	${MAKE} -C $* clean BUILDDIR=${BUILDDIR}
```

nccl-tests/README.md

Lines changed: 72 additions & 0 deletions
# NCCL Tests

These tests check both the performance and the correctness of [NCCL](http://github.com/nvidia/nccl) operations.

## Build

To build the tests, just type `make`.

If CUDA is not installed in `/usr/local/cuda`, you may specify `CUDA_HOME`. Similarly, if NCCL is not installed in `/usr`, you may specify `NCCL_HOME`.

```shell
$ make CUDA_HOME=/path/to/cuda NCCL_HOME=/path/to/nccl
```

NCCL tests rely on MPI to work on multiple processes, and hence multiple nodes. If you want to compile the tests with MPI support, set `MPI=1` and set `MPI_HOME` to the path where MPI is installed.

```shell
$ make MPI=1 MPI_HOME=/path/to/mpi CUDA_HOME=/path/to/cuda NCCL_HOME=/path/to/nccl
```

## Usage

NCCL tests can run on multiple processes, multiple threads, and multiple CUDA devices per thread. The number of processes is managed by MPI and is therefore not passed to the tests as an argument. The total number of ranks (= CUDA devices) will be equal to (number of processes) * (number of threads) * (number of GPUs per thread).
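The rank formula can be sanity-checked with a bit of arithmetic: for example, 10 MPI processes with the default 1 thread each and 4 GPUs per thread give 40 ranks:

```shell
# Total ranks = (processes) * (threads per process) * (GPUs per thread).
procs=10; threads=1; gpus=4
ranks=$((procs * threads * gpus))
echo "total ranks: $ranks"
```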
### Quick examples

Run on 8 GPUs (`-g 8`), scanning from 8 bytes to 128 MB:

```shell
$ ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 8
```

Run with MPI on 10 processes (potentially on multiple nodes) with 4 GPUs each, for a total of 40 GPUs:

```shell
$ mpirun -np 10 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 4
```

### Performance

See the [Performance](doc/PERFORMANCE.md) page for an explanation of the numbers, in particular the "busbw" column.

### Arguments

All tests support the same set of arguments:

* Number of GPUs
  * `-t,--nthreads <num threads>` number of threads per process. Default: 1.
  * `-g,--ngpus <GPUs per thread>` number of GPUs per thread. Default: 1.
* Sizes to scan
  * `-b,--minbytes <min size in bytes>` minimum size to start with. Default: 32M.
  * `-e,--maxbytes <max size in bytes>` maximum size to end at. Default: 32M.
  * Increments can be either fixed or a multiplication factor. Only one of these should be used.
    * `-i,--stepbytes <increment size>` fixed increment between sizes. Default: 1M.
    * `-f,--stepfactor <increment factor>` multiplication factor between sizes. Default: disabled.
* NCCL operation arguments
  * `-o,--op <sum/prod/min/max/avg/all>` reduction operation to perform. Only relevant for reduction operations like Allreduce, Reduce or ReduceScatter. Default: Sum.
  * `-d,--datatype <nccltype/all>` datatype to use. Default: Float.
  * `-r,--root <root/all>` root to use. Only for operations with a root, like Broadcast or Reduce. Default: 0.
* Performance
  * `-n,--iters <iteration count>` number of iterations. Default: 20.
  * `-w,--warmup_iters <warmup iteration count>` number of warmup iterations (not timed). Default: 5.
  * `-m,--agg_iters <aggregation count>` number of operations to aggregate together in each iteration. Default: 1.
  * `-a,--average <0/1/2/3>` report performance as an average across all ranks (MPI=1 only). <0=Rank0,1=Avg,2=Min,3=Max>. Default: 1.
* Test operation
  * `-p,--parallel_init <0/1>` use threads to initialize NCCL in parallel. Default: 0.
  * `-c,--check <check iteration count>` perform count iterations, checking correctness of results on each iteration. This can be quite slow on large numbers of GPUs. Default: 1.
  * `-z,--blocking <0/1>` make NCCL collectives blocking, i.e. have CPUs wait and sync after each collective. Default: 0.
  * `-G,--cudagraph <num graph launches>` capture iterations as a CUDA graph and then replay the specified number of times. Default: 0.
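The size-scan flags compose in two modes: geometric with `-f`, or linear with `-i`. For a hypothetical linear scan `-b 32M -e 64M -i 8M` (flag values chosen for illustration, not from the document), the tool would visit 32M, 40M, 48M, 56M and 64M, i.e. five sizes:

```shell
# Count the sizes visited by a fixed-step scan: -b 32M -e 64M -i 8M.
MB=$((1 << 20))
min=$((32 * MB)); max=$((64 * MB)); step=$((8 * MB))
count=0; s=$min
while [ "$s" -le "$max" ]; do
  count=$((count + 1))
  s=$((s + step))
done
echo "$count sizes"
```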
## Copyright

NCCL tests are provided under the BSD license. All source code and accompanying documentation is copyright (c) 2016-2021, NVIDIA CORPORATION. All rights reserved.

Binary files added (not shown):

* nccl-tests/build/all_gather_perf (12.6 MB)
* nccl-tests/build/all_reduce_perf (12.6 MB)
* nccl-tests/build/alltoall_perf (12.6 MB)
* nccl-tests/build/broadcast_perf (12.6 MB)
* nccl-tests/build/gather_perf (12.6 MB)
