Benchmark modal script revamped - profiling and external libraries (Issue #504) #510

vyom1611 · 2024-06-01T13:47:31Z

This PR adds support for libraries such as cuDNN on Modal so that people can run library-dependent builds on cloud GPUs. It uses my own custom Docker image for the CUDA environment setup and adds support for profiling with Nsight Systems, which comes pre-installed. Based on issue #504.

Features:

cuDNN supported
OpenMPI supported
Nsight Systems supported; by attaching a volume, you can download the report and view it in the GUI on your local machine
Instructions in the benchmark_on_modal.py script on how to use the volumes, download reports, and deal with existing errors
(Please note that profile_gpt2cu.py does not work with this; you have to manually change the command to run nsys profile.)

Examples:
To train the GPT-2 model with cuDNN, use:
GPU_MEM=80 modal run dev/cuda/benchmark_on_modal.py \
--compile-command "make train_gpt2cu USE_CUDNN=1" \
--run-command "./train_gpt2cu -i dev/data/tinyshakespeare/tiny_shakespeare_train.bin -j dev/data/tinyshakespeare/tiny_shakespeare_val.bin -v 250 -s 250 -g 144 -f shakespeare.log -b 4"

To profile with Nsight Systems, use:
GPU_MEM=80 modal run dev/cuda/benchmark_on_modal.py \
--compile-command "make train_gpt2cu USE_CUDNN=1" \
--run-command "nsys profile ./train_gpt2cu -i dev/data/tinyshakespeare/tiny_shakespeare_train.bin -j dev/data/tinyshakespeare/tiny_shakespeare_val.bin -v 250 -s 250 -g 144 -f shakespeare.log -b 4"

NOTE: Profiling with Nsight Systems currently has a bug that prints a lot of errors on the command line, but it does not actually interfere with model training and validation. The report (that you download) is still generated and can be viewed in Nsight Systems. Additionally, 'profile_gpt2cu.py' does not work with this setup either.
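Since the profiling variant differs from the training command only by the `nsys profile` prefix, the two invocations above can be assembled by a small helper. The following is a minimal sketch, not part of the PR: `modal_command` and its parameters are hypothetical names introduced for illustration, and it only builds the shell line shown in the examples rather than calling Modal itself.

```python
import shlex

# Script path taken from the examples in the PR description.
SCRIPT = "dev/cuda/benchmark_on_modal.py"


def modal_command(compile_command, run_command, gpu_mem=80, profile=False):
    """Build the shell line that launches the benchmark script on Modal.

    Hypothetical helper: it mirrors the two example invocations and
    does not talk to Modal.
    """
    if profile:
        # Prefix the run command so Nsight Systems wraps the training binary.
        run_command = "nsys profile " + run_command
    return " ".join([
        f"GPU_MEM={gpu_mem}",
        "modal", "run", SCRIPT,
        "--compile-command", shlex.quote(compile_command),
        "--run-command", shlex.quote(run_command),
    ])


if __name__ == "__main__":
    train = (
        "./train_gpt2cu "
        "-i dev/data/tinyshakespeare/tiny_shakespeare_train.bin "
        "-j dev/data/tinyshakespeare/tiny_shakespeare_val.bin "
        "-v 250 -s 250 -g 144 -f shakespeare.log -b 4"
    )
    print(modal_command("make train_gpt2cu USE_CUDNN=1", train, profile=True))
```

Quoting each command with `shlex.quote` keeps the whole compile or run command as a single argument, which is what the double quotes do in the hand-written examples.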

This enables more features to be used on cloud GPUs. Please feel free to extend or optimize this script so that more folks can explore the codebase and try out the features and profiling it currently offers.

vyom1611 · 2024-06-03T11:31:18Z

I am still working on a newer base image so that the Nsight errors disappear, but in the meantime most of the script should work.

karpathy · 2024-06-04T03:44:48Z

Makefile

@@ -14,6 +14,7 @@ CUDA_OUTPUT_FILE = -o $@
 # NVCC flags
 # -t=0 is short for --threads, 0 = number of CPUs on the machine
 NVCC_FLAGS = -O3 -t=0 --use_fast_math
+NVCC_FLAGS += --std=c++17


I'm not 100% sure what the repercussions would be here right now. Is it needed for the modal script?

vyom1611 · 2024-06-04

Currently yes. Without it, hundreds of lines of errors are printed in the terminal (I believe it's because the Docker image I created has the newest version of CUDA), and the current script will not work.

rosslwheeler · 2024-06-04

What version of GCC are you using?

vyom1611 · 2024-06-04

As in on my container? This is what I got from running 'gcc --version':
command_args = ['gcc', '--version']
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0

I used a cuda image from docker hub as a base, but the current image I am using is this:

https://hub.docker.com/layers/totallyvyom/cuda-env/latest/images/sha256-a273f155a5ad9dcd4dd366a6da2f48f80f10b675bd136ab350f73b695d23b23b?tab=layers

and this is the cuda image I am layering on:

https://hub.docker.com/layers/nvidia/cuda/12.4.1-cudnn-devel-ubuntu20.04/images/sha256-f18cf1a9ac2842e59f13b0d0729594da8cbd68cadd2379308cdd98c0374dbd80?context=explore

rosslwheeler · 2024-06-04

That helps. I'm guessing the gcc compiler on Ubuntu 20.04 is too old. Try finding a container with a newer version of Ubuntu, such as 22.04. That should fix your compilation issues.

vyom1611 · 2024-06-04

I am experimenting with images for newer versions to also fix the errors I was getting with Nsight, but it takes a while because reloading all the containers after every small change is slow. It is such a hassle to build a CUDA dev env on Docker 😅 but I am trying to solve those right now.

rosslwheeler · 2024-06-04

Yep, you're right on that. Good luck!
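The compiler check discussed in this thread can be automated by parsing the first line of `gcc --version`. The sketch below is illustrative only: `gcc_version`, `is_new_enough`, and `MIN_GCC_MAJOR` are hypothetical names, and the threshold of 11 is an assumption (the default GCC major version on Ubuntu 22.04), since the thread only guesses that GCC 9.4.0 is too old and does not name a minimum version.

```python
import re

# ASSUMPTION: 11 is the default GCC major version on Ubuntu 22.04. The thread
# does not state a required minimum, so this value is for illustration only.
MIN_GCC_MAJOR = 11


def gcc_version(version_output):
    """Extract (major, minor, patch) from the output of `gcc --version`."""
    first_line = version_output.splitlines()[0]
    # The compiler version is the last dotted triple on the first line;
    # earlier triples belong to the distribution package string.
    matches = re.findall(r"(\d+)\.(\d+)\.(\d+)", first_line)
    if not matches:
        raise ValueError(f"no version found in: {first_line!r}")
    return tuple(int(part) for part in matches[-1])


def is_new_enough(version_output, min_major=MIN_GCC_MAJOR):
    """Return True if the reported GCC major version is at least min_major."""
    return gcc_version(version_output)[0] >= min_major


if __name__ == "__main__":
    # Version line reported from the container in this thread.
    reported = "gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0"
    print(gcc_version(reported), is_new_enough(reported))
```

Taking the last dotted triple matters here because the package string on Ubuntu also contains version-like text (`20.04.2`) ahead of the compiler version.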

vyom1611 · 2024-06-04T18:10:50Z

I just fixed the gcc versioning mistake, and cuDNN should also work with the modal script now.

Benchmark modal script fixed - profiling and cuDNN (Issue #504 and PR #510 fixes)

Benchmark modal script revamped - profiling and external libraries

e0abfae

vyom1611 closed this Jun 3, 2024

vyom1611 force-pushed the master branch from e0abfae to 08fc45b Compare June 3, 2024 11:18

vyom1611 added 2 commits June 3, 2024 16:48

Merge remote-tracking branch 'origin/master'

0867653

Benchmark modal script revamped - profiling and external libraries

fbc1e49

vyom1611 reopened this Jun 3, 2024

Makefile updated to match script

9b540fa

karpathy reviewed Jun 4, 2024

vyom1611 and others added 3 commits June 4, 2024 23:39

Merge branch 'karpathy:master' into master

453fde7

Modal script updated for cudnn and gcc version match

4564a24

Merge remote-tracking branch 'origin/master'

ab8b040

Update Makefile - revert back

1e8efe9

vyom1611 requested a review from karpathy June 7, 2024 06:43

vyom1611 and others added 4 commits June 7, 2024 14:20

Merge branch 'karpathy:master' into master

c24ef4a

Most updated working modal script

58c5906

Merge remote-tracking branch 'origin/master'

009be4d

missing parameter in comments added

ecb13ea

vyom1611 requested a review from rosslwheeler June 10, 2024 08:24

vyom1611 added 2 commits June 10, 2024 13:54

Merge branch 'karpathy:master' into master

8a3a29e

Merge branch 'karpathy:master' into master

afa7978

vyom1611 marked this pull request as draft June 12, 2024 19:00

vyom1611 marked this pull request as ready for review June 12, 2024 19:00

vyom1611 closed this Jun 12, 2024

karpathy added a commit that referenced this pull request Jun 13, 2024

Merge pull request #582 from vyom1611/master

747a579

Benchmark modal script fixed - profiling and cuDNN (Issue #504 and PR #510 fixes)
