Remove MPI from multi-GPU example #268
Conversation
Force-pushed from aa79743 to 8be7e77.
Tested on a laptop with 1 CUDA GPU device with the patch:

```diff
diff --git a/examples/3_MultiGPU/multigpu_infer_fortran.f90 b/examples/3_MultiGPU/multigpu_infer_fortran.f90
index 297844e..cfba096 100644
--- a/examples/3_MultiGPU/multigpu_infer_fortran.f90
+++ b/examples/3_MultiGPU/multigpu_infer_fortran.f90
@@ -27,7 +27,7 @@ program inference
   type(torch_tensor), dimension(1) :: out_tensors

   ! Variables for multi-GPU setup
-  integer, parameter :: num_devices = 2
+  integer, parameter :: num_devices = 1
   integer :: device_index, i

   ! Get TorchScript model file as a command line argument
diff --git a/examples/3_MultiGPU/multigpu_infer_python.py b/examples/3_MultiGPU/multigpu_infer_python.py
index 1b49398..c504063 100644
--- a/examples/3_MultiGPU/multigpu_infer_python.py
+++ b/examples/3_MultiGPU/multigpu_infer_python.py
@@ -53,7 +53,7 @@ def deploy(saved_model: str, device: str, batch_size: int = 1) -> torch.Tensor:

 if __name__ == "__main__":
     saved_model_file = "saved_multigpu_model_cuda.pt"
-    for device_index in range(2):
+    for device_index in range(1):
         device_to_run = f"cuda:{device_index}"
         batch_size_to_run = 1
```
Force-pushed from 8be7e77 to 88e836e.
This looks good to me, @jwallwork23. Happy to merge after a final confirmation of running on GPU.

One small suggestion: I wonder if it is worth adding a comment/note about how to change the number of devices you run on, and, alongside this, altering `multigpu_infer_python.py` to also use a `num_devices` variable like you do in the Fortran.
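The suggestion above could look something like the following sketch for the Python example. The `devices_to_run` helper name is hypothetical, and the loop body is elided; only the `num_devices` variable mirrors the Fortran example's `integer, parameter :: num_devices` declaration.

```python
# Hypothetical sketch of the reviewer's suggestion: drive the device loop
# from a single num_devices variable, as the Fortran example already does.
num_devices = 2  # edit this to match the number of GPUs available


def devices_to_run(num_devices: int) -> list[str]:
    """Build the CUDA device strings ("cuda:0", "cuda:1", ...) to loop over."""
    return [f"cuda:{i}" for i in range(num_devices)]


for device_to_run in devices_to_run(num_devices):
    # the existing deploy(...) call from multigpu_infer_python.py would go here
    print(f"Would run inference on {device_to_run}")
```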
Co-authored-by: Jack Atkinson <[email protected]>
Thanks! Good suggestion - addressed in 3b67ae8.
Tested with 2 Ampere GPUs - passes with the latest fix in 4fd8800. Will merge once CI passes.
* Drop ENABLE_MPI CMake argument
* Remove MPI from MultiGPU example
* Mention GPU/CUDA as dependency

Co-authored-by: Jack Atkinson <[email protected]>
Closes #253.
I decided to close #258 and split it into two separate PRs. Here's the first one that removes MPI from the multi-GPU example. (The second one will introduce a CPU-only example that uses MPI.)
Note that I decided to switch the names of the `multigpu.py` module / `MultiGPU` class back to `simplenet.py` / `SimpleNet` in this example. This is because the class is a direct copy. (Unlike in `MultiIONet`, where it's modified.)