Add notes about running TF with GPUs (#2348)
YuanTingHsieh authored Feb 2, 2024
1 parent 8c10f56 commit 899f11e
Showing 3 changed files with 76 additions and 9 deletions.
28 changes: 26 additions & 2 deletions examples/hello-world/hello-cyclic/README.md
@@ -27,8 +27,8 @@ bash ./prepare_data.sh

Use nvflare simulator to run the hello-examples:

-```
-nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 hello-cyclic/jobs/hello-cyclic
+```bash
+nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 ./jobs/hello-cyclic
 ```

### 3. Access the logs and results
@@ -40,3 +40,27 @@ $ ls /tmp/nvflare/simulate_job/
app_server app_site-1 app_site-2 log.txt

```

### 4. Notes on running with GPUs

For running with GPUs, we recommend using the
[NVIDIA TensorFlow Docker image](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow).

If you run the example on GPUs, note that by default TensorFlow attempts to
allocate all available GPU memory at startup. When multiple clients share the
same GPU, you have a couple of options to address this.

One approach is to set an environment variable that stops TensorFlow from
allocating all GPU memory up front. For instance:

```bash
TF_FORCE_GPU_ALLOW_GROWTH=true nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 ./jobs/hello-cyclic
```
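
If you prefer configuring this in code instead of the environment, here is a
minimal sketch of the equivalent setting (assuming TensorFlow 2.x; it must run
before the first GPU operation):

```python
import tensorflow as tf

# Let TensorFlow grow GPU memory on demand instead of grabbing it all
# at startup; must be called before any GPU operation runs.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```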

If you have more GPUs than clients, an alternative is to run each client on
its own GPU, as illustrated below:

```bash
TF_FORCE_GPU_ALLOW_GROWTH=true nvflare simulator -w /tmp/nvflare/ -n 2 -gpu 0,1 ./jobs/hello-cyclic
```
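
For intuition, assigning clients to GPUs has the same effect as restricting
each client process to one visible device, e.g. via `CUDA_VISIBLE_DEVICES`.
The sketch below is an illustration only (`train.py` is a hypothetical client
script; the simulator handles the assignment for you):

```bash
# Illustration only: each process sees a single GPU as device 0.
CUDA_VISIBLE_DEVICES=0 python train.py  # first client on GPU 0
CUDA_VISIBLE_DEVICES=1 python train.py  # second client on GPU 1
```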
30 changes: 27 additions & 3 deletions examples/hello-world/hello-tf2/README.md
@@ -26,10 +26,10 @@ Prepare the data first:
bash ./prepare_data.sh
```

-Use nvflare simulator to run the hello-examples: (TF2 does not allow multiple processes to be running on a single GPU at the same time. Need to set the simulator threads to 1. "-gpu" option can be used to run multiple concurrent clients.)
+Use nvflare simulator to run the hello-examples:

-```
-nvflare simulator -w /tmp/nvflare/ -n 2 -t 1 hello-tf2/jobs/hello-tf2
+```bash
+nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 ./jobs/hello-tf2
 ```

### 3. Access the logs and results
@@ -41,3 +41,27 @@ $ ls /tmp/nvflare/simulate_job/
app_server app_site-1 app_site-2 log.txt

```

### 4. Notes on running with GPUs

For running with GPUs, we recommend using the
[NVIDIA TensorFlow Docker image](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow).
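
As a sketch, such a container could be launched as below (the image tag is a
placeholder; pick a current tag from the NGC catalog):

```bash
docker run --gpus all -it --rm \
  -v "$PWD":/workspace -w /workspace \
  nvcr.io/nvidia/tensorflow:24.01-tf2-py3
```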

If you run the example on GPUs, note that by default TensorFlow attempts to
allocate all available GPU memory at startup. When multiple clients share the
same GPU, you have a couple of options to address this.

One approach is to set an environment variable that stops TensorFlow from
allocating all GPU memory up front. For instance:

```bash
TF_FORCE_GPU_ALLOW_GROWTH=true nvflare simulator -w /tmp/nvflare/ -n 2 -t 2 ./jobs/hello-tf2
```
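
If you would rather give each client a hard memory cap than allow on-demand
growth, TensorFlow can also pin a process to a fixed slice of a GPU. A minimal
sketch (assuming TensorFlow 2.x; the 2048 MB limit is an arbitrary example):

```python
import tensorflow as tf

# Cap this process at a fixed slice of GPU 0 so that two clients
# can share the device without starving each other.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=2048)],  # in MB
    )
```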

If you have more GPUs than clients, an alternative is to run each client on
its own GPU, as illustrated below:

```bash
TF_FORCE_GPU_ALLOW_GROWTH=true nvflare simulator -w /tmp/nvflare/ -n 2 -gpu 0,1 ./jobs/hello-tf2
```
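
To check that each client actually landed on its own GPU while the simulator
runs, you can list the compute processes per device (assuming `nvidia-smi` is
available on the host):

```bash
# One row per GPU process: which device, which PID, how much memory.
nvidia-smi --query-compute-apps=gpu_uuid,pid,process_name,used_gpu_memory --format=csv
```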
27 changes: 23 additions & 4 deletions examples/hello-world/ml-to-fl/tf/README.md
@@ -31,8 +31,7 @@ nvflare job list_templates
\* depends on whether TF can find a GPU or not


-Note that for running with GPUs, we recommend using [NVIDIA TensorFlow docker](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tensorflow)
-
+For running with GPUs, please check the [note](#notes-on-running-with-gpus).

## Transform CIFAR10 TensorFlow training code to FL with NVFLARE Client API

@@ -108,7 +107,27 @@ Then we can run the job using the simulator:

```bash
bash ./prepare_data.sh
-TF_GPU_ALLOCATOR=cuda_malloc_async nvflare simulator -n 2 -t 2 ./jobs/tensorflow_multi_gpu -w tensorflow_multi_gpu_workspace
+nvflare simulator -n 2 -t 2 ./jobs/tensorflow_multi_gpu -w tensorflow_multi_gpu_workspace
```

-Note that the flag "TF_GPU_ALLOCATOR=cuda_malloc_async" is only needed if you are going to run more than one process in the same GPU.
+## Notes on running with GPUs

If you run the example on GPUs, note that by default TensorFlow attempts to
allocate all available GPU memory at startup. When multiple clients share the
same GPU, you have a couple of options to address this.

One approach is to set environment variables that stop TensorFlow from
allocating all GPU memory up front. For instance:

```bash
TF_FORCE_GPU_ALLOW_GROWTH=true TF_GPU_ALLOCATOR=cuda_malloc_async nvflare simulator -n 2 -t 2 ./jobs/tensorflow_multi_gpu -w tensorflow_multi_gpu_workspace
```
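
To see how much GPU memory each client process actually uses under these
settings, one option is to query it from inside the training code. A sketch
(`tf.config.experimental.get_memory_info` is available in TensorFlow 2.5+):

```python
import tensorflow as tf

# Report this process's current and peak GPU memory usage, in bytes.
info = tf.config.experimental.get_memory_info("GPU:0")
print(f"current={info['current']} peak={info['peak']}")
```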

If you have more GPUs than clients, an alternative is to run each client on
its own GPU, as illustrated below:

```bash
nvflare simulator -n 2 -gpu 0,1 ./jobs/tensorflow_multi_gpu -w tensorflow_multi_gpu_workspace
```
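
To confirm the per-client isolation from inside the training script, each
process can print the devices it sees. A minimal sketch; with one client per
GPU, each client should report exactly one device:

```python
import tensorflow as tf

# With per-client GPU assignment, this should list exactly one device.
print(tf.config.list_physical_devices("GPU"))
```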
