Add back hello-numpy-sag and update references (#2816)

* add back hello-numpy-sag and update references
* reformat notebook
NVIDIA · Aug 21, 2024 · d7d97fd
1 parent b9c1c29

Showing 8 changed files with 363 additions and 25 deletions.
diff --git a/examples/README.md b/examples/README.md
@@ -76,11 +76,11 @@ When you open a notebook, select the kernel `nvflare_example` using the dropdown
 | Example                                                                                                                                | Framework    | Summary                                                                                                                                                         |
 |----------------------------------------------------------------------------------------------------------------------------------------|--------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | [Notebook for Hello Examples](./hello-world/hello_world.ipynb)                                                                         | -            | Notebook for examples below.                                                                                                                                    |
-| [Hello Scatter and Gather](./hello-world/hello-numpy-sag/README.md)                                                                    | Numpy        | Example using [ScatterAndGather](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.scatter_and_gather.html) controller workflow.      |
-| [Hello Cross-Site Validation](./hello-world/hello-numpy-cross-val/README.md)                                                           | Numpy        | Example using [CrossSiteModelEval](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cross_site_model_eval.html) controller workflow, and example using previous results without training workflow. |
+| [Hello FedAvg NumPy](./hello-world/hello-fedavg-numpy/README.md)                                                                    | Numpy        | Example using [FedAvg](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.fedavg.html) controller workflow.      |
+| [Hello Cross-Site Validation](./hello-world/hello-cross-val/README.md)                                                           | Numpy        | Example using [CrossSiteEval](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cross_site_eval.html) controller workflow, and example using previous results without training workflow. |
 | [Hello Cyclic Weight Transfer](./hello-world/hello-cyclic/README.md)                                                                   | PyTorch      | Example using [CyclicController](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.cyclic_ctl.html) controller workflow to implement [Cyclic Weight Transfer](https://pubmed.ncbi.nlm.nih.gov/29617797/). |
 | [Hello PyTorch](./hello-world/hello-pt/README.md)                                                                                      | PyTorch      | Example using an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [PyTorch](https://pytorch.org/) as the deep learning training framework. |
-| [Hello TensorFlow](./hello-world/hello-tf2/README.md)                                                                                  | TensorFlow2  | Example of using an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [TensorFlow](https://tensorflow.org/) as the deep learning training framework. |
+| [Hello TensorFlow](./hello-world/hello-tf/README.md)                                                                                  | TensorFlow  | Example of using an image classifier using [FedAvg](https://arxiv.org/abs/1602.05629) and [TensorFlow](https://tensorflow.org/) as the deep learning training framework. |
 
 ## 2. Step-by-Step Examples
 | Example | Dataset | Controller-Type | Execution API Type | Framework | Summary |

diff --git a/examples/hello-world/hello-numpy-sag/README.md b/examples/hello-world/hello-numpy-sag/README.md
@@ -0,0 +1,32 @@
+# Hello Numpy Scatter and Gather
+
+"[Scatter and Gather](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.scatter_and_gather.html)" is the standard workflow to implement Federated Averaging ([FedAvg](https://arxiv.org/abs/1602.05629)). 
+This workflow follows the hub-and-spoke model: it sends the global model to each client for local training (i.e., "scattering") and aggregates the results to update the global model (i.e., "gathering").
+
+> **_NOTE:_** This example uses a NumPy-based trainer and generates its data within the code.
+
+You can follow the [hello_world notebook](../hello_world.ipynb) or the following:
+
+### 1. Install NVIDIA FLARE
+
+Follow the [Installation](https://nvflare.readthedocs.io/en/main/quickstart.html) instructions.
+
+### 2. Run the experiment
+
+From the `examples` directory, use the `nvflare simulator` command to run the example:
+
+```bash
+nvflare simulator -w /tmp/nvflare/hello-numpy-sag -n 2 -t 2 hello-world/hello-numpy-sag/jobs/hello-numpy-sag
+```
+
+### 3. Access the logs and results
+
+You can find the logs and results in the `simulate_job` directory inside the simulator's workspace:
+
+```bash
+$ ls /tmp/nvflare/hello-numpy-sag/simulate_job/
+app_server  app_site-1  app_site-2  log.txt  model  models
+
+```
+
+To learn how to run this app with the FLARE API, see [this notebook](hello_numpy_sag.ipynb).
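The scatter-and-gather workflow described in the README above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not NVFlare's `ScatterAndGather` implementation: `local_train` and `scatter_and_gather` are hypothetical names, each client's "training" simply moves the weights part of the way toward a site-specific target, and the gathered results are averaged with equal weight per client.

```python
import numpy as np


def local_train(weights: np.ndarray, target: float, lr: float = 0.5) -> np.ndarray:
    """Hypothetical local training: move every weight part of the way toward `target`."""
    return weights + lr * (target - weights)


def scatter_and_gather(global_weights: np.ndarray, client_targets, num_rounds: int = 3) -> np.ndarray:
    """Minimal FedAvg-style loop with equal client weighting (illustrative only)."""
    for round_num in range(num_rounds):
        # Scatter: each client trains on its own copy of the current global model.
        local_results = [local_train(global_weights.copy(), t) for t in client_targets]
        # Gather: average the returned weights to form the next global model.
        global_weights = np.mean(local_results, axis=0)
        print(f"round {round_num}: {global_weights}")
    return global_weights


# Two clients and three rounds, matching min_clients=2 and num_rounds=3 in the job config.
final_weights = scatter_and_gather(np.zeros(3), client_targets=[1.0, 3.0], num_rounds=3)
```

With targets 1.0 and 3.0, the global weights go from 0.0 to 1.0, 1.5, and then 1.75, approaching the mean of the client targets (2.0) as rounds continue.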
diff --git a/examples/hello-world/hello-numpy-sag/hello_numpy_sag.ipynb b/examples/hello-world/hello-numpy-sag/hello_numpy_sag.ipynb
@@ -0,0 +1,207 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "e129ede5",
+   "metadata": {},
+   "source": [
+    "# Hello Numpy SAG"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9bf7e391",
+   "metadata": {},
+   "source": [
+    "In this notebook, Hello Numpy SAG is run with the FLARE API, which is used to submit the job and follow its progress."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bbca0050",
+   "metadata": {},
+   "source": [
+    "### 1. Install NVIDIA FLARE\n",
+    "\n",
+    "Follow the [Installation](https://nvflare.readthedocs.io/en/main/getting_started.html#installation) instructions to set up an environment that has NVIDIA FLARE installed if you do not have one already. You will need an environment to run a provisioned FL system."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e5d7e675",
+   "metadata": {},
+   "source": [
+    "### 2. Provision and Start FL System\n",
+    "\n",
+    "In the rest of this example, we assume that `nvflare provision` has been run in a workspace (set to `/tmp/nvflare/poc` below; change this to the location you ran provisioning from) to set up a project named `example_project` with a server and two clients. You can also use an existing provisioned NVFLARE project if you have one available, or, to try things out, set up and start a system in [POC mode](https://nvflare.readthedocs.io/en/main/getting_started.html#setting-up-the-application-environment-in-poc-mode).\n",
+    "\n",
+    "Use the `start.sh` scripts to start the server and each client in separate terminals."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6fe3165d",
+   "metadata": {},
+   "source": [
+    "\n",
+    "### 3. Connect to the FL System with the FLARE API\n",
+    "\n",
+    "Use `new_secure_session()` to initiate a session connecting to the FL Server with the FLARE API. The required arguments are the username of your admin user and the location of the corresponding startup kit.\n",
+    "\n",
+    "In the code example below, we get the `admin_user_dir` by concatenating the workspace root with the default directories that are created if you provision a project with a given project name. You can change the values to what applies to your system if needed.\n",
+    "\n",
+    "Note that if debug mode is not enabled, there is no output after initiating a session successfully, so instead we print the output of `get_system_info()`. If you are unable to connect and initiate a session, make sure that your FL Server is running and that the configurations are correct with the right path to the admin startup kit directory."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c3dbde69",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "from nvflare.fuel.flare_api.flare_api import new_secure_session\n",
+    "\n",
+    "project_name = \"example_project\"\n",
+    "username = \"admin@nvidia.com\"\n",
+    "workspace_root = \"/tmp/nvflare/poc\"\n",
+    "admin_user_dir = os.path.join(workspace_root, project_name, \"prod_00\", username)\n",
+    "\n",
+    "sess = new_secure_session(\n",
+    "    username=username,\n",
+    "    startup_kit_location=admin_user_dir\n",
+    ")\n",
+    "print(sess.get_system_info())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "405edb37",
+   "metadata": {},
+   "source": [
+    "### 4. Submit the Job with the FLARE API\n",
+    "\n",
+    "With a session successfully connected, you can use `submit_job()` to submit your job. You can change `path_to_example_job` to the location of the job you are submitting. If your session is not active, go back to the previous step and connect with a session.\n",
+    "\n",
+    "When using the POC command, the examples are linked to the following directory: `/tmp/nvflare/poc/example_project/prod_00/admin@nvidia.com/transfer`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b3589b60-434b-4b6d-97bc-74e95bbc7b52",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "!ls -l /tmp/nvflare/poc/example_project/prod_00/admin@nvidia.com/transfer\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c8f08cef",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "path_to_example_job = \"hello-world/hello-numpy-sag/jobs/hello-numpy-sag\"\n",
+    "job_id = sess.submit_job(path_to_example_job)\n",
+    "print(job_id + \" was submitted\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "42317cf3",
+   "metadata": {},
+   "source": [
+    "### 5. After Submitting the Job\n",
+    "\n",
+    "After you submit the job, you should see output in the terminals where your FL Server and Clients are running. You can also use `monitor_job()` to follow the job's progress until it is done.\n",
+    "\n",
+    "`monitor_job()` has only one required argument, the `job_id` of the job you are waiting for. By default, it waits until the job is complete and then returns a return code of `JOB_FINISHED`.\n",
+    "\n",
+    "To see more meaningful output while following along, the following cell passes the `basic_cb_with_print` callback to `monitor_job()`. The callback keeps track of how many times it has run: it prints the `job_meta` the first three times and the final time before `monitor_job()` completes, and prints just a dot on every other call to save output space. This callback is only one example of what can be done with additional arguments and the `job_meta` of the job being monitored."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "03fd93d0",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "from nvflare.fuel.flare_api.flare_api import basic_cb_with_print\n",
+    "\n",
+    "\n",
+    "sess.monitor_job(job_id, cb=basic_cb_with_print, cb_run_counter={\"count\":0})"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "31ccb6a6",
+   "metadata": {},
+   "source": [
+    "### 6. Shutting Down the FL System\n",
+    "\n",
+    "As of now, there is no dedicated FLARE API command for shutting down the FL system. However, the FLARE API can use the `do_command()` function of the underlying AdminAPI to submit any command that the FLARE Console supports, including shutdown commands for the clients and server:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b0d8aa9c",
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "print(sess.api.do_command(\"shutdown client\"))\n",
+    "print(sess.api.do_command(\"shutdown server\"))\n",
+    "\n",
+    "sess.close()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "331c0ba2-8abe-47a3-a864-18dcb7489a44",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "nvflare_example",
+   "language": "python",
+   "name": "nvflare_example"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.18"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
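Step 5 of the notebook describes how `basic_cb_with_print` behaves. The sketch below is a hypothetical re-implementation of that described behavior (`print_progress_cb` is an illustrative name, not part of NVFlare), exercised with fake `job_meta` dictionaries so it runs without an FL system. It assumes the callback signature `(session, job_id, job_meta, *cb_args, **cb_kwargs)` and that returning `True` lets `monitor_job()` keep polling; check the FLARE API reference before relying on either.

```python
def print_progress_cb(session, job_id, job_meta, *cb_args, **cb_kwargs) -> bool:
    """Print job_meta for the first three RUNNING polls and the final poll; dots otherwise."""
    counter = cb_kwargs["cb_run_counter"]
    if job_meta.get("status") == "RUNNING":
        if counter["count"] < 3:
            print(job_meta)
        else:
            print(".", end="")
    else:
        # Any non-RUNNING status is treated as the final update.
        print("\n" + str(job_meta))
    counter["count"] += 1
    return True  # assumption: True tells monitor_job() to keep monitoring


# Simulated polls with fake (hypothetical) job_meta values, so the sketch runs offline.
run_counter = {"count": 0}
fake_polls = [{"status": "RUNNING"}] * 5 + [{"status": "FINISHED:COMPLETED"}]
for meta in fake_polls:
    print_progress_cb(None, "fake-job-id", meta, cb_run_counter=run_counter)
```

With a live session, it would be passed the same way as in the notebook: `sess.monitor_job(job_id, cb=print_progress_cb, cb_run_counter={"count": 0})`. Keeping the counter in a caller-owned dict means the callback itself stays stateless.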
diff --git a/examples/hello-world/hello-numpy-sag/jobs/hello-numpy-sag/app/config/config_fed_client.json b/examples/hello-world/hello-numpy-sag/jobs/hello-numpy-sag/app/config/config_fed_client.json
@@ -0,0 +1,17 @@
+{
+  "format_version": 2,
+  "executors": [
+    {
+      "tasks": [
+        "train"
+      ],
+      "executor": {
+        "path": "nvflare.app_common.np.np_trainer.NPTrainer",
+        "args": {}
+      }
+    }
+  ],
+  "task_result_filters": [],
+  "task_data_filters": [],
+  "components": []
+}
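In `config_fed_client.json`, each entry under `executors` maps the task names listed in `tasks` (here only `train`) to the executor that handles them. The toy sketch below mimics that routing in plain Python: `TASK_HANDLERS`, `handle_task`, and `np_train` are hypothetical names, and the "training" step (adding a fixed delta to every weight array) is an assumption for illustration, not the actual code of `NPTrainer`.

```python
import numpy as np


def np_train(weights: dict, delta: float = 1.0) -> dict:
    """Toy 'training': add a fixed delta to every weight array (an assumption)."""
    return {name: np.asarray(arr, dtype=float) + delta for name, arr in weights.items()}


# Hypothetical routing table mirroring the "executors" section: task name -> handler.
TASK_HANDLERS = {"train": np_train}


def handle_task(task_name: str, weights: dict) -> dict:
    """Dispatch a task to its configured handler, failing loudly for unknown tasks."""
    handler = TASK_HANDLERS.get(task_name)
    if handler is None:
        raise ValueError(f"no executor configured for task {task_name!r}")
    return handler(weights)


updated = handle_task("train", {"w": np.array([[1.0, 2.0], [3.0, 4.0]])})
print(updated["w"])
```

Raising on an unknown task name makes a mismatch between the server's `train_task_name` and the client's `tasks` list visible immediately instead of silently doing nothing.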
diff --git a/examples/hello-world/hello-numpy-sag/jobs/hello-numpy-sag/app/config/config_fed_server.json b/examples/hello-world/hello-numpy-sag/jobs/hello-numpy-sag/app/config/config_fed_server.json
@@ -0,0 +1,48 @@
+{
+  "format_version": 2,
+  "server": {
+    "heart_beat_timeout": 600
+  },
+  "task_data_filters": [],
+  "task_result_filters": [],
+  "components": [
+    {
+      "id": "persistor",
+      "path": "nvflare.app_common.np.np_model_persistor.NPModelPersistor",
+      "args": {}
+    },
+    {
+      "id": "shareable_generator",
+      "path": "nvflare.app_common.shareablegenerators.full_model_shareable_generator.FullModelShareableGenerator",
+      "args": {}
+    },
+    {
+      "id": "aggregator",
+      "path": "nvflare.app_common.aggregators.intime_accumulate_model_aggregator.InTimeAccumulateWeightedAggregator",
+      "args": {
+        "expected_data_kind": "WEIGHTS",
+        "aggregation_weights": {
+          "site-1": 1.0,
+          "site-2": 1.0
+        }
+      }
+    }
+  ],
+  "workflows": [
+    {
+      "id": "scatter_and_gather",
+      "path": "nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather",
+      "args": {
+        "min_clients": 2,
+        "num_rounds": 3,
+        "start_round": 0,
+        "wait_time_after_min_received": 10,
+        "aggregator_id": "aggregator",
+        "persistor_id": "persistor",
+        "shareable_generator_id": "shareable_generator",
+        "train_task_name": "train",
+        "train_timeout": 6000
+      }
+    }
+  ]
+}
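The server config wires `InTimeAccumulateWeightedAggregator` with `expected_data_kind: "WEIGHTS"` and equal `aggregation_weights` for `site-1` and `site-2`. The sketch below shows the general idea of weighted accumulation in NumPy, assuming clients return full weights (not weight differences) and that a site missing from the config defaults to weight 1.0; `weighted_average` is a hypothetical name. The real aggregator may also scale each contribution by metadata such as the number of local training steps, which this sketch ignores.

```python
import numpy as np

# Per-site weights taken from "aggregation_weights" in config_fed_server.json.
AGGREGATION_WEIGHTS = {"site-1": 1.0, "site-2": 1.0}


def weighted_average(client_results: dict, aggregation_weights: dict) -> dict:
    """Accumulate weighted client weights one result at a time, then normalize."""
    sums, total_weight = {}, 0.0
    for site, weights in client_results.items():
        # Assumption: a site missing from the config defaults to weight 1.0.
        w = aggregation_weights.get(site, 1.0)
        for name, arr in weights.items():
            sums[name] = sums.get(name, 0.0) + w * np.asarray(arr, dtype=float)
        total_weight += w
    if total_weight == 0:
        raise ValueError("no client results to aggregate")
    return {name: s / total_weight for name, s in sums.items()}


results = {
    "site-1": {"w": np.array([1.0, 2.0])},
    "site-2": {"w": np.array([3.0, 6.0])},
}
print(weighted_average(results, AGGREGATION_WEIGHTS))
```

Accumulating a running sum as each result arrives, rather than storing every client's model, keeps memory constant in the number of clients; with the equal weights above, the result is the plain mean `[2.0, 4.0]`.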
diff --git a/examples/hello-world/hello-numpy-sag/jobs/hello-numpy-sag/meta.json b/examples/hello-world/hello-numpy-sag/jobs/hello-numpy-sag/meta.json
@@ -0,0 +1,10 @@
+{
+  "name": "hello-numpy-sag",
+  "resource_spec": {},
+  "min_clients" : 2,
+  "deploy_map": {
+    "app": [
+      "@ALL"
+    ]
+  }
+}
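`meta.json` tells FLARE which app to deploy where: `deploy_map` maps the app folder `app` to `@ALL`, and `min_clients` sets the minimum number of clients the job needs. The sketch below parses the same fields and expands the deploy map. It assumes that `@ALL` covers the server and every client site and uses a hypothetical site list, so treat `resolve_deploy_map` as an illustration rather than FLARE's job-scheduling logic.

```python
import json

# Same fields as jobs/hello-numpy-sag/meta.json in this commit.
META_JSON = """
{
  "name": "hello-numpy-sag",
  "resource_spec": {},
  "min_clients": 2,
  "deploy_map": {"app": ["@ALL"]}
}
"""


def resolve_deploy_map(meta: dict, sites: list) -> dict:
    """Return {app_name: [sites]}, expanding "@ALL" to every known site (assumption)."""
    resolved = {}
    for app_name, targets in meta["deploy_map"].items():
        resolved[app_name] = list(sites) if "@ALL" in targets else list(targets)
    return resolved


meta = json.loads(META_JSON)
sites = ["server", "site-1", "site-2"]  # hypothetical sites, matching the simulator's -n 2
plan = resolve_deploy_map(meta, sites)
num_clients = len([s for s in sites if s != "server"])
print(plan)
print("enough clients:", num_clients >= meta["min_clients"])
```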
diff --git a/examples/hello-world/hello-numpy-sag/requirements.txt b/examples/hello-world/hello-numpy-sag/requirements.txt
@@ -0,0 +1 @@
+nvflare~=2.4.0rc