Commit
Add back hello-numpy-sag and update references (#2816)
* add back hello-numpy-sag and update references * reformat notebook
Showing 8 changed files with 363 additions and 25 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# Hello Numpy Scatter and Gather | ||
|
||
"[Scatter and Gather](https://nvflare.readthedocs.io/en/main/apidocs/nvflare.app_common.workflows.scatter_and_gather.html)" is the standard workflow to implement Federated Averaging ([FedAvg](https://arxiv.org/abs/1602.05629)). | ||
This workflow follows the hub-and-spoke model: the server sends the global model to each client for local training (i.e., "scattering"), then aggregates the clients' results to update the global model (i.e., "gathering"). | ||
|
||
> **_NOTE:_** This example uses a Numpy-based trainer and will generate its data within the code. | ||
You can follow the [hello_world notebook](../hello_world.ipynb) or the following: | ||
|
||
### 1. Install NVIDIA FLARE | ||
|
||
Follow the [Installation](https://nvflare.readthedocs.io/en/main/quickstart.html) instructions. | ||
|
||
### 2. Run the experiment | ||
|
||
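Use the `nvflare simulator` command to run the example with two clients (`-n 2`) and two threads (`-t 2`), using `/tmp/nvflare/hello-numpy-sag` as the workspace (`-w`): | ||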
|
||
``` | ||
nvflare simulator -w /tmp/nvflare/hello-numpy-sag -n 2 -t 2 hello-world/hello-numpy-sag/jobs/hello-numpy-sag | ||
``` | ||
|
||
### 3. Access the logs and results | ||
|
||
You can find the run logs and results in the `simulate_job` directory inside the simulator's workspace: | ||
|
||
```bash | ||
$ ls /tmp/nvflare/hello-numpy-sag/simulate_job/ | ||
app_server app_site-1 app_site-2 log.txt model models | ||
|
||
``` | ||
|
||
For how to use the FLARE API to run this app, see [this notebook](hello_numpy_sag.ipynb). |
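
The scatter-and-gather loop that the README describes can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not NVIDIA FLARE's implementation: `local_train`, `aggregate`, and `scatter_and_gather` are hypothetical names, the "training" step simply adds a constant to every weight as a stand-in for real local training, and the gather step is a weighted mean driven by per-site weights.

```python
from typing import Dict, List


def local_train(weights: List[float], delta: float = 1.0) -> List[float]:
    """Stand-in for local training (assumption): shift every weight by delta."""
    return [w + delta for w in weights]


def aggregate(results: Dict[str, List[float]], site_weights: Dict[str, float]) -> List[float]:
    """Gather step: weighted mean of the per-site results."""
    total = sum(site_weights[site] for site in results)
    size = len(next(iter(results.values())))
    return [
        sum(site_weights[site] * results[site][i] for site in results) / total
        for i in range(size)
    ]


def scatter_and_gather(global_model, sites, num_rounds, site_weights):
    for _ in range(num_rounds):
        # Scatter: every site starts the round from the same global model.
        results = {site: local_train(global_model) for site in sites}
        # Gather: combine the site results into the next global model.
        global_model = aggregate(results, site_weights)
    return global_model


final_model = scatter_and_gather(
    [1.0, 2.0, 3.0],
    ["site-1", "site-2"],
    num_rounds=3,
    site_weights={"site-1": 1.0, "site-2": 1.0},
)
print(final_model)
```

With equal site weights and identical local updates, each round moves the global model by exactly one local step, which makes the loop easy to check by hand.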
207 changes: 207 additions & 0 deletions
examples/hello-world/hello-numpy-sag/hello_numpy_sag.ipynb
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,207 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"id": "e129ede5", | ||
"metadata": {}, | ||
"source": [ | ||
" # Hello Numpy SAG" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "9bf7e391", | ||
"metadata": {}, | ||
"source": [ | ||
"This notebook runs Hello Numpy SAG through the FLARE API: it submits the job and then follows its progress until it completes." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "bbca0050", | ||
"metadata": {}, | ||
"source": [ | ||
"### 1. Install NVIDIA FLARE\n", | ||
"\n", | ||
"Follow the [Installation](https://nvflare.readthedocs.io/en/main/getting_started.html#installation) instructions to set up an environment that has NVIDIA FLARE installed if you do not have one already. You will need an environment to run a provisioned FL system." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "e5d7e675", | ||
"metadata": {}, | ||
"source": [ | ||
"### 2. Provision and Start FL System\n", | ||
"\n", | ||
"In the rest of this example, we assume that 'nvflare provision' has been run in a workspace (set to '/tmp/nvflare/poc' below, but you can change this to the location you run provision from) to set up a project named `example_project` with a server and two clients. Feel free to use an existing provisioned NVFLARE project if you have one available, or, to try things out, set up and start a system in [POC mode](https://nvflare.readthedocs.io/en/main/getting_started.html#setting-up-the-application-environment-in-poc-mode).\n", | ||
"\n", | ||
"Use the 'start.sh' scripts to start the server and each client in separate terminals." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "6fe3165d", | ||
"metadata": {}, | ||
"source": [ | ||
"\n", | ||
"### 3. Connect to the FL System with the FLARE API\n", | ||
"\n", | ||
"Use `new_secure_session()` to initiate a session connecting to the FL Server with the FLARE API. The necessary arguments are the username of the admin user you are using and the corresponding startup kit location.\n", | ||
"\n", | ||
"In the code example below, we get the `admin_user_dir` by concatenating the workspace root with the default directories that are created if you provision a project with a given project name. You can change the values to what applies to your system if needed.\n", | ||
"\n", | ||
"Note that if debug mode is not enabled, there is no output after initiating a session successfully, so instead we print the output of `get_system_info()`. If you are unable to connect and initiate a session, make sure that your FL Server is running and that the configurations are correct with the right path to the admin startup kit directory." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "c3dbde69", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"import os\n", | ||
"from nvflare.fuel.flare_api.flare_api import new_secure_session\n", | ||
"\n", | ||
"project_name = \"example_project\"\n", | ||
"username = \"[email protected]\"\n", | ||
"workspace_root = \"/tmp/nvflare/poc\"\n", | ||
"admin_user_dir = os.path.join(workspace_root, project_name, \"prod_00\", username)\n", | ||
"\n", | ||
"sess = new_secure_session(\n", | ||
" username=username,\n", | ||
" startup_kit_location=admin_user_dir\n", | ||
")\n", | ||
"print(sess.get_system_info())" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "405edb37", | ||
"metadata": {}, | ||
"source": [ | ||
"### 4. Submit the Job with the FLARE API\n", | ||
"\n", | ||
"With a session successfully connected, you can use `submit_job()` to submit your job. You can change `path_to_example_job` to the location of the job you are submitting. If your session is not active, go back to the previous step and connect with a session.\n", | ||
"\n", | ||
"In POC mode, the examples are linked into the following directory: ``` /tmp/nvflare/poc/example_project/prod_00/[email protected]/transfer```" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b3589b60-434b-4b6d-97bc-74e95bbc7b52", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"ls -l /tmp/nvflare/poc/example_project/prod_00/[email protected]/transfer\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "c8f08cef", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"path_to_example_job = \"hello-world/hello-numpy-sag/jobs/hello-numpy-sag\"\n", | ||
"job_id = sess.submit_job(path_to_example_job)\n", | ||
"print(job_id + \" was submitted\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "42317cf3", | ||
"metadata": {}, | ||
"source": [ | ||
"### 5. After Submitting the Job\n", | ||
"\n", | ||
"After you submit the job, you should see output in the terminals where your FL Server and Clients are running. You can also use `monitor_job()` to follow the job and receive progress updates until it is done.\n", | ||
"\n", | ||
"By default, `monitor_job()` has only one required argument, the `job_id` of the job you are waiting for. Its default behavior is to wait until the job is complete and then return a Return Code of `JOB_FINISHED`.\n", | ||
"\n", | ||
"To show a more meaningful result, the following cell uses the `basic_cb_with_print` callback, which counts how many times it has been called. It prints the `job_meta` on the first three calls and on the final call before `monitor_job()` completes; every other call prints just a dot to save output space. The callback is only an example of what can be done with additional arguments and the `job_meta` information of the monitored job." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "03fd93d0", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"from nvflare.fuel.flare_api.flare_api import basic_cb_with_print\n", | ||
"\n", | ||
"\n", | ||
"sess.monitor_job(job_id, cb=basic_cb_with_print, cb_run_counter={\"count\":0})" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"id": "31ccb6a6", | ||
"metadata": {}, | ||
"source": [ | ||
"### 6. Shutting Down the FL System\n", | ||
"\n", | ||
"As of now, there is no specific FLARE API command for shutting down the FL system, but the FLARE API can use the `do_command()` function of the underlying AdminAPI to submit any commands that the FLARE Console supports including shutdown commands to the clients and server:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "b0d8aa9c", | ||
"metadata": { | ||
"tags": [] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"print(sess.api.do_command(\"shutdown client\"))\n", | ||
"print(sess.api.do_command(\"shutdown server\"))\n", | ||
"\n", | ||
"sess.close()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"id": "331c0ba2-8abe-47a3-a864-18dcb7489a44", | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "nvflare_example", | ||
"language": "python", | ||
"name": "nvflare_example" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.8.18" | ||
}, | ||
"vscode": { | ||
"interpreter": { | ||
"hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6" | ||
} | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 5 | ||
} |
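
The callback pattern used with `monitor_job()` in the notebook can be tried out without a running FL system. The sketch below makes several assumptions that are not confirmed by this diff: that a callback is called as `cb(session, job_id, job_meta, *cb_args, **cb_kwargs)` and returns `True` to keep monitoring, and that `job_meta` carries a `"status"` key. `quiet_cb` and `fake_monitor` are hypothetical names, the status strings are illustrative, and the fake loop only replays canned metadata instead of polling a server.

```python
from typing import Any, Dict, Iterable, List


def quiet_cb(session: Any, job_id: str, job_meta: Dict[str, Any], *cb_args, **cb_kwargs) -> bool:
    """Log job_meta on the first three calls and the final one, a dot otherwise."""
    counter = cb_kwargs["cb_run_counter"]
    log: List[str] = cb_kwargs["log"]
    finished = job_meta.get("status") != "RUNNING"
    if counter["count"] < 3 or finished:
        log.append(str(job_meta))
    else:
        log.append(".")
    counter["count"] += 1
    return True  # assumption: True means "keep monitoring"


def fake_monitor(job_id: str, metas: Iterable[Dict[str, Any]], cb, **cb_kwargs) -> str:
    """Hypothetical stand-in for a polling loop; returns the last status seen."""
    last_status = "UNKNOWN"
    for job_meta in metas:
        last_status = job_meta["status"]
        keep_going = cb(None, job_id, job_meta, **cb_kwargs)
        if last_status != "RUNNING" or not keep_going:
            break
    return last_status


counter = {"count": 0}
log: List[str] = []
metas = [{"status": "RUNNING"}] * 5 + [{"status": "FINISHED:COMPLETED"}]
result = fake_monitor("job-123", metas, quiet_cb, cb_run_counter=counter, log=log)
print(counter["count"], result)
```

Passing the counter as a mutable dict, as the notebook does with `cb_run_counter={"count":0}`, lets the callback keep state between calls without a global variable.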
17 changes: 17 additions & 0 deletions
examples/hello-world/hello-numpy-sag/jobs/hello-numpy-sag/app/config/config_fed_client.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
{ | ||
"format_version": 2, | ||
"executors": [ | ||
{ | ||
"tasks": [ | ||
"train" | ||
], | ||
"executor": { | ||
"path": "nvflare.app_common.np.np_trainer.NPTrainer", | ||
"args": {} | ||
} | ||
} | ||
], | ||
"task_result_filters": [], | ||
"task_data_filters": [], | ||
"components": [] | ||
} |
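
To make the structure of the client config concrete, the following sketch reads an abridged copy of it and builds a task-to-executor table by splitting each dotted `path` into a module and a class name. It only illustrates how the fields relate; `task_to_executor` is a hypothetical helper, and how NVIDIA FLARE itself loads and instantiates executors is not shown in this diff.

```python
import json

CLIENT_CONFIG = """
{
  "format_version": 2,
  "executors": [
    {
      "tasks": ["train"],
      "executor": {
        "path": "nvflare.app_common.np.np_trainer.NPTrainer",
        "args": {}
      }
    }
  ]
}
"""


def task_to_executor(config: dict) -> dict:
    """Map each task name to (module, class name, constructor args)."""
    table = {}
    for entry in config["executors"]:
        executor = entry["executor"]
        # Everything before the last dot is the module; the rest is the class.
        module, _, class_name = executor["path"].rpartition(".")
        for task in entry["tasks"]:
            table[task] = (module, class_name, executor.get("args", {}))
    return table


routing = task_to_executor(json.loads(CLIENT_CONFIG))
print(routing["train"])
```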
48 changes: 48 additions & 0 deletions
examples/hello-world/hello-numpy-sag/jobs/hello-numpy-sag/app/config/config_fed_server.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
{ | ||
"format_version": 2, | ||
"server": { | ||
"heart_beat_timeout": 600 | ||
}, | ||
"task_data_filters": [], | ||
"task_result_filters": [], | ||
"components": [ | ||
{ | ||
"id": "persistor", | ||
"path": "nvflare.app_common.np.np_model_persistor.NPModelPersistor", | ||
"args": {} | ||
}, | ||
{ | ||
"id": "shareable_generator", | ||
"path": "nvflare.app_common.shareablegenerators.full_model_shareable_generator.FullModelShareableGenerator", | ||
"args": {} | ||
}, | ||
{ | ||
"id": "aggregator", | ||
"path": "nvflare.app_common.aggregators.intime_accumulate_model_aggregator.InTimeAccumulateWeightedAggregator", | ||
"args": { | ||
"expected_data_kind": "WEIGHTS", | ||
"aggregation_weights": { | ||
"site-1": 1.0, | ||
"site-2": 1.0 | ||
} | ||
} | ||
} | ||
], | ||
"workflows": [ | ||
{ | ||
"id": "scatter_and_gather", | ||
"path": "nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather", | ||
"args": { | ||
"min_clients": 2, | ||
"num_rounds": 3, | ||
"start_round": 0, | ||
"wait_time_after_min_received": 10, | ||
"aggregator_id": "aggregator", | ||
"persistor_id": "persistor", | ||
"shareable_generator_id": "shareable_generator", | ||
"train_task_name": "train", | ||
"train_timeout": 6000 | ||
} | ||
} | ||
] | ||
} |
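
In the server config, the workflow refers to components by `id` (`aggregator_id`, `persistor_id`, `shareable_generator_id`). A quick consistency check can catch a misspelled reference before the job is submitted. The sketch below works on an abridged copy of the config and assumes, based only on this file, that any workflow argument whose name ends in `_id` names a component; `dangling_component_refs` is a hypothetical helper, not part of NVIDIA FLARE.

```python
import json

SERVER_CONFIG = """
{
  "components": [
    {"id": "persistor"},
    {"id": "shareable_generator"},
    {"id": "aggregator"}
  ],
  "workflows": [
    {
      "id": "scatter_and_gather",
      "args": {
        "min_clients": 2,
        "num_rounds": 3,
        "aggregator_id": "aggregator",
        "persistor_id": "persistor",
        "shareable_generator_id": "shareable_generator",
        "train_task_name": "train"
      }
    }
  ]
}
"""


def dangling_component_refs(config: dict) -> list:
    """Return (workflow id, arg name, value) for each *_id arg with no matching component."""
    known = {component["id"] for component in config["components"]}
    missing = []
    for workflow in config["workflows"]:
        for name, value in workflow.get("args", {}).items():
            # Assumption: args ending in "_id" refer to component ids.
            if name.endswith("_id") and value not in known:
                missing.append((workflow["id"], name, value))
    return missing


missing = dangling_component_refs(json.loads(SERVER_CONFIG))
print(missing)
```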
10 changes: 10 additions & 0 deletions
examples/hello-world/hello-numpy-sag/jobs/hello-numpy-sag/meta.json
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
{ | ||
"name": "hello-numpy-sag", | ||
"resource_spec": {}, | ||
"min_clients" : 2, | ||
"deploy_map": { | ||
"app": [ | ||
"@ALL" | ||
] | ||
} | ||
} |
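
The `deploy_map` in `meta.json` maps an app folder to the sites that receive it. The sketch below shows one way to read it, assuming that `"@ALL"` stands for every participating site and that the participants are named `server`, `site-1`, and `site-2`; both the site list and the `expand_deploy_map` helper are hypothetical and only illustrate the mapping.

```python
import json

META = """
{
  "name": "hello-numpy-sag",
  "resource_spec": {},
  "min_clients": 2,
  "deploy_map": {"app": ["@ALL"]}
}
"""


def expand_deploy_map(meta: dict, all_sites: list) -> dict:
    """Return {site: app}, treating "@ALL" as every known site (assumption)."""
    placement = {}
    for app, targets in meta["deploy_map"].items():
        sites = all_sites if "@ALL" in targets else targets
        for site in sites:
            placement[site] = app
    return placement


placement = expand_deploy_map(json.loads(META), ["server", "site-1", "site-2"])
print(placement)
```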
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
nvflare~=2.4.0rc |