Fixes # .

### Description

Convert job templates to the Job API, add an end-to-end example with memory comparison, and update the README.

### Types of changes

- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.
Showing 23 changed files with 531 additions and 188 deletions.
# Object Streaming

## Overview

The examples here demonstrate how to use object streamers to send large objects in a memory-efficient manner.

The current default setting is to send and receive large objects in full, so extra memory is needed and allocated to hold the received message. This works fine when the message is small, but can become a limitation when the model size is large, e.g. for large language models.

To save memory, we can stream the message send / receive: when sending large objects (e.g. a dict), the streamer sends the container entry by entry (e.g. one dict item at a time); further, if we save the object to a file, the streamer can send the file in chunks (default chunk size is 1MB).

Thus, the memory demand is reduced to the size of the largest entry for container streaming, while nearly no extra memory is needed for file streaming. For example, sending a dict with ten 1GB entries without streaming takes 10GB of extra space; with container streaming, it only requires an extra 1GB; and if the dict is saved to a file before sending, only 1MB of extra space is needed to send it.

All examples are run with the NVFlare Simulator via the [JobAPI](https://nvflare.readthedocs.io/en/main/programming_guide/fed_job_api.html).
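The file-chunking idea can be sketched outside of NVFlare in a few lines of plain Python (illustrative only; NVFlare's `FileStreamer` handles this internally, and the function names below are made up for this sketch):

```python
def stream_file(path, chunk_size=1024 * 1024):
    """Yield a file in chunks (default 1MB) so peak memory stays at one chunk."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return
            yield chunk


def receive_file(chunks, out_path):
    """Reassemble the streamed chunks on the receiving side."""
    with open(out_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
```

However large the file is, only one chunk is resident in memory at a time, which is why file streaming needs almost no extra space.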
## Concepts

### Object Streamer
ObjectStreamer is the base class to stream an object piece by piece. The `StreamableEngine` built into NVFlare can stream any implementation of ObjectStreamer.

The following implementations are included in NVFlare:

* `ContainerStreamer`: This class is used to stream a container entry by entry. Currently, dict, list, and set are supported.
* `FileStreamer`: This class is used to stream a file.

Note that the container streamer splits the stream by the top-level entries. All the sub-entries of a top-level entry are sent as a whole, so the memory usage is determined by the largest top-level entry.
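Conceptually, container streaming can be sketched in plain Python (illustrative only, not the NVFlare implementation): each top-level entry is serialized and sent on its own, so peak memory tracks the largest entry rather than the whole container:

```python
import pickle


def stream_container(container: dict):
    """Yield one serialized top-level entry at a time.

    Sub-entries of an entry travel with it as a whole, which is why
    peak memory is bounded by the largest top-level entry.
    """
    for key, value in container.items():
        yield pickle.dumps((key, value))


def receive_container(messages):
    """Rebuild the dict entry by entry on the receiving side."""
    out = {}
    for msg in messages:
        key, value = pickle.loads(msg)
        out[key] = value
    return out
```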
### Object Retriever

Building upon the streamers, `ObjectRetriever` is designed for easier integration with existing code: it requests an object to be streamed from a remote site, automatically setting up the streaming on both ends and handling the coordination.

Similarly, the following implementations are available:

* `ContainerRetriever`: This class is used to retrieve a container from a remote site using `ContainerStreamer`.
* `FileRetriever`: This class is used to retrieve a file from a remote site using `FileStreamer`.

Note that to use `ContainerRetriever`, the container must be given a name and added on the sending site:

```
ContainerRetriever.add_container("model", model_dict)
```
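The reason a name is needed can be illustrated with a minimal registry sketch (hypothetical, for illustration only; `ContainerRegistry` is not an NVFlare class): the remote site asks for a container by name, so the sending side must have registered it beforehand:

```python
class ContainerRegistry:
    """Minimal sketch of registering containers by name before streaming."""

    _containers = {}

    @classmethod
    def add_container(cls, name, container):
        # The sender registers the object under a well-known name.
        cls._containers[name] = container

    @classmethod
    def get_container(cls, name):
        # The streaming machinery later looks the object up by that name
        # when the remote site requests it.
        if name not in cls._containers:
            raise KeyError(f"no container registered under {name!r}")
        return cls._containers[name]
```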
## Simple Examples

First, we demonstrate how to use the streamer directly, without a retriever:

```commandline
python simple_file_streaming_job.py
```

Note that in this example, the file streaming is relatively "standalone": the `FileReceiver` and `FileSender` are used directly as components, and no training workflow is used. Since an executor is required by NVFlare, a dummy executor is used.

Although file streaming is simple, it is not very practical for real-world applications: in most cases, rather than being standalone, the object needs to be sent when it is generated at a certain point in the workflow. In such cases, a retriever is more convenient to use:

```commandline
python simple_dict_streaming_job.py
```

In this second example, the `ContainerRetriever` is set up on both server and client, and it automatically handles the streaming. It couples closely with the workflow, making it easier to define what to send and where to retrieve it.
## Full-scale Examples and Comparisons

The two simple examples above illustrate the basic usage of streaming with small random messages. In the following, we demonstrate how to use the streamer with a retriever in a workflow with a real large language model object, and compare the memory usage with and without streaming. To track the memory usage, we use a simple script, `utils/log_memory.sh`. Note that the tracked usage is not fully accurate, but it is sufficient to give us a rough idea.

All three settings (regular, container streaming, and file streaming) are integrated in the same script to avoid extra variability. To run the examples:

```commandline
bash regular_transmission.sh
```
```commandline
bash container_stream.sh
```
```commandline
bash file_stream.sh
```

We then examine the memory usage by comparing the peak memory usage of the three settings. The results are shown below; note that these numbers come from one experiment on one machine and can vary considerably depending on the system and environment.

| Setting | Peak Memory Usage (MB) | Job Finishing Time (s) |
| --- | --- | --- |
| Regular Transmission | 42,427 | 47 |
| Container Streaming | 23,265 | 50 |
| File Streaming | 19,176 | 170 |

As shown, memory usage is significantly reduced by streaming, especially file streaming, although file streaming takes much longer to finish the job.
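The relative savings can be computed directly from the peak-memory numbers above (a quick sanity check, using the same figures):

```python
# Peak memory from the comparison table, in MB.
peak_mb = {"regular": 42_427, "container": 23_265, "file": 19_176}


def reduction(setting):
    """Percent reduction in peak memory versus regular transmission."""
    return 100 * (1 - peak_mb[setting] / peak_mb["regular"])


# Container streaming cuts peak memory by roughly 45%, file streaming
# by roughly 55%, at the cost of a ~3.6x longer job time (170s vs 47s).
print(f"container: {reduction('container'):.0f}%")
print(f"file: {reduction('file'):.0f}%")
```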
**container_stream.sh** (new file, 3 additions):

```shell
pkill -9 python
bash utils/log_memory.sh >>/tmp/nvflare/workspace/container.txt &
python streaming_job.py --retriever_mode container
```
**file_stream.sh** (new file, 3 additions):

```shell
pkill -9 python
bash utils/log_memory.sh >>/tmp/nvflare/workspace/file.txt &
python streaming_job.py --retriever_mode file
```
**Deleted files:**

- `examples/advanced/streaming/jobs/dict_streaming/app/config/config_fed_client.json` (23 deletions)
- `examples/advanced/streaming/jobs/dict_streaming/app/config/config_fed_server.json` (20 deletions)
- `examples/advanced/streaming/jobs/dict_streaming/app/custom/__init__.py` (13 deletions)
- (a further deleted file, name not shown)
- `examples/advanced/streaming/jobs/file_streaming/app/config/config_fed_client.json` (23 deletions)
- `examples/advanced/streaming/jobs/file_streaming/app/config/config_fed_server.json` (19 deletions)
- `examples/advanced/streaming/jobs/file_streaming/app/custom/__init__.py` (13 deletions)
- (a further deleted file, name not shown)
**regular_transmission.sh** (new file, 4 additions; `mkdir -p` used so rerunning the script does not fail on an existing directory):

```shell
pkill -9 python
mkdir -p /tmp/nvflare/workspace/
bash utils/log_memory.sh >>/tmp/nvflare/workspace/regular.txt &
python streaming_job.py
```
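Once a run finishes, the peak can be pulled out of the memory log with a short helper (a sketch, assuming the log contains one numeric reading in MB per line; the actual `utils/log_memory.sh` output format may differ, so adapt the parsing accordingly):

```python
def peak_memory(log_text: str) -> float:
    """Return the peak value of a memory log.

    Assumes one numeric reading (in MB) per line; non-numeric lines
    such as headers or blanks are skipped.
    """
    readings = []
    for line in log_text.splitlines():
        line = line.strip()
        try:
            readings.append(float(line))
        except ValueError:
            continue  # skip headers or blank lines
    return max(readings)
```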
**simple_dict_streaming_job.py** (new file, 53 additions):

```python
# Copyright (c) 2024, NVIDIA CORPORATION. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from src.simple_streaming_controller import SimpleStreamingController
from src.simple_streaming_executor import SimpleStreamingExecutor

from nvflare import FedJob
from nvflare.app_common.streamers.container_retriever import ContainerRetriever


def main():
    # Create the FedJob
    job = FedJob(name="simple_dict_streaming", min_clients=1)

    # Define the dict_retriever component and send it to both server and clients
    dict_retriever = ContainerRetriever()
    job.to_server(dict_retriever, id="dict_retriever")
    job.to_clients(dict_retriever, id="dict_retriever")

    # Define the controller workflow and send it to the server
    controller = SimpleStreamingController(dict_retriever_id="dict_retriever")
    job.to_server(controller)

    # Define the executor and send it to the clients
    executor = SimpleStreamingExecutor(dict_retriever_id="dict_retriever")
    job.to_clients(executor, tasks=["*"])

    # Export the job
    job_dir = "/tmp/nvflare/workspace/jobs/simple_dict_streaming"
    print("job_dir=", job_dir)
    job.export_job(job_dir)

    # Run the job in the simulator
    work_dir = "/tmp/nvflare/workspace/works/simple_dict_streaming"
    print("workspace_dir=", work_dir)
    job.simulator_run(work_dir, n_clients=1, threads=1)


if __name__ == "__main__":
    main()
```