[2.5] Remove hardcode heartbeat_timeout 0 (#3176)
### Description

The hardcoded `heartbeat_timeout=0` causes a problem: if the external user
code raises an exception and the program never returns, our FL client job
process (running LauncherExecutor) will never end.
With the default heartbeat_timeout value, if the FL client job
process does not receive a heartbeat from the user process for
heartbeat_timeout seconds, we consider the user process dead.
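
The timeout rule described above can be illustrated with a minimal sketch. This is not NVFlare's implementation: `HeartbeatMonitor`, `beat`, and `peer_is_dead` are hypothetical names, the 30-second value in the usage note is only an example, and the sketch assumes that a timeout of 0 disables the check, which is what the description implies.

```python
import time


class HeartbeatMonitor:
    """Hypothetical helper illustrating a heartbeat timeout; not part of NVFlare."""

    def __init__(self, heartbeat_timeout: float, clock=time.monotonic):
        self.heartbeat_timeout = heartbeat_timeout
        self._clock = clock
        # Treat construction time as the first heartbeat.
        self._last_beat = clock()

    def beat(self) -> None:
        """Record that a heartbeat arrived from the peer process."""
        self._last_beat = self._clock()

    def peer_is_dead(self) -> bool:
        """Return True once no heartbeat has arrived within the timeout."""
        # Assumption: a timeout of 0 (or less) disables the check, so a
        # hung peer is never declared dead and the waiting side never ends.
        if self.heartbeat_timeout <= 0:
            return False
        return self._clock() - self._last_beat > self.heartbeat_timeout
```

With a positive timeout such as `HeartbeatMonitor(30.0)`, a peer that stops sending heartbeats is reported dead after 30 seconds of silence, whereas `HeartbeatMonitor(0)` in this sketch waits forever.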

### Types of changes
<!--- Put an `x` in all the boxes that apply, and remove the not
applicable items -->
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.
YuanTingHsieh authored Jan 23, 2025
1 parent 542b89a commit 215fd4d
Showing 1 changed file with 0 additions and 1 deletion.
1 change: 0 additions & 1 deletion nvflare/job_config/script_runner.py
@@ -186,7 +186,6 @@ def add_to_fed_job(self, job: FedJob, ctx, **kwargs):
 launcher_id=launcher_id,
 params_exchange_format=self._params_exchange_format,
 params_transfer_type=self._params_transfer_type,
-heartbeat_timeout=0,
 )
 )
 job.add_executor(executor, tasks=tasks, ctx=ctx)
