* fix diloco integration test
Summary:
- for DiLoCo, the model parameters, as saved by the test, can differ across replicas
- only the global parameters are expected to be the same across replicas
- fix the test to validate that the global parameters are the same, instead of the local model parameters
Test Plan:
```
$ pytest -v ./torchft/local_sgd_integ_test.py::LocalSGDIntegTest::test_diloco_recovery_0
```
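The check described in the summary above can be sketched roughly as follows. This is a minimal illustration, not the code in `local_sgd_integ_test.py`: the helper name `assert_global_params_match` and the `{"local": ..., "global": ...}` state layout are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def assert_global_params_match(replica_states, rel_tol=1e-6):
    """Check that every replica holds the same global parameters.

    `replica_states` is a list of dicts with a hypothetical layout:
    {"local": {name: [floats]}, "global": {name: [floats]}}.
    Local parameters are deliberately not compared.
    """
    reference = replica_states[0]["global"]
    for idx, state in enumerate(replica_states[1:], start=1):
        other = state["global"]
        assert other.keys() == reference.keys(), f"replica {idx}: key mismatch"
        for name, ref_values in reference.items():
            values = other[name]
            assert len(values) == len(ref_values), f"replica {idx}: {name} size"
            for a, b in zip(ref_values, values):
                assert math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-9), (
                    f"replica {idx}: global param {name} differs"
                )


# Local parameters differ between replicas, global parameters do not.
replica_a = {"local": {"w": [0.9, 1.2]}, "global": {"w": [1.0, 1.0]}}
replica_b = {"local": {"w": [1.1, 0.7]}, "global": {"w": [1.0, 1.0]}}
assert_global_params_match([replica_a, replica_b])
```

Comparing only the `"global"` entries lets the check pass even when replicas are captured at different points in their local training steps.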
* avoid stream synchronization in manager
Summary:
- synchronize on a recovery event instead of the recovery stream
- fix calling `work.wait()` in manager
- avoid calling `quorum.wait` inside of a callback
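The idea behind waiting on an event rather than on a whole stream can be illustrated with a CPU-only analogy. This sketch uses `threading.Event` from the Python standard library, not the manager's actual synchronization primitives; the names `recovery_done`, `release_worker`, and `recovery_worker` are hypothetical. The point it shows: a waiter blocks only until the recorded point is reached, not until all later work queued on the same worker has finished.

```python
import threading

recovery_done = threading.Event()   # stands in for the recovery event
release_worker = threading.Event()  # lets the demo control the later work
log = []


def recovery_worker():
    log.append("recovery finished")
    recovery_done.set()               # mark the point that others may wait on
    release_worker.wait(timeout=5)    # unrelated later work on the same worker
    log.append("later work finished")


worker = threading.Thread(target=recovery_worker)
worker.start()

# Wait only for the recorded point, not for everything the worker will do.
assert recovery_done.wait(timeout=5)
log_at_wakeup = list(log)

release_worker.set()
worker.join()
```

At the moment the waiter wakes up, `log_at_wakeup` contains only the recovery entry; the later work completes afterwards without the waiter having blocked on it.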
Test Plan:
<img width="1483" alt="Screenshot 2025-06-16 at 1 08 04 AM" src="https://github.com/user-attachments/assets/47de0853-1878-40b9-ae8e-f9fc55972917" />