Add single parameter allgather optimization for zero3 #7661
base: master
Signed-off-by: aeeeeeep <[email protected]>
@aeeeeeep Thanks for this contribution. Can you share some data showing the benefits of this optimization?
Thanks for your feedback! I’ll share detailed data within the next few days.
Make it very clear that `TiledMLP`'s memory saving comes at the cost of recomputing the forward pass. Signed-off-by: aeeeeeep <[email protected]>
…eepspeedai#7659) Fixes deepspeedai#7650 by adding a `value.dim()>0` check to prevent slicing of 0-dim tensors. cc @sfc-gh-truwase Signed-off-by: Naveenraj Kamalakannan <[email protected]> Co-authored-by: Olatunji Ruwase <[email protected]> Signed-off-by: aeeeeeep <[email protected]>
The optimization shows measurable benefits only on a specific accelerator (under NDA), where the hardware/driver overhead of memory allocation is significantly higher. This longer allocation latency indirectly delays memory release on the communication streams. While this PR's reduction in allocation operations is theoretically sound, the practical benefit appears to be hardware-dependent and is not observable on NVIDIA platforms.
Thanks for the explanation. Can you confirm there is no regression on NVIDIA platforms?
Confirmed: my tests from a few weeks ago showed no regression on NVIDIA platforms in either performance or accuracy.