Open
Labels
bug (Something isn't working)
Description
What is the best way to reshape a checkpoint that was trained with ZeRO stage 0 and fp16?
I see two options:
a) Continue training with ZeRO stage 1 for one step and adapt this PR to work with fp16.
b) Adapt the script here so that it no longer requires ZeRO checkpoints. The difficult part would be reshaping the optimizer states in the mp_rank files.
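To make the reshaping in b) concrete, here is a minimal sketch of the idea as I understand it: merge the per-rank shards of one optimizer state along its partition dimension, then split the result for the new parallel degree. Everything here is an assumption for illustration. The helper names are hypothetical, plain nested lists stand in for tensors, and real mp_rank files would hold torch tensors (where `torch.cat` and `torch.chunk` would do the same job). It also ignores special layouts such as fused QKV weights.

```python
from typing import List

Matrix = List[List[float]]


def merge_shards(shards: List[Matrix], dim: int) -> Matrix:
    """Concatenate per-rank shards along the partition dimension (0 or 1)."""
    if dim == 0:
        # Row-partitioned: stack the rows of each shard in rank order.
        return [list(row) for shard in shards for row in shard]
    if dim == 1:
        # Column-partitioned: join the matching rows of each shard.
        n_rows = len(shards[0])
        return [[x for shard in shards for x in shard[i]] for i in range(n_rows)]
    raise ValueError("dim must be 0 or 1")


def split_tensor(full: Matrix, parts: int, dim: int) -> List[Matrix]:
    """Split a merged state evenly into `parts` shards along `dim`."""
    size = len(full) if dim == 0 else len(full[0])
    if size % parts != 0:
        raise ValueError("size along dim is not divisible by the new degree")
    step = size // parts
    if dim == 0:
        return [full[i * step:(i + 1) * step] for i in range(parts)]
    return [[row[i * step:(i + 1) * step] for row in full] for i in range(parts)]


def reshape_state(shards: List[Matrix], new_degree: int, dim: int) -> List[Matrix]:
    """Hypothetical helper: reshard one optimizer state to a new degree."""
    return split_tensor(merge_shards(shards, dim), new_degree, dim)


# Two old ranks, each holding a 2x2 shard of some state (e.g. Adam's exp_avg).
old_shards = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
new_row_shards = reshape_state(old_shards, new_degree=4, dim=0)
new_col_shards = reshape_state(old_shards, new_degree=4, dim=1)
```

The same merge and split would have to be applied to every per-parameter optimizer state, using the same partition dimension as the corresponding model weight.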
Maybe @tjruwase could give me a quick hint on whether a) or b) makes more sense before I waste my time? Thanks!
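For reference, the config change I have in mind for a) would look roughly like the sketch below, which switches ZeRO to stage 1 while keeping fp16 enabled. The batch size is a placeholder rather than my actual setting, and the rest of the real config is omitted.

```python
import json

# Sketch of a DeepSpeed config for resuming with ZeRO stage 1 and fp16.
# The batch size is a placeholder value, not taken from the real run.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 1},
}

# Serialize to JSON so it can be saved as the DeepSpeed config file.
config_json = json.dumps(ds_config, indent=2)
print(config_json)
```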