Reshape ZeroStage=0 FP16 Checkpoint #2031

@Muennighoff

What is the best way to reshape a checkpoint that was trained with ZeRO stage 0 and fp16?

I see two options:
a) Continue training with ZeRO stage 1 for one step, then adapt this PR to work with fp16
b) Adapt the script here so that it no longer requires ZeRO checkpoints; the difficult part will be reshaping the optimizer states in the mp_rank files

@tjruwase, could you give me a quick hint on whether a) or b) makes more sense before I invest time in either? Thanks!
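
To make the "difficult part" of option b) concrete, here is a minimal toy sketch of what reshaping optimizer states across a different model-parallel degree might involve: merge the per-rank shards of each state back into a full matrix, then split it again for the new degree. Everything here is an assumption for illustration: plain nested lists stand in for tensors, every state is assumed to be partitioned along rows, and the helper names (`merge_shards`, `split_rows`, `reshape_states`) are hypothetical. `exp_avg` and `exp_avg_sq` are the Adam state keys used by PyTorch; whether the mp_rank files store them under these keys is not something I have checked.

```python
# Toy sketch only: nested lists stand in for tensors, and every state is
# assumed to be sharded along rows. Real checkpoints also contain replicated
# parameters and parameters partitioned along other dimensions.

def merge_shards(shards):
    """Concatenate per-rank row shards back into one full matrix."""
    return [row for shard in shards for row in shard]


def split_rows(full, new_degree):
    """Split a full matrix into new_degree equally sized row shards."""
    if len(full) % new_degree != 0:
        raise ValueError("row count must be divisible by the new degree")
    step = len(full) // new_degree
    return [full[i * step:(i + 1) * step] for i in range(new_degree)]


def reshape_states(per_rank_states, new_degree):
    """Re-partition every optimizer state from the old degree to new_degree.

    per_rank_states: one dict per old rank, mapping a state name
    (e.g. "exp_avg") to that rank's row shard.
    """
    new_states = [{} for _ in range(new_degree)]
    for name in per_rank_states[0]:
        full = merge_shards([state[name] for state in per_rank_states])
        for rank, shard in enumerate(split_rows(full, new_degree)):
            new_states[rank][name] = shard
    return new_states


# Two old ranks, each holding two rows of both Adam moments.
old = [
    {"exp_avg": [[1, 2], [3, 4]], "exp_avg_sq": [[10, 20], [30, 40]]},
    {"exp_avg": [[5, 6], [7, 8]], "exp_avg_sq": [[50, 60], [70, 80]]},
]
new = reshape_states(old, 4)  # re-partition for four ranks
```

In a real script the same merge-and-split would have to be applied consistently to the fp16 weights, the fp32 master copies, and both Adam moments, with the partition dimension chosen per parameter.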
