🐞 Bugs in Distributed Loading of Non-Distributed Checkpoints and/or Model Creation via HF Wrapper
There are issues when loading non-distributed checkpoints or creating models via from_pretrained using the HF wrapper. (For details, see discussion below.)
- When tensor_parallel is not 1: model creation succeeds, but checkpoint loading fails.
- When sequence_parallel is not 1: model creation succeeds, but checkpoint loading fails.
- When pipeline_parallel is not 1: model creation fails at different points, depending on whether sequence_tensor_parallel is set.
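
For triage, the three cases above can be written down as a small lookup. This is only a sketch of the reported behavior, not code from the library: `ParallelConfig` and `expected_failures` are hypothetical names, and only the three setting names come from this report. It assumes the cases are independent, since the report does not say what happens when several settings are non-1 at once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelConfig:
    # Hypothetical container; field names mirror the settings named in this report.
    tensor_parallel: int = 1
    sequence_parallel: int = 1
    pipeline_parallel: int = 1


def expected_failures(cfg: ParallelConfig) -> list[str]:
    """Return the failures this report describes for a given configuration."""
    failures = []
    if cfg.tensor_parallel != 1:
        failures.append("tensor_parallel: creation succeeds, checkpoint loading fails")
    if cfg.sequence_parallel != 1:
        failures.append("sequence_parallel: creation succeeds, checkpoint loading fails")
    if cfg.pipeline_parallel != 1:
        failures.append("pipeline_parallel: model creation fails")
    return failures


if __name__ == "__main__":
    for cfg in (
        ParallelConfig(),
        ParallelConfig(tensor_parallel=2),
        ParallelConfig(sequence_parallel=2),
        ParallelConfig(pipeline_parallel=2),
    ):
        print(cfg, expected_failures(cfg))
```

A fully non-distributed configuration (all values 1) returns an empty list, which is the baseline the other cases should be compared against when reproducing.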