🐞 Bugs in Distributed Loading of Non-Distributed Checkpoints and/or Model Creation via HF Wrapper
There are issues when loading non-distributed checkpoints or creating models via from_pretrained
using the HF wrapper. (For details, see discussion below.)
- When tensor_parallel is not 1: model creation succeeds, but checkpoint loading fails.
- When sequence_parallel is not 1: model creation succeeds, but checkpoint loading fails.
- When pipeline_parallel is not 1: model creation fails at different points, depending on whether sequence_tensor_parallel is set.