
Non-distributed Checkpoints: Distributed Loading #244

Open
bigximik opened this issue Apr 30, 2025 · 6 comments
Labels: bug (Something isn't working), Critical, need update

Comments

@bigximik
Contributor

🎯 Goal (What & Why)

Enable distributed loading of non-distributed checkpoint formats (in addition to the currently supported distributed checkpoint format).
This is required to support distributed inference for models stored in non-distributed checkpoint formats, such as those in HF format. Currently we can only implement data-parallel inference from those checkpoints.

This is currently a blocker for #217, and by extension, for #199 — unless we decide to support only data parallelism for those features in the short term.
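
To make the request concrete, here is an illustrative sketch (plain PyTorch, not the Fast-LLM implementation) of what tensor-parallel loading of a single-file, non-distributed checkpoint means: every rank reads the full tensor and keeps only its slice. The names checkpoint_path, name, tp_rank, and tp_size are placeholders.

import torch

def load_tp_shard(checkpoint_path: str, name: str, tp_rank: int, tp_size: int) -> torch.Tensor:
    # Every rank opens the same single-file (non-distributed) checkpoint ...
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    full_weight = state_dict[name]
    # ... and keeps only its own slice along the sharded dimension (dim 0 here).
    return torch.chunk(full_weight, tp_size, dim=0)[tp_rank]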

Notes:

  1. As I understand, this is already a known limitation — see the TODOs at:

  2. How have we handled continued training for 70B-scale models so far? Have we only pre-trained large models from scratch?

🚀 Execution Plan

(This section may start as an incomplete draft but must be defined before implementation begins.)

Step 1: What is the smallest working version?

(Describe the simplest way to implement this feature with minimal effort.)

Step 2: What additional optimizations are possible (but optional)?

(List potential refinements that can be added in later PRs if needed.)

📌 Acceptance Criteria (Must-Haves for Completion)

  • The feature must be functional and tested.
  • The implementation must be documented in practical terms.
  • The PR must include a performance/impact summary.
  • No refactors unless directly necessary for feature completion.

🛠️ Project Management

  • Assign the project to the Fast-LLM project.
  • Set the Estimate field (in days) in the GitHub project.
  • Use the Size field to categorize the PR size (Small/Medium/Large).
  • Assign an owner when opening the issue.
@jlamypoirier
Collaborator

🎯 Goal (What & Why)

Enable distributed loading of non-distributed checkpoint formats (in addition to the currently supported distributed checkpoint format). This is required to support distributed inference for models stored in non-distributed checkpoint formats, such as those in HF format. Currently we can only implement data-parallel inference from those checkpoints.

I don't understand this feature request. It should already be possible to load non-distributed checkpoints in any format from a distributed setting. Also how does this relate to #199?

This is currently a blocker for #217, and by extension, for #199 — unless we decide to support only data parallelism for those features in the short term.

Notes:

  1. As I understand, this is already a known limitation — see the TODOs at:

We could use more test cases, but afaik things are working.

I'm not sure where this comes from, I think it was a temporary todo that I forgot to remove. Imports definitely work.

  2. How have we handled continued training for 70B-scale models so far? Have we only pre-trained large models from scratch?

State imports should work at basically any scale, so yes we support extended pretraining there.

@bigximik
Contributor Author

bigximik commented Apr 30, 2025

Thanks for the comment, @jlamypoirier.

I encountered errors when trying to load a checkpoint (e.g., Qwen) with distributed.tensor_parallel = 2.

The only difference from the main branch is that I modified the from_pretrained function to accept config updates directly:

python -m torch.distributed.run --nproc-per-node=2 test_distributed.py

https://github.com/ServiceNow/Fast-LLM/blob/denis/generate/test_distributed.py

updates = {
    ("base_model", "transformer", "use_flash_attention"): attn_implementation is not None
    and attn_implementation == "flash_attention_2",
    ("distributed", "tensor_parallel"): 2,
    ("distributed", "pipeline_parallel"): 1,
    ("distributed", "sequence_data_parallel"): 1,
}

model_fm = HuggingfaceGPTModelForCausalLM.from_pretrained(
    CheckpointLoadConfig(
        path=checkpoint,
        format=Qwen2GPTHuggingfaceCheckpointFormat,
    ),
    updates,
)

errors:

Global counter mismatch for parameter "layers.1.norm_1.weight" and shard "weights": 3072 != 1536
Global counter mismatch for parameter "layers.1.norm_2.weight" and shard "weights": 3072 != 1536
....

The model params seem to be divisible, so I assumed that distributed loading is not yet supported.

What do you think the problem might be here?

Thanks!

@bigximik
Contributor Author

bigximik commented Apr 30, 2025

#199 depends on generate, and to implement generate in tensor-, pipeline-, and sequence-parallel modes, I need tests. However, I currently can't load a trained checkpoint in distributed mode (model distributed across several GPUs) to run tests.

@jlamypoirier
Collaborator

Global counter mismatch for parameter "layers.1.norm_1.weight" and shard "weights": 3072 != 1536
Global counter mismatch for parameter "layers.1.norm_2.weight" and shard "weights": 3072 != 1536
....

The model params seem to be divisible, so I assumed that distributed loading is not yet supported.

This should work, so there must be a bug somewhere. I'll give it a try later today; in the meantime, could you try with ("distributed", "sequence_tensor_parallel"): True? We want it anyway, and it's better supported.
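
Concretely, that would just be one extra entry in the updates dict from your snippet above, e.g. (untested sketch):

updates = {
    ("base_model", "transformer", "use_flash_attention"): attn_implementation is not None
    and attn_implementation == "flash_attention_2",
    ("distributed", "tensor_parallel"): 2,
    ("distributed", "sequence_tensor_parallel"): True,  # suggested addition
    ("distributed", "pipeline_parallel"): 1,
    ("distributed", "sequence_data_parallel"): 1,
}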

@bigximik
Contributor Author

With sequence_tensor_parallel set to True, it still fails. Config:

("distributed", "sequence_tensor_parallel"): True
("distributed", "tensor_parallel"): 2 (others set to 1)

  • Model is instantiated.
  • Checkpoint loading fails.

I have also tried several more configuration options (summarized as update dicts at the end of this comment):

Config: ("distributed", "pipeline_parallel"): 2 (others set to 1)

  • If ("distributed", "sequence_tensor_parallel"): True is also set:
    Trying to set an implicit default for field sequence_tensor_parallel, but the field has already been set explicitly.

  • If ("distributed", "sequence_tensor_parallel") is not set:

File "fast_llm/engine/multi_stage/fast_llm_model.py", line 93, in initialize_weights
    self._stages[tied_parameter.main_stage].weight_shard, 0, tied_parameter.group, timeout=timeout
AttributeError: 'Stage' object has no attribute 'weight_shard'

Config: ("distributed", "sequence_data_parallel"): 2 (others set to 1)

  • Model is created.
  • Checkpoint loading fails with:
    Global counter mismatch for parameter "layers.0.word_embeddings_weight" and shard "weights": 466747392 != 233373696
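
For reproduction, the configurations above correspond roughly to these distributed overrides (sketched from the descriptions, not copied verbatim from my script; all other parallelism fields left at 1):

# Config A: tensor parallel 2 with sequence_tensor_parallel enabled; model builds, checkpoint load fails
config_tp = {
    ("distributed", "tensor_parallel"): 2,
    ("distributed", "sequence_tensor_parallel"): True,
}

# Config B: pipeline parallel 2 (sequence_tensor_parallel not set); fails in initialize_weights
config_pp = {
    ("distributed", "pipeline_parallel"): 2,
}

# Config C: sequence data parallel 2; model builds, checkpoint load fails with counter mismatch
config_sdp = {
    ("distributed", "sequence_data_parallel"): 2,
}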

@jlamypoirier
Collaborator

These look like bugs. I'll have a look.

@jlamypoirier added the bug (Something isn't working) label and removed the enhancement (New feature or request) label on Apr 30, 2025