Fix docs typos. #35465

Merged
merged 1 commit into from
Jan 2, 2025
2 changes: 1 addition & 1 deletion docs/source/en/fsdp.md
@@ -58,7 +58,7 @@ Otherwise, you can choose a size-based wrapping policy where FSDP is applied to

### Checkpointing

-Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT` because saving the full state dict with CPU offloading on rank 0 takes a lot of time and often results in `NCCL Timeout` errors due to indefinite hanging during broadcasting. You can resume training with the sharded state dicts with the [`~accelerate.Accelerator.load_state`]` method.
+Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT` because saving the full state dict with CPU offloading on rank 0 takes a lot of time and often results in `NCCL Timeout` errors due to indefinite hanging during broadcasting. You can resume training with the sharded state dicts with the [`~accelerate.Accelerator.load_state`] method.

```py
# directory containing checkpoints
2 changes: 1 addition & 1 deletion docs/source/zh/fsdp.md
@@ -74,7 +74,7 @@ FSDP 是通过包装网络中的每个层来应用的。通常,包装是以嵌

应该使用 `fsdp_state_dict_type: SHARDED_STATE_DICT` 来保存中间检查点,
因为在排名 0 上保存完整状态字典需要很长时间,通常会导致 `NCCL Timeout` 错误,因为在广播过程中会无限期挂起。
-您可以使用 [`~accelerate.Accelerator.load_state`]` 方法加载分片状态字典以恢复训练。
+您可以使用 [`~accelerate.Accelerator.load_state`] 方法加载分片状态字典以恢复训练。

```py
# 包含检查点的目录