huggingface · LysandreJik · Jan 2, 2025 · Dec 31, 2024
diff --git a/docs/source/en/fsdp.md b/docs/source/en/fsdp.md
@@ -58,7 +58,7 @@ Otherwise, you can choose a size-based wrapping policy where FSDP is applied to
 
 ### Checkpointing
 
-Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT` because saving the full state dict with CPU offloading on rank 0 takes a lot of time and often results in `NCCL Timeout` errors due to indefinite hanging during broadcasting. You can resume training with the sharded state dicts with the [`~accelerate.Accelerator.load_state`]` method.
+Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT` because saving the full state dict with CPU offloading on rank 0 takes a lot of time and often results in `NCCL Timeout` errors due to indefinite hanging during broadcasting. You can resume training with the sharded state dicts with the [`~accelerate.Accelerator.load_state`] method.
 
 ```py
 # directory containing checkpoints

diff --git a/docs/source/zh/fsdp.md b/docs/source/zh/fsdp.md
@@ -74,7 +74,7 @@ FSDP 是通过包装网络中的每个层来应用的。通常，包装是以嵌
 
 应该使用 `fsdp_state_dict_type: SHARDED_STATE_DICT` 来保存中间检查点，
 因为在排名 0 上保存完整状态字典需要很长时间，通常会导致 `NCCL Timeout` 错误，因为在广播过程中会无限期挂起。
-您可以使用 [`~accelerate.Accelerator.load_state`]` 方法加载分片状态字典以恢复训练。
+您可以使用 [`~accelerate.Accelerator.load_state`] 方法加载分片状态字典以恢复训练。
 
 ```py
 # 包含检查点的目录