Description
System Info
- transformers version: 4.41.2
- Platform: Linux-5.15.0-1042-nvidia-x86_64-with-glibc2.35
- Python version: 3.9.18
- Huggingface_hub version: 0.23.3
- Safetensors version: 0.4.2
- Accelerate version: 0.31.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.2.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
n/a
Expected behavior
The run_clm.py script and the other scripts in https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling notionally support training / fine-tuning models whose training memory footprint (weights, gradients, optimizer state) is too large to fit on a single GPU, judging by their CLI options. However, there is no example showing how to actually do that.
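For concreteness, something along these lines is the kind of example I'd hope the docs show. This is only my guess at an invocation, assuming the Trainer's --fsdp option is the intended way to shard the model across GPUs; the model/dataset names and flag values are placeholders I have not verified:

```bash
# Sketch only, not a verified recipe: shard a 7B model across 2 GPUs with FSDP
# via the standard Trainer flags (my assumption about the intended mechanism).
torchrun --nproc_per_node 2 run_clm.py \
    --model_name_or_path mistralai/Mistral-7B-v0.1 \
    --dataset_name wikitext \
    --dataset_config_name wikitext-2-raw-v1 \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --bf16 \
    --fsdp "full_shard auto_wrap" \
    --do_train \
    --output_dir ./mistral-7b-clm
```

An accelerate launch equivalent, or a DeepSpeed config, would presumably also work; that is exactly the part I'd like the documentation to spell out.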
For instance, accelerate estimate-memory says training the Mistral-7B family with Adam takes roughly 55 GB in float16, which is more memory than a single 40 GB A100 has, so I'd need to use more than one GPU.
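For reference, that estimate came from an invocation along these lines (reconstructed from memory, so the exact flags may differ):

```bash
# Reproduces the ~55 GB Adam/float16 training estimate quoted above
# (command written from memory; flag names may not be exact).
accelerate estimate-memory mistralai/Mistral-7B-v0.1 \
    --library_name transformers \
    --dtypes float16
```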
Would it be possible to modify the language_modeling documentation to explain how to do that?