Language modeling examples do not show how to do multi-gpu training / fine-tuning #31323

Closed
@csiefer2

Description

System Info

  • transformers version: 4.41.2
  • Platform: Linux-5.15.0-1042-nvidia-x86_64-with-glibc2.35
  • Python version: 3.9.18
  • Huggingface_hub version: 0.23.3
  • Safetensors version: 0.4.2
  • Accelerate version: 0.31.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.2.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed

Who can help?

@muellerz @stevhliu

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

n/a

Expected behavior

The run_clm.py and other related scripts in:

https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling

notionally support training / fine-tuning of models whose gradients are too large to fit on a single GPU, at least if you believe their CLI options. However, there is no example showing how to actually do that.

For instance, accelerate estimate-memory says training the Mistral-7B family with Adam takes roughly 55 GB with float16, which is more memory than a single 40GB A100 has. So I'd need to use more than one GPU.
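For reference, the estimate came from a command along these lines (the exact model id is my assumption; adjust to whichever checkpoint you care about):

```bash
# Rough memory estimate for training in float16 via the accelerate CLI.
# The model id below is an assumption, not the only option.
accelerate estimate-memory mistralai/Mistral-7B-v0.1 \
  --library_name transformers \
  --dtypes float16
```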

Would it be possible to modify the language_modeling documentation to explain how to do that?
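For context, the kind of recipe I'm hoping the docs would spell out is something like the sketch below. It is just my guess at a multi-GPU launch of run_clm.py with FSDP sharding; the model/dataset names, GPU count, and FSDP flags are assumptions on my part, not a verified configuration:

```bash
# Sketch: launch run_clm.py on 2 GPUs with PyTorch FSDP sharding via torchrun.
# Model, dataset, and wrap class are placeholders/assumptions.
torchrun --nproc_per_node=2 run_clm.py \
  --model_name_or_path mistralai/Mistral-7B-v0.1 \
  --dataset_name wikitext \
  --dataset_config_name wikitext-2-raw-v1 \
  --do_train \
  --per_device_train_batch_size 1 \
  --gradient_checkpointing \
  --bf16 \
  --fsdp "full_shard auto_wrap" \
  --fsdp_transformer_layer_cls_to_wrap MistralDecoderLayer \
  --output_dir /tmp/clm-multigpu
```

Whether this (or an `accelerate launch` equivalent) is the recommended path is exactly what I'd like the language_modeling docs to clarify.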
