Train_lora failed in deepspeed zero 3

Package version:
```
accelerate: 1.4.0
deepspeed: 0.16.4
pytorch: 2.5.1+cu124
```



I am using DeepSpeed to train a LoRA on 7x L20 GPUs. The config JSON is as follows:
```
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": "auto",
  "steps_per_print": 100,
  
  "bf16": {
    "enabled": true,
    "loss_scale": 0
  },

  "zero_optimization": {
    "stage": 3,
    "contiguous_gradients": true,
    "overlap_comm": true,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8
  },

  "activation_checkpointing": {
    "partition_activations": true,
    "cpu_checkpointing": false,  
    "contiguous_memory_optimization": true
  },

  "aio": {
    "block_size": 1e6,
    "queue_depth": 8,
    "single_submit": false,
    "overlap_events": true
  }
}
```
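
As a sanity check on the batch settings above: DeepSpeed expects `train_batch_size` to equal `train_micro_batch_size_per_gpu * gradient_accumulation_steps * world_size`. The sketch below only reproduces that arithmetic with the values from this report (micro batch 1, accumulation 1, 7 processes). The function name is hypothetical, and it assumes the `"auto"` entries are filled in by the launcher integration rather than by hand.

```python
def effective_train_batch_size(micro_batch_per_gpu: int,
                               grad_accum_steps: int,
                               world_size: int) -> int:
    """Global batch size implied by the DeepSpeed batch settings."""
    return micro_batch_per_gpu * grad_accum_steps * world_size


# Values taken from the config and launch arguments in this report.
print(effective_train_batch_size(micro_batch_per_gpu=1,
                                 grad_accum_steps=1,
                                 world_size=7))  # 7
```

So the `"auto"` value for `train_batch_size` should resolve to 7 in this setup.
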
I also added the following code to the original train_lora.py:
```
os.environ["NCCL_P2P_DISABLE"] = "1"
os.environ["NCCL_IB_DISABLE"] = "1"
os.environ['NCCL_SOCKET_IFNAME'] = 'eth0'  # Replace with your network interface
os.environ['NCCL_DEBUG'] = 'INFO'
```

My launch arguments are:
```
"args": [
                "launch",
                "--mixed_precision=bf16",
                "--num_processes=7",
                "scripts/train_lora.py",
                "--pretrained_model_name_or_path=/home/omadmin/yxd/EasyAnimateV5.1-12b-zh-InP",
                "--train_data_meta=/home/omadmin/yxd/EasyAnimate/datasets/Minimalism/g4_filter.json",
                "--config_path",
                "config/easyanimate_video_v5.1_magvit_qwen.yaml",    
                "--image_sample_size=1024",
                "--video_sample_size=512",
                "--token_sample_size=512",
                "--video_sample_stride=3",
                "--video_sample_n_frames=49",
                "--train_batch_size=1",
                "--video_repeat=1",
                "--gradient_accumulation_steps=1",
                "--dataloader_num_workers=1",
                "--num_train_epochs=100",
                "--checkpointing_steps=100",
                "--learning_rate=1e-05",
                "--seed=42",
                "--low_vram",
                "--output_dir=output_dir",
                "--gradient_checkpointing",
                "--adam_weight_decay=5e-3",
                "--adam_epsilon=1e-10",
                "--vae_mini_batch=1",
                "--max_grad_norm=0.05",
                "--random_hw_adapt",
                "--training_with_video_token_length",
                "--train_mode=inpaint",
                "--loss_type=flow",
                "--rank=256",
                "--network_alpha=128",
               "--use_deepspeed",
               "--random_flip",
               "--motion_sub_loss",
               "--enable_bucket",
               "--random_ratio_crop",
               "--enable_xformers_memory_efficient_attention"

            ],
```

However, it fails with:
```
[rank1]:     exec(code, run_globals)
[rank1]:   File "scripts/train_lora.py", line 2169, in <module>
[rank1]:     main()
[rank1]:   File "scripts/train_lora.py", line 1854, in main
[rank1]:     encode_prompt(
[rank1]:   File "scripts/train_lora.py", line 223, in encode_prompt
[rank1]:     prompt_embeds = text_encoder(
[rank1]:   File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
[rank1]:     inputs_embeds = self.model.embed_tokens(input_ids)
[rank1]:   File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]:     return self._call_impl(*args, **kwargs)
[rank1]:   File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank1]:     return forward_call(*args, **kwargs)
[rank1]:   File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 190, in forward
[rank1]:     return F.embedding(
[rank1]:   File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/functional.py", line 2551, in embedding
[rank1]:     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
[rank1]: RuntimeError: 'weight' must be 2-D
```
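
A possible reading of this traceback (not confirmed against the EasyAnimate code): under ZeRO-3 a partitioned parameter keeps only an empty local tensor until it is gathered, so an embedding weight that should be 2-D can reach `F.embedding` as an empty 1-D placeholder. The helper below is a minimal diagnostic sketch. It assumes that ZeRO-3 marks partitioned parameters with a `ds_shape` attribute holding the full shape; `find_partitioned_params` is a hypothetical name.

```python
from typing import Iterable, List, Tuple


def find_partitioned_params(named_params: Iterable[Tuple[str, object]]) -> List[str]:
    """Describe parameters that look like ungathered ZeRO-3 placeholders.

    Assumption: a partitioned parameter exposes `ds_shape` (its full shape)
    while its local tensor has zero elements.
    """
    suspicious = []
    for name, param in named_params:
        full_shape = getattr(param, "ds_shape", None)
        if full_shape is not None and param.numel() == 0:
            suspicious.append(
                f"{name}: local shape {tuple(param.shape)}, "
                f"full shape {tuple(full_shape)}"
            )
    return suspicious


# Hypothetical usage just before the failing call in encode_prompt:
# for line in find_partitioned_params(text_encoder.named_parameters()):
#     print(line)
```

If this lists `embed_tokens.weight` with an empty local shape, the text encoder was partitioned at load time but is being called without its parameters being gathered.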

ZeRO-2 works, but ZeRO-3 does not.
Could you help me fix this problem?
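
One workaround that might apply here, offered as an untested sketch rather than a confirmed fix: temporarily gather the text encoder's parameters around the forward pass with `deepspeed.zero.GatheredParameters`. The helper name `gather_if_partitioned` is hypothetical, and it assumes partitioned parameters carry a `ds_id` attribute. Note that gathering materializes the full text encoder on every GPU for the duration of the context, which may not fit in memory.

```python
import contextlib


def gather_if_partitioned(module):
    """Return a context manager that gathers ZeRO-3 partitioned parameters.

    Falls back to a no-op context when no parameter carries `ds_id`
    (assumed here to mark a ZeRO-3 partitioned parameter).
    """
    params = [p for p in module.parameters() if hasattr(p, "ds_id")]
    if not params:
        return contextlib.nullcontext()
    import deepspeed  # imported lazily so the helper also works without ZeRO-3
    # modifier_rank=None: the parameters are only read inside the context.
    return deepspeed.zero.GatheredParameters(params, modifier_rank=None)


# Hypothetical usage inside encode_prompt (names taken from the traceback):
# with gather_if_partitioned(text_encoder):
#     prompt_embeds = text_encoder(...)
```

An alternative direction would be to load the frozen text encoder without ZeRO-3 parameter partitioning in the first place, but whether the training script supports that is not something this report can confirm.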

Train_lora failed in deepspeed zero 3 #210
