Package versions:
accelerate: 1.4.0
deepspeed: 0.16.4
pytorch: 2.5.1+cu124
I am using DeepSpeed to train a LoRA on 7x L20 GPUs. The DeepSpeed config JSON is as follows:
{
"train_batch_size": "auto",
"train_micro_batch_size_per_gpu": 1,
"gradient_accumulation_steps": "auto",
"steps_per_print": 100,
"bf16": {
"enabled": true,
"loss_scale": 0
},
"zero_optimization": {
"stage": 3,
"contiguous_gradients": true,
"overlap_comm": true,
"offload_optimizer": {
"device": "cpu",
"pin_memory": true
},
"allgather_partitions": true,
"allgather_bucket_size": 2e8,
"reduce_scatter": true,
"reduce_bucket_size": 2e8
},
"activation_checkpointing": {
"partition_activations": true,
"cpu_checkpointing": false,
"contiguous_memory_optimization": true
},
"aio": {
"block_size": 1e6,
"queue_depth": 8,
"single_submit": false,
"overlap_events": true
}
}
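A quick way to double-check which ZeRO settings are actually in effect (assuming the file above is saved as ds_config.json; the "auto" fields are resolved later by accelerate/DeepSpeed from the training arguments):

```python
# Sanity check of the DeepSpeed config above; ds_config.json is just the name
# assumed for the file here, not something the training script requires.
import json

with open("ds_config.json") as f:
    cfg = json.load(f)

print(cfg["zero_optimization"]["stage"])                        # 3 -> parameters are partitioned across ranks
print(cfg["zero_optimization"]["offload_optimizer"]["device"])  # "cpu"
print(cfg["bf16"]["enabled"])                                   # True
```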
I also added the following code to the original train_lora.py:
os.environ["NCCL_P2P_DISABLE"] = "1"
os.environ["NCCL_IB_DISABLE"] = "1"
os.environ['NCCL_SOCKET_IFNAME'] = 'eth0' # Replace with your network interface
os.environ['NCCL_DEBUG'] = 'INFO'
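Placement matters here: NCCL only reads these variables when its communicators are created, so the overrides have to run before accelerate sets up the distributed process group. A minimal ordering sketch (Accelerator stands in for however train_lora.py actually initializes accelerate):

```python
# Ordering sketch, not the actual train_lora.py code: export the NCCL overrides
# first, then let accelerate create the process group so NCCL picks them up.
import os

os.environ["NCCL_P2P_DISABLE"] = "1"
os.environ["NCCL_IB_DISABLE"] = "1"
os.environ["NCCL_SOCKET_IFNAME"] = "eth0"
os.environ["NCCL_DEBUG"] = "INFO"

from accelerate import Accelerator

accelerator = Accelerator()  # distributed state (and NCCL) is initialized from here on
```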
The launch arguments I pass to accelerate are:
"args": [
"launch",
"--mixed_precision=bf16",
"--num_processes=7",
"scripts/train_lora.py",
"--pretrained_model_name_or_path=/home/omadmin/yxd/EasyAnimateV5.1-12b-zh-InP",
"--train_data_meta=/home/omadmin/yxd/EasyAnimate/datasets/Minimalism/g4_filter.json",
"--config_path",
"config/easyanimate_video_v5.1_magvit_qwen.yaml",
"--image_sample_size=1024",
"--video_sample_size=512",
"--token_sample_size=512",
"--video_sample_stride=3",
"--video_sample_n_frames=49",
"--train_batch_size=1",
"--video_repeat=1",
"--gradient_accumulation_steps=1",
"--dataloader_num_workers=1",
"--num_train_epochs=100",
"--checkpointing_steps=100",
"--learning_rate=1e-05",
"--seed=42",
"--low_vram",
"--output_dir=output_dir",
"--gradient_checkpointing",
"--adam_weight_decay=5e-3",
"--adam_epsilon=1e-10",
"--vae_mini_batch=1",
"--max_grad_norm=0.05",
"--random_hw_adapt",
"--training_with_video_token_length",
"--train_mode=inpaint",
"--loss_type=flow",
"--rank=256",
"--network_alpha=128",
"--use_deepspeed",
"--random_flip",
"--motion_sub_loss",
"--enable_bucket",
"--random_ratio_crop",
"--enable_xformers_memory_efficient_attention"
],
However, the run fails with the following traceback:
[rank1]: exec(code, run_globals)
[rank1]: File "scripts/train_lora.py", line 2169, in <module>
[rank1]: main()
[rank1]: File "scripts/train_lora.py", line 1854, in main
[rank1]: encode_prompt(
[rank1]: File "scripts/train_lora.py", line 223, in encode_prompt
[rank1]: prompt_embeds = text_encoder(
[rank1]: File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank1]: return forward_call(*args, **kwargs)
[rank1]: File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/transformers/models/qwen2_vl/modeling_qwen2_vl.py", line 1683, in forward
[rank1]: inputs_embeds = self.model.embed_tokens(input_ids)
[rank1]: File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
[rank1]: return self._call_impl(*args, **kwargs)
[rank1]: File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
[rank1]: return forward_call(*args, **kwargs)
[rank1]: File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 190, in forward
[rank1]: return F.embedding(
[rank1]: File "/home/omadmin/miniconda3/envs/py310/lib/python3.10/site-packages/torch/nn/functional.py", line 2551, in embedding
[rank1]: return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
[rank1]: RuntimeError: 'weight' must be 2-D
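The failing call is the token-embedding lookup inside the Qwen2-VL text encoder. The message itself just means the embedding weight is not a real 2-D matrix at call time; it is easy to reproduce in isolation (standalone snippet, unrelated to the training script), which suggests the weight is still a ZeRO-3 partitioned placeholder rather than the gathered embedding table:

```python
# Standalone illustration: F.embedding insists on a 2-D weight. A ZeRO-3
# partitioned parameter is exposed as a flat/empty tensor until it is gathered,
# which produces exactly this message.
import torch
import torch.nn.functional as F

input_ids = torch.tensor([[1, 2, 3]])
weight = torch.empty(0)  # roughly what a partitioned placeholder looks like to the op

try:
    F.embedding(input_ids, weight)
except RuntimeError as e:
    print(e)  # 'weight' must be 2-D
```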
ZeRO-2 works, but ZeRO-3 does not.
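My guess is that under ZeRO-3 the text encoder's parameters stay partitioned because it is used outside the module DeepSpeed wraps, so its embedding weight is never gathered before the forward pass. A sketch of the kind of workaround I have in mind (untested; text_encoder and input_ids stand in for the objects already used in encode_prompt):

```python
# Untested sketch: temporarily gather the ZeRO-3-partitioned parameters of the
# text encoder for the duration of the forward pass.
import deepspeed

with deepspeed.zero.GatheredParameters(list(text_encoder.parameters()), modifier_rank=None):
    prompt_embeds = text_encoder(input_ids)  # stands in for the existing call in encode_prompt
```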
Could you help me fix this problem?