[Train] Support qwen dmd-lora training#1076

Open
Musisoul wants to merge 20 commits into main from dmd_lora
Conversation

@Musisoul
Collaborator

No description provided.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the DMD-LoRA training method for the Qwen-Image model, adding a new configuration, a specialized DMDFlowMatchingScheduler, and the DmdLoraTrainer class. Feedback highlights that the DmdLoraTrainer currently ignores the gradient_accumulation_iters setting, which leads to incorrect training behavior and potential miscalculations in the fake model's learning rate scheduler. Additionally, the training progress display resets loss metrics every iteration, making it difficult to monitor running averages.
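To illustrate the gradient accumulation point, here is a minimal sketch of how optimizer steps relate to iterations once accumulation is honored. The trainer's internals are not shown in this thread, so the helper names `optimizer_steps` and `should_step` are hypothetical, and flushing a final partial window is an assumed design choice rather than the PR's behavior. The idea is that any scheduler sized in optimizer steps should use the accumulated count, not the raw iteration count.

```python
import math


def optimizer_steps(num_iters, accumulation_iters):
    """Number of optimizer updates for num_iters micro-batches.

    Hypothetical helper: assumes the last, possibly partial,
    accumulation window is also flushed with an optimizer step.
    """
    if accumulation_iters < 1:
        raise ValueError("accumulation_iters must be >= 1")
    return math.ceil(num_iters / accumulation_iters)


def should_step(iteration, accumulation_iters, num_iters):
    """Return True when the optimizer should step after this 0-based iteration."""
    is_boundary = (iteration + 1) % accumulation_iters == 0
    is_last = iteration + 1 == num_iters
    return is_boundary or is_last


if __name__ == "__main__":
    num_iters, accum = 10, 4
    steps = [i for i in range(num_iters) if should_step(i, accum, num_iters)]
    print("optimizer steps at iterations:", steps)
    print("scheduler length:", optimizer_steps(num_iters, accum))
```

In a real loop the loss would typically also be divided by the accumulation count before the backward pass, so that the accumulated gradient matches a single large batch.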

Comment thread lightx2v_train/lightx2v_train/trainers/dmd_lora.py
Comment on lines +226 to +232
progress.set_postfix(
dmd=running_dmd,
fake=running_fake,
lr=self.lr_scheduler.get_last_lr()[0],
)
running_dmd = 0.0
running_fake = 0.0
Contributor

Severity: medium

The running_dmd and running_fake loss metrics are reset to 0.0 at the end of every iteration. This means the progress bar's postfix will only display the loss for the single most recent batch, rather than a running average or an accumulation over multiple steps. This makes monitoring the training progress difficult as the displayed values will be highly volatile.
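One possible way to address this, sketched under assumptions: accumulate each loss in a small meter and reset it only after it has been displayed at a logging interval. `RunningMean`, `simulate`, and `LOG_EVERY` are hypothetical names for illustration and are not part of the PR.

```python
class RunningMean:
    """Accumulates values and reports their mean since the last reset."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += float(value)
        self.count += 1

    def mean(self):
        # Return 0.0 for an empty window instead of dividing by zero.
        return self.total / self.count if self.count else 0.0

    def reset(self):
        self.total = 0.0
        self.count = 0


LOG_EVERY = 4  # hypothetical logging interval, in iterations


def simulate(losses, log_every=LOG_EVERY):
    """Return the values a progress bar would show for a sequence of losses."""
    meter = RunningMean()
    shown = []
    for step, loss in enumerate(losses, start=1):
        meter.update(loss)
        if step % log_every == 0:
            shown.append(meter.mean())  # display the windowed average
            meter.reset()               # reset only after displaying
    return shown


if __name__ == "__main__":
    print(simulate([1.0, 3.0, 2.0, 2.0, 4.0, 4.0, 4.0, 4.0]))
```

In the trainer, one meter per metric (DMD loss and fake loss) could feed `progress.set_postfix` in place of the raw per-batch values.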

Comment thread lightx2v_train/lightx2v_train/trainers/dmd_lora.py