Code Review
This pull request introduces the DMD-LoRA training method for the Qwen-Image model, adding a new configuration, a specialized DMDFlowMatchingScheduler, and the DmdLoraTrainer class. Feedback highlights that the DmdLoraTrainer currently ignores the gradient_accumulation_iters setting, which leads to incorrect training behavior and potential miscalculations in the fake model's learning rate scheduler. Additionally, the training progress display resets loss metrics every iteration, making it difficult to monitor running averages.
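To illustrate the gradient accumulation point, here is a minimal sketch of the step bookkeeping, not the trainer's actual code. The helper names `optimizer_steps` and `should_step` are hypothetical, and it assumes the scheduler is stepped once per optimizer step rather than once per iteration.

```python
import math


def optimizer_steps(total_iters: int, accumulation_iters: int) -> int:
    """Number of optimizer (and scheduler) steps when gradients are accumulated."""
    if accumulation_iters < 1:
        raise ValueError("accumulation_iters must be >= 1")
    # A trailing partial window still ends with one optimizer step.
    return math.ceil(total_iters / accumulation_iters)


def should_step(iteration: int, total_iters: int, accumulation_iters: int) -> bool:
    """True on 0-based iterations that should end with an optimizer step."""
    is_boundary = (iteration + 1) % accumulation_iters == 0
    is_last = iteration + 1 == total_iters
    return is_boundary or is_last


# Hypothetical numbers: 10 iterations, accumulating over 4 micro-batches.
steps = optimizer_steps(total_iters=10, accumulation_iters=4)
stepping_iters = [i for i in range(10) if should_step(i, 10, 4)]
```

If a scheduler's total step count is derived from the iteration count instead of a value like `steps`, its schedule would finish later than intended whenever `accumulation_iters` is greater than 1. Each micro-batch loss is also usually divided by `accumulation_iters` before the backward pass so that the accumulated gradient matches a single large batch.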
    progress.set_postfix(
        dmd=running_dmd,
        fake=running_fake,
        lr=self.lr_scheduler.get_last_lr()[0],
    )
    running_dmd = 0.0
    running_fake = 0.0
The running_dmd and running_fake loss metrics are reset to 0.0 at the end of every iteration. This means the progress bar's postfix will only display the loss for the single most recent batch, rather than a running average or an accumulation over multiple steps. This makes monitoring the training progress difficult as the displayed values will be highly volatile.
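One possible fix is to smooth the displayed values over a fixed window instead of resetting them each iteration. The sketch below is only an illustration: `WindowedMean` is a hypothetical helper, not part of the PR, and the loss values are stand-ins.

```python
from collections import deque


class WindowedMean:
    """Mean of the most recent `window` values (hypothetical helper)."""

    def __init__(self, window: int = 50) -> None:
        # deque with maxlen drops the oldest value automatically.
        self.values = deque(maxlen=window)

    def update(self, value: float) -> float:
        """Record a new value and return the current windowed mean."""
        self.values.append(float(value))
        return sum(self.values) / len(self.values)


# Stand-in per-iteration losses; in the trainer these would be the
# DMD and fake-model losses computed each iteration.
dmd_meter = WindowedMean(window=3)
smoothed = [dmd_meter.update(loss) for loss in [4.0, 2.0, 0.0, 2.0]]
```

In the training loop, the value returned by `update` could be passed to `progress.set_postfix` in place of `running_dmd` and `running_fake`, with the two resets removed.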