[cfg,trainer] feat: (MOPD, 1/3): Multi-teacher config dict #5774
JacobHelwig wants to merge 9 commits into verl-project:main
Conversation
Code Review
This pull request refactors the distillation configuration by moving resource pool settings, such as enable_resource_pool, n_gpus_per_node, and nnodes, from the individual teacher model level to the top-level distillation configuration. It also removes the num_workers parameter and introduces a teacher_models dictionary to support future multi-teacher distillation. Additionally, the changes include improved validation logic for teacher model initialization and resource allocation across the trainer and experimental loops. I have no feedback to provide as there were no review comments to evaluate.
/gemini review
Code Review
This pull request refactors the distillation configuration to support multi-teacher setups by moving resource pool settings from the teacher model configuration to the top-level distillation configuration and introducing a teacher_models dictionary. The changes include updating the trainer logic, configuration files, and validation methods to accommodate these structural updates. I have no feedback to provide as there were no review comments.
Merged into #5834
What does this PR do?
Config changes for Multi-teacher OPD. Adds multiple teacher model configs to DistillationConfig. Maintains single-teacher support elsewhere.

Design & Code Changes

Adds a teacher model config dict to the distillation config. We also maintain a teacher_model entry for single-teacher OPD. For the multi-teacher training script, teacher model args will be specified as:
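The structure described in this PR can be sketched as a plain Python dataclass. Everything below is an illustrative assumption inferred from the description (a `teacher_models` dict plus a legacy `teacher_model` entry, with resource pool settings hoisted to the distillation level), not the actual verl config schema or field names:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TeacherModelConfig:
    # Hypothetical per-teacher settings. Per the PR, resource pool fields
    # (enable_resource_pool, n_gpus_per_node, nnodes) no longer live here.
    path: str = ""


@dataclass
class DistillationConfig:
    # Resource pool settings moved up from the teacher level.
    enable_resource_pool: bool = False
    n_gpus_per_node: int = 0
    nnodes: int = 0
    # Dict of named teacher configs for multi-teacher OPD.
    teacher_models: Dict[str, TeacherModelConfig] = field(default_factory=dict)
    # Single-teacher entry kept for backward compatibility.
    teacher_model: Optional[TeacherModelConfig] = None

    def validate(self) -> None:
        # At least one teacher must be configured, via either entry.
        if not self.teacher_models and self.teacher_model is None:
            raise ValueError("set teacher_model or teacher_models")
        # A dedicated resource pool needs a concrete GPU topology.
        if self.enable_resource_pool and (self.nnodes <= 0 or self.n_gpus_per_node <= 0):
            raise ValueError("resource pool requires nnodes > 0 and n_gpus_per_node > 0")


# Multi-teacher example with a dedicated resource pool.
cfg = DistillationConfig(
    enable_resource_pool=True,
    n_gpus_per_node=8,
    nnodes=1,
    teacher_models={
        "teacher_a": TeacherModelConfig(path="path/to/teacher_a"),
        "teacher_b": TeacherModelConfig(path="path/to/teacher_b"),
    },
)
cfg.validate()
```

Keeping `teacher_model` alongside `teacher_models` means existing single-teacher scripts keep working while multi-teacher runs populate the dict.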