The models were accidentally trained with MoE layers instead of dense layers. We need to verify that the MoE routers are functioning properly.
Background:
- The intent was to use all dense layers
- A config change was lost, so the models were trained with 1 initial dense layer followed by MoE layers
- Training appeared to run without issues
Tasks:
- Analyze router load balancing across experts (see the analysis sketch after this list)
- Check for dead or underutilized experts
- Compare against the performance of the intended all-dense configuration
- Document whether the MoE layers should be kept or switched back to dense
- Update training configs to be explicit about the MoE vs dense choice (see the config sketch after this list)
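A minimal sketch of the load-balancing / dead-expert analysis, assuming top-1 routing and that router logits have already been captured per MoE layer into an array of shape (num_tokens, num_experts); how the logits are collected (e.g. forward hooks) depends on the actual model code, and the threshold and function names here are illustrative, not part of the existing codebase:

```python
# Sketch: per-expert routing statistics from captured router logits.
import numpy as np

def expert_load_stats(router_logits: np.ndarray, dead_threshold: float = 0.01):
    """Return per-expert load fractions plus simple balance/dead-expert metrics."""
    num_tokens, num_experts = router_logits.shape
    top1 = router_logits.argmax(axis=-1)                 # chosen expert per token
    counts = np.bincount(top1, minlength=num_experts)
    load = counts / num_tokens                           # fraction of tokens per expert
    uniform = 1.0 / num_experts                          # ideal load under perfect balance
    return {
        "load_per_expert": load,
        "max_over_uniform": load.max() / uniform,        # >1: busiest expert is overloaded
        "coeff_of_variation": load.std() / load.mean(),  # 0 == perfectly balanced
        # "Dead" = receives less than dead_threshold of its fair share of tokens.
        "dead_experts": np.where(load < dead_threshold * uniform)[0].tolist(),
    }

if __name__ == "__main__":
    # Synthetic example: 8 experts, deliberately skewed toward expert 0.
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(10_000, 8))
    logits[:, 0] += 1.5
    for key, val in expert_load_stats(logits).items():
        print(key, val)
```

Running this per MoE layer and per checkpoint would show whether any experts collapsed during training and how skewed the routing distribution is.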
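For the config task, one way to make the MoE vs dense choice explicit so a lost change can't silently flip it again; the class and field names below are hypothetical and not the repo's actual config schema:

```python
# Sketch: explicit per-layer dense/MoE declaration with validation.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ModelLayerConfig:
    num_layers: int = 12
    # Explicit per-layer choice: "dense" or "moe" for every layer.
    layer_types: List[str] = field(
        default_factory=lambda: ["dense"] + ["moe"] * 11
    )
    num_experts: int = 8  # only relevant for "moe" layers

    def __post_init__(self):
        assert len(self.layer_types) == self.num_layers
        assert all(t in ("dense", "moe") for t in self.layer_types)

# Example: the all-dense configuration that was originally intended.
dense_cfg = ModelLayerConfig(layer_types=["dense"] * 12)
```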
Question: Did the accidental MoE help or hurt the results?