
MoE router health analysis #11

@chrisjmccormick

Description


The models were accidentally trained with MoE layers instead of dense layers. We need to verify that the MoE routers are functioning properly.

Background:

  • Intended to use all dense layers
  • The config change was lost, so the models were trained with 1 initial dense layer followed by MoE layers
  • Training seemed to work fine

Tasks:

  • Analyze router load balancing across experts (see the sketch after this list)
  • Check for dead or underutilized experts
  • Compare against the intended all-dense performance
  • Document whether MoE should be kept or switched back to dense
  • Update training configs to be explicit about the MoE vs. dense choice (see the config sketch at the end)
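For the first two tasks, here is a minimal sketch of a load-balance check, assuming we can capture the raw router logits for each MoE layer over an evaluation batch and that the routers use top-k token-choice routing. The function name, the `top_k` default, and the `dead_threshold` cutoff are all illustrative, not anything from our codebase:

```python
import torch

def expert_load_report(router_logits, top_k=2, dead_threshold=0.01):
    """Summarize how one MoE layer's router spreads tokens over experts.

    router_logits: [num_tokens, num_experts] raw router scores collected
    over an evaluation batch.
    """
    num_experts = router_logits.shape[-1]

    # Each token is routed to its top-k scoring experts.
    assignments = router_logits.topk(top_k, dim=-1).indices  # [num_tokens, top_k]
    counts = torch.bincount(assignments.flatten(), minlength=num_experts).float()

    # Fraction of routed slots each expert receives (sums to 1).
    load = counts / counts.sum()

    # Entropy of the load distribution, normalized so 1.0 = perfectly uniform.
    entropy = -(load.clamp_min(1e-12).log() * load).sum()
    normalized_entropy = (entropy / torch.log(torch.tensor(float(num_experts)))).item()

    # Experts receiving almost no tokens are candidates for "dead".
    dead = (load < dead_threshold).nonzero(as_tuple=True)[0].tolist()

    return {
        "load_per_expert": load.tolist(),
        "normalized_entropy": normalized_entropy,
        "dead_experts": dead,
    }
```

Running this per MoE layer (and per checkpoint, if we still have them) would show whether the load stayed balanced through training; a normalized entropy near 1.0 with an empty dead-expert list would suggest the routers were healthy.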

Question: Did the accidental MoE help or hurt the results?
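For the config task, one option is a per-layer field that is validated at startup, so a lost override fails loudly instead of silently flipping layer types. This is a hypothetical sketch; the field names and validation rules are illustrative, not the project's actual config schema:

```python
from dataclasses import dataclass

@dataclass
class LayerConfig:
    layer_type: str       # "dense" or "moe" -- must be stated explicitly
    num_experts: int = 0  # only meaningful when layer_type == "moe"
    top_k: int = 0        # experts activated per token (MoE only)

def validate_layers(layers: list[LayerConfig]) -> None:
    # Fail fast on contradictory settings instead of training silently.
    for i, layer in enumerate(layers):
        if layer.layer_type not in ("dense", "moe"):
            raise ValueError(f"layer {i}: unknown layer_type {layer.layer_type!r}")
        if layer.layer_type == "moe" and layer.num_experts < 2:
            raise ValueError(f"layer {i}: MoE layer needs num_experts >= 2")
        if layer.layer_type == "dense" and (layer.num_experts or layer.top_k):
            raise ValueError(f"layer {i}: dense layer must not set MoE fields")
```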
