First of all, thank you for your project; it looks great! I have been trying to apply it to ViT, much like V-MoE. During training I observed the loss behavior shown in the graph below, and I have a few questions about whether these situations are normal:
For the `balance_loss`: it briefly increases and then stabilizes around 5.0 without decreasing further. How can I verify whether the experts have actually achieved balance in this case?
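For reference, this is roughly how I have been checking balance on my side (a minimal sketch with my own names; it is not taken from your code): I log the fraction of tokens routed to each expert per batch and see whether it stays close to `1/num_experts`.

```python
import torch

def expert_load_fractions(expert_indices: torch.Tensor, num_experts: int) -> torch.Tensor:
    # expert_indices: long tensor with the expert id chosen for each token (top-1 route)
    counts = torch.bincount(expert_indices.flatten(), minlength=num_experts).float()
    return counts / counts.sum()

# Quick check with random assignments; in practice expert_indices would come
# from the router's argmax over the gate logits for one training batch.
fractions = expert_load_fractions(torch.randint(0, 8, (4096,)), num_experts=8)
print(fractions)  # ideally each entry stays near 1/8; one dominant entry suggests collapse
```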
The `aux_loss`, i.e. the sum of `weighted_balance_loss` and `weighted_router_z_loss`, contributes relatively little to the overall loss. Although it is indeed decreasing, should I increase the two `coef` values in your code?
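For context, this is how I understand the two coefficients entering the objective (a sketch with names and values of my own choosing, not your exact code); these are the values I would be scaling up:

```python
# My reading of how the auxiliary terms are combined (names/values are mine,
# roughly the order of magnitude used in ST-MoE).
balance_coef  = 1e-2   # weight on the load-balancing loss
router_z_coef = 1e-3   # weight on the router z-loss

def total_loss(task_loss, balance_loss, router_z_loss):
    weighted_balance_loss  = balance_coef * balance_loss
    weighted_router_z_loss = router_z_coef * router_z_loss
    aux_loss = weighted_balance_loss + weighted_router_z_loss
    return task_loss + aux_loss
```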
Is there a recommended `batch_size` for training MoE? I have noticed that different `batch_size` values yield different results, and the `batch_size` mentioned in the ST-MoE paper is too large for an individual user like me to use as a reference.
![image](https://private-user-images.githubusercontent.com/58496879/301460316-8d71aa6c-46cc-421a-8177-8b0aadf9efae.png)