Hi, thanks a lot for your great work!

I tried using this code in my project and found that the input to the MoE module (`x` in the `forward` function of the `MoE` class) and the input to the first expert (`expert_input` in the `Expert` class) are not the same. I assumed that, since the first expert is always used for all tokens, the two inputs should be identical. Is my assumption wrong?
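A toy NumPy sketch of top-1 routing may help frame this first question. The random router scores, the top-1 rule, and the capacity of 2 slots per expert are all assumptions made for illustration, not st-moe-pytorch's actual gating. In this setup each token's "first" choice is its own argmax expert, so no single expert index receives every token, and tokens beyond an expert's capacity are dropped.

```python
import numpy as np

rng = np.random.default_rng(0)
n, e, d, capacity = 6, 3, 4, 2          # tokens, experts, model dim, slots per expert (toy values)
x = rng.normal(size=(n, d))             # token representations for one batch element
gate_logits = rng.normal(size=(n, e))   # hypothetical router scores

top1 = gate_logits.argmax(axis=-1)      # each token's first-choice expert

# Assign tokens to their top-1 expert in order; tokens beyond capacity are dropped.
dispatch = np.zeros((n, e, capacity))
fill = np.zeros(e, dtype=int)
for token, expert in enumerate(top1):
    if fill[expert] < capacity:
        dispatch[token, expert, fill[expert]] = 1.0
        fill[expert] += 1

# Gather tokens into per-expert slots; unused slots stay as all-zero rows.
expert_inputs = np.einsum('nd,nec->ecd', x, dispatch)
print(top1)
print(expert_inputs[0].shape)  # (2, 4): expert 0 gets `capacity` slots, not all n tokens
```

Under these assumptions, the rows of `expert_inputs[0]` are only the tokens whose top choice was expert 0 (padded with zeros), so they would not match `x` even though every token is routed to some expert.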
Second, I noticed that the input dimensions are different. In the `forward` function of `MoE` below, the input is transformed into (b, e, c, d):

```python
expert_inputs = einsum('b n d, b n e c -> b e c d', x, dispatch_tensor)

# feed the expert inputs through the experts.
expert_outputs = self.experts(expert_inputs)
```
However, in the `Experts` class the expected shape seems to be (b, e, n, d). If I understand correctly, c is the expert capacity and n is the sequence length, which are not the same. Could you please clarify this as well?
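To make the shapes concrete, here is a NumPy sketch of dispatch, per-expert computation, and combine. The capacity formula, the random routing, the per-expert linear maps, and reusing the dispatch tensor as combine weights are all assumptions (a real combine tensor would typically also carry gate probabilities); the repository's gating code is the authority on its exact behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
b, n, e, d = 2, 10, 3, 8                           # batch, sequence length, experts, model dim
capacity_factor, min_capacity = 1.25, 2            # assumed hyperparameters
c = max(min_capacity, int(n * capacity_factor / e))  # expert capacity: 4 here, smaller than n = 10

x = rng.normal(size=(b, n, d))
top1 = rng.integers(0, e, size=(b, n))             # hypothetical routing decisions

# One-hot dispatch tensor (b, n, e, c); overflow tokens get no slot.
dispatch = np.zeros((b, n, e, c))
for bi in range(b):
    fill = np.zeros(e, dtype=int)
    for t in range(n):
        ex = top1[bi, t]
        if fill[ex] < c:
            dispatch[bi, t, ex, fill[ex]] = 1.0
            fill[ex] += 1

# Dispatch: (b, n, d) x (b, n, e, c) -> (b, e, c, d)
expert_inputs = np.einsum('bnd,bnec->becd', x, dispatch)

# Hypothetical per-expert linear maps; they never depend on the size of axis 2,
# so it does not matter whether expert code labels that axis `n` or `c`.
weights = rng.normal(size=(e, d, d))
expert_outputs = np.einsum('becd,edf->becf', expert_inputs, weights)

# Combine: (b, e, c, d) x (b, n, e, c) -> (b, n, d), back to the token layout.
combined = np.einsum('becd,bnec->bnd', expert_outputs, dispatch)
print(expert_inputs.shape, combined.shape)  # (2, 3, 4, 8) (2, 10, 8)
```

Under these assumptions, the expert side only ever sees an axis of length c, while the combine step restores the (b, n, d) layout of `x`; tokens dropped for lack of capacity come back as zero vectors.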
Thank you very much for your help!
Referenced code (the `MoE.forward` snippet above): st-moe-pytorch/st_moe_pytorch/st_moe_pytorch.py, lines 609 to 613 in 6b7f7fb