About gating_top_n #3
Comments
@Heihaierr Not here, as it was just to be faithful to the paper, which explored top-2 and then a generalization of top-n (iirc) up to 3 and 4. I thought that top-1 didn't work that well?
Yes, but the paper also explored top-1 routing and showed an improvement.
Got it, thanks for the quick reply.
If
Hi, I noticed there is an experiment with `top_n=1` in the paper of `st-moe`. But in `st_moe_pytorch.py` there is `assert top_n >= 2, 'must be 2 or more experts'`. Can `top_n=1` work in this implementation?
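For anyone reading along, here is a minimal sketch of what a top-n router does for a single token, to make the `top_n` question concrete. It is not the code from `st_moe_pytorch.py`: the function `top_n_gating` and the `allow_top_1` flag are hypothetical names made up for illustration, and it uses plain Python lists instead of tensors. The default mirrors the assertion quoted above; the flag only shows that the selection step itself works with a single expert.

```python
import math


def top_n_gating(logits, top_n=2, allow_top_1=False):
    """Route one token: return [(expert_index, weight), ...] for the top_n experts.

    Hypothetical helper for illustration only, not the repository's API.
    """
    # Same kind of guard as the assertion quoted in the issue,
    # relaxed only when allow_top_1 is set.
    min_experts = 1 if allow_top_1 else 2
    assert top_n >= min_experts, f'must be {min_experts} or more experts'
    assert top_n <= len(logits), 'top_n cannot exceed the number of experts'

    # Numerically stable softmax over the gate logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the top_n experts, highest gate probability first.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_n]

    # Renormalize the selected probabilities so the weights sum to 1.
    denom = sum(probs[i] for i in chosen)
    return [(i, probs[i] / denom) for i in chosen]


gate_logits = [0.1, 2.0, -1.0, 0.5]
print(top_n_gating(gate_logits, top_n=2))
print(top_n_gating(gate_logits, top_n=1, allow_top_1=True))
```

One thing to watch in this sketch: with `top_n=1` the renormalization always gives a weight of exactly 1.0, so the gate probability no longer scales the expert output. A real top-1 setup would probably want to keep the unnormalized probability instead, which may be part of why lifting the assertion is not just a one-line change.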