Releases: lucidrains/st-moe-pytorch
0.0.6
remove dropout, as in the paper, they show it is unhelpful (and also …
0.0.5
when doing eval, turn off balance and router z loss calculations
0.0.3
init expert weights and biases
0.0.2
first pass for router z loss
0.0.1
start cleaning up, add the ff geglu based experts with multiplicative…
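
The 0.0.2 and 0.0.5 entries refer to a router z loss and to skipping the auxiliary losses during eval. As a rough illustration only, the sketch below uses the z-loss definition from the ST-MoE paper (the mean over tokens of the squared log-sum-exp of the router logits) in plain Python rather than PyTorch. The function names, the `training` flag, and the `z_loss_coef` default are assumptions made for this example, not the repository's actual API.

```python
import math

def router_z_loss(router_logits):
    """Mean over tokens of (log-sum-exp of that token's expert logits) squared.

    router_logits: list of per-token lists of expert logits (hypothetical layout).
    """
    total = 0.0
    for logits in router_logits:
        m = max(logits)  # subtract the max before exponentiating, for stability
        lse = m + math.log(sum(math.exp(x - m) for x in logits))
        total += lse ** 2
    return total / len(router_logits)

def auxiliary_loss(router_logits, training, z_loss_coef=1e-3):
    """Weighted z loss; returns 0.0 outside training (assumed eval behaviour)."""
    if not training:
        return 0.0
    return z_loss_coef * router_z_loss(router_logits)
```

With two experts and all-zero logits for a single token, the log-sum-exp is log 2, so `router_z_loss([[0.0, 0.0]])` equals `math.log(2) ** 2`. Passing `training=False` to `auxiliary_loss` skips the computation entirely.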