Releases · lucidrains/st-moe-pytorch
0.0.28
oops
0.0.27
chip away at edge cases
0.0.25
another micro optimization for communication
0.0.24
in split by rank function, cache the sizes so on backwards there is n…
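A minimal sketch of what caching split sizes for the backward pass might look like. This is not the repo's actual code: the `SplitByRank` name and the broadcast-based backward are illustrative assumptions; the idea shown is simply that sizes computed in `forward` are stashed on `ctx` so the backward pass need not recompute or re-communicate them.

```python
import torch
import torch.distributed as dist
from torch.autograd import Function

class SplitByRank(Function):  # hypothetical name, for illustration only
    @staticmethod
    def forward(ctx, x, sizes):
        # sizes: list of chunk lengths along dim 0, one per rank
        ctx.sizes = sizes                     # cache sizes for backward
        chunks = x.split(sizes, dim = 0)
        return chunks[dist.get_rank()].contiguous()

    @staticmethod
    def backward(ctx, grad_output):
        # rebuild the full gradient from the cached sizes, without
        # exchanging any size metadata a second time
        sizes, rank = ctx.sizes, dist.get_rank()
        grads = [grad_output.new_zeros(s, *grad_output.shape[1:]) for s in sizes]
        grads[rank] = grad_output.contiguous()
        for src in range(dist.get_world_size()):
            dist.broadcast(grads[src], src = src)  # each rank shares its chunk's grad
        return torch.cat(grads, dim = 0), None
```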
0.0.23
start journeying into distributed mixture of experts implementation
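For context, the core communication pattern in a distributed mixture of experts is an all-to-all: each rank sends the tokens routed to remote experts, runs its local expert on what it receives, then sends results back. The sketch below shows only that round trip; the function name and shapes are assumptions, not the repo's API, and it presumes an initialized process group with tokens already grouped by destination rank.

```python
import torch
import torch.distributed as dist

def expert_round_trip(x, expert):
    # x: (world_size * capacity, dim), contiguous chunks grouped by target rank
    recv = torch.empty_like(x)
    dist.all_to_all_single(recv, x)        # scatter tokens to the expert ranks
    out = expert(recv)                     # run this rank's local expert
    back = torch.empty_like(out)
    dist.all_to_all_single(back, out)      # return outputs to the origin ranks
    return back
```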
0.0.22
add ability to use differentiable topk
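One generic way to make top-k selection differentiable is an iterated softmax that suppresses each round's winners before the next pass, so gradients flow through every selection step. The sketch below shows that general technique only; the release may use a different method, and `soft_topk` is a made-up name.

```python
import torch
import torch.nn.functional as F

def soft_topk(logits, k, temperature = 1.):
    # accumulate k softmax passes; after each pass, add log(1 - p) to the
    # logits so already-selected entries are suppressed in later rounds
    scores = torch.zeros_like(logits)
    masked = logits.clone()
    for _ in range(k):
        attn = F.softmax(masked / temperature, dim = -1)
        scores = scores + attn
        masked = masked + torch.log1p(-attn.clamp(max = 1 - 1e-6))
    return scores  # soft membership weights, summing to roughly k per row
```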
0.0.21
allow for different thresholds between second and third expert
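A rough sketch of the idea, under the assumption that routing beyond the first expert is gated by a per-position threshold on the gate value: the top expert is always used, while the 2nd and 3rd are kept only if their gates clear independently tunable thresholds. Function and argument names are hypothetical.

```python
import torch

def keep_expert_mask(gates, second_threshold = 0.2, third_threshold = 0.1):
    # gates: (tokens, 3) gate values per token, sorted descending
    keep_first  = torch.ones_like(gates[:, 0], dtype = torch.bool)
    keep_second = gates[:, 1] > second_threshold  # own threshold for 2nd expert
    keep_third  = gates[:, 2] > third_threshold   # separate threshold for 3rd
    return torch.stack((keep_first, keep_second, keep_third), dim = -1)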
0.0.20
multiply gates by mask_flat twice, as in mesh tensorflow code for top…
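A guess at what the double multiply looks like, based on the note's reference to the mesh tensorflow gating code: the mask is applied once before renormalizing the gates (so dropped tokens don't inflate the denominator) and presumably once more after (so the eps in the denominator can't leak a nonzero weight to a fully dropped token). Names mirror the mesh tensorflow convention but the function itself is illustrative.

```python
def combine_gates(gate_1, gate_2, mask_1_flat, mask_2_flat, eps = 1e-9):
    # first multiply: dropped slots contribute nothing to the denominator
    gate_1 = gate_1 * mask_1_flat
    gate_2 = gate_2 * mask_2_flat
    denom = gate_1 + gate_2 + eps
    gate_1 = gate_1 / denom
    gate_2 = gate_2 / denom
    # second multiply: re-zero dropped slots after the eps normalization
    gate_1 = gate_1 * mask_1_flat
    gate_2 = gate_2 * mask_2_flat
    return gate_1, gate_2
```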
0.0.19
better naming
0.0.18
generalize to top-n gating, parallelize as much as possible
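The core of top-n gating, sketched in batched tensor ops rather than a python loop over experts: softmax once over the router logits, then take the n largest gates per token. This is a minimal illustration of the technique named in the note, not the repo's actual routing code.

```python
import torch
import torch.nn.functional as F

def top_n_gating(router_logits, n = 2):
    # router_logits: (tokens, num_experts); generalizes top-2 to arbitrary n
    probs = F.softmax(router_logits, dim = -1)
    gates, expert_indices = probs.topk(n, dim = -1)  # each (tokens, n)
    return gates, expert_indices
```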