Releases · lucidrains/st-moe-pytorch
0.0.28
oops
0.0.27
chip away at edge cases
0.0.25
another micro optimization for communication
0.0.24
in split by rank function, cache the sizes so on backwards there is n…
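A minimal sketch of what caching split sizes for the backward pass might look like. This is not the repo's actual code: the `SplitByRank` name and the broadcast-based backward are illustrative assumptions; the idea shown is simply that sizes computed in `forward` are stashed on `ctx` so the backward pass need not recompute or re-communicate them.

```python
import torch
import torch.distributed as dist
from torch.autograd import Function

class SplitByRank(Function):  # hypothetical name, for illustration only
    @staticmethod
    def forward(ctx, x, sizes):
        # sizes: list of chunk lengths along dim 0, one per rank
        ctx.sizes = sizes                     # cache sizes for backward
        chunks = x.split(sizes, dim = 0)
        return chunks[dist.get_rank()].contiguous()

    @staticmethod
    def backward(ctx, grad_output):
        # rebuild the full gradient from the cached sizes, without
        # exchanging any size metadata a second time
        sizes, rank = ctx.sizes, dist.get_rank()
        grads = [grad_output.new_zeros(s, *grad_output.shape[1:]) for s in sizes]
        grads[rank] = grad_output.contiguous()
        for src in range(dist.get_world_size()):
            dist.broadcast(grads[src], src = src)  # each rank shares its chunk's grad
        return torch.cat(grads, dim = 0), None
```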
0.0.23
start journeying into distributed mixture of experts implementation
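For context, the core communication pattern in a distributed mixture of experts is an all-to-all: each rank sends the tokens routed to remote experts, runs its local expert on what it receives, then sends results back. The sketch below shows only that round trip; the function name and shapes are assumptions, not the repo's API, and it presumes an initialized process group with tokens already grouped by destination rank.

```python
import torch
import torch.distributed as dist

def expert_round_trip(x, expert):
    # x: (world_size * capacity, dim), contiguous chunks grouped by target rank
    recv = torch.empty_like(x)
    dist.all_to_all_single(recv, x)        # scatter tokens to the expert ranks
    out = expert(recv)                     # run this rank's local expert
    back = torch.empty_like(out)
    dist.all_to_all_single(back, out)      # return outputs to the origin ranks
    return back
```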
0.0.22
add ability to use differentiable topk
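One generic way to make top-k selection differentiable is an iterated softmax that suppresses each round's winners before the next pass, so gradients flow through every selection step. The sketch below shows that general technique only; the release may use a different method, and `soft_topk` is a made-up name.

```python
import torch
import torch.nn.functional as F

def soft_topk(logits, k, temperature = 1.):
    # accumulate k softmax passes; after each pass, add log(1 - p) to the
    # logits so already-selected entries are suppressed in later rounds
    scores = torch.zeros_like(logits)
    masked = logits.clone()
    for _ in range(k):
        attn = F.softmax(masked / temperature, dim = -1)
        scores = scores + attn
        masked = masked + torch.log1p(-attn.clamp(max = 1 - 1e-6))
    return scores  # soft membership weights, summing to roughly k per row
```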
0.0.21
allow for different thresholds between second and third expert
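A rough sketch of the idea, under the assumption that routing beyond the first expert is gated by a per-position threshold on the gate value: the top expert is always used, while the 2nd and 3rd are kept only if their gates clear independently tunable thresholds. Function and argument names are hypothetical.

```python
import torch

def keep_expert_mask(gates, second_threshold = 0.2, third_threshold = 0.1):
    # gates: (tokens, 3) gate values per token, sorted descending
    keep_first  = torch.ones_like(gates[:, 0], dtype = torch.bool)
    keep_second = gates[:, 1] > second_threshold  # own threshold for 2nd expert
    keep_third  = gates[:, 2] > third_threshold   # separate threshold for 3rd
    return torch.stack((keep_first, keep_second, keep_third), dim = -1)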
0.0.20
multiply gates by mask_flat twice, as in mesh tensorflow code for top…
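A guess at what the double multiply looks like, based on the note's reference to the mesh tensorflow gating code: the mask is applied once before renormalizing the gates (so dropped tokens don't inflate the denominator) and presumably once more after (so the eps in the denominator can't leak a nonzero weight to a fully dropped token). Names mirror the mesh tensorflow convention but the function itself is illustrative.

```python
def combine_gates(gate_1, gate_2, mask_1_flat, mask_2_flat, eps = 1e-9):
    # first multiply: dropped slots contribute nothing to the denominator
    gate_1 = gate_1 * mask_1_flat
    gate_2 = gate_2 * mask_2_flat
    denom = gate_1 + gate_2 + eps
    gate_1 = gate_1 / denom
    gate_2 = gate_2 / denom
    # second multiply: re-zero dropped slots after the eps normalization
    gate_1 = gate_1 * mask_1_flat
    gate_2 = gate_2 * mask_2_flat
    return gate_1, gate_2
```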
0.0.19
better naming
0.0.18
generalize to top-n gating, parallelize as much as possible
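The core of top-n gating, sketched in batched tensor ops rather than a python loop over experts: softmax once over the router logits, then take the n largest gates per token. This is a minimal illustration of the technique named in the note, not the repo's actual routing code.

```python
import torch
import torch.nn.functional as F

def top_n_gating(router_logits, n = 2):
    # router_logits: (tokens, num_experts); generalizes top-2 to arbitrary n
    probs = F.softmax(router_logits, dim = -1)
    gates, expert_indices = probs.topk(n, dim = -1)  # each (tokens, n)
    return gates, expert_indices
```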