Full fp16 training; SRUpp release
Pre-release
Note that the final 3.0.0 release and future 3.0.0-dev releases might not be backwards compatible with this dev release.
Key features / changes:
- #160: SRU++ is now available. Unit tests covering TorchScript compatibility and correctness are included, and example language model training code is available.
- #166: fp16 training improvement. The recurrence kernel now runs in float16 when AMP is enabled. On tested language model training, this gives an additional ~10% speedup, a ~20% reduction in GPU memory usage, and no regression in final results.
- #167: Code clean-up. No `autocast` block is needed in `sru.ops.elementwise_recurrence_gpu`, which allows both native AMP and APEX AMP to work. (Credit: @visionscaper)
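To illustrate why running the recurrence in float16 halves its memory footprint while leaving results essentially unchanged, here is a minimal NumPy sketch of an SRU-style elementwise recurrence evaluated in a chosen dtype. This is an assumption-laden simplification for illustration only, not the library's actual CUDA kernel: the function name, argument layout, and the reduced update equations are hypothetical, and real AMP dispatch is handled by PyTorch, not by an explicit dtype argument.

```python
import numpy as np

def elementwise_recurrence(u, f, r, x, c0, dtype=np.float16):
    """Simplified SRU-style elementwise recurrence in the given dtype.

    Per time step t (all operations elementwise):
        c_t = f_t * c_{t-1} + (1 - f_t) * u_t    # cell state update
        h_t = r_t * c_t + (1 - r_t) * x_t        # highway connection
    """
    # Cast all inputs once, mimicking a kernel that computes in reduced precision.
    u, f, r, x = (a.astype(dtype) for a in (u, f, r, x))
    c = c0.astype(dtype)
    one = dtype(1.0)
    h_out = np.empty_like(u)
    for t in range(u.shape[0]):
        c = f[t] * c + (one - f[t]) * u[t]
        h_out[t] = r[t] * c + (one - r[t]) * x[t]
    return h_out, c

# Run the same recurrence in float16 and float32 and compare.
rng = np.random.default_rng(0)
T, B, D = 8, 2, 4  # time steps, batch, hidden dim (arbitrary small sizes)
u = rng.random((T, B, D), dtype=np.float32)
f = rng.random((T, B, D), dtype=np.float32)  # gate values in [0, 1)
r = rng.random((T, B, D), dtype=np.float32)
x = rng.random((T, B, D), dtype=np.float32)
c0 = np.zeros((B, D), dtype=np.float32)

h16, _ = elementwise_recurrence(u, f, r, x, c0, dtype=np.float16)
h32, _ = elementwise_recurrence(u, f, r, x, c0, dtype=np.float32)
```

Here `h16` occupies half the memory of `h32` and agrees with it to within float16 rounding, which is the intuition behind the memory savings and unchanged final results reported in #166.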