Full fp16 training; SRUpp release
Pre-release
Note that the final 3.0.0 release and future 3.0.0-dev releases might not be backwards compatible with this dev release.
Key features / changes:
- #160: SRU++ is now available. Unit tests covering TorchScript compatibility and correctness are included, and example language model training code is available.
- #166: fp16 training improvement. The recurrence kernel now runs in float16 when AMP is enabled. On tested language model training, this gives an additional ~10% speedup, a ~20% reduction in GPU memory usage, and no regression in final results.
- #167: Code clean-up. No `autocast` block is needed in `sru.ops.elementwise_recurrence_gpu`, which allows both native AMP and APEX AMP to work. (Credit: @visionscaper)
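To illustrate why running the recurrence in float16 halves its memory footprint while leaving results essentially unchanged, here is a minimal NumPy sketch of an SRU-style elementwise recurrence evaluated in a chosen dtype. This is an assumption-laden simplification for illustration only, not the library's actual CUDA kernel: the function name, argument layout, and the reduced update equations are hypothetical, and real AMP dispatch is handled by PyTorch, not by an explicit dtype argument.

```python
import numpy as np

def elementwise_recurrence(u, f, r, x, c0, dtype=np.float16):
    """Simplified SRU-style elementwise recurrence in the given dtype.

    Per time step t (all operations elementwise):
        c_t = f_t * c_{t-1} + (1 - f_t) * u_t    # cell state update
        h_t = r_t * c_t + (1 - r_t) * x_t        # highway connection
    """
    # Cast all inputs once, mimicking a kernel that computes in reduced precision.
    u, f, r, x = (a.astype(dtype) for a in (u, f, r, x))
    c = c0.astype(dtype)
    one = dtype(1.0)
    h_out = np.empty_like(u)
    for t in range(u.shape[0]):
        c = f[t] * c + (one - f[t]) * u[t]
        h_out[t] = r[t] * c + (one - r[t]) * x[t]
    return h_out, c

# Run the same recurrence in float16 and float32 and compare.
rng = np.random.default_rng(0)
T, B, D = 8, 2, 4  # time steps, batch, hidden dim (arbitrary small sizes)
u = rng.random((T, B, D), dtype=np.float32)
f = rng.random((T, B, D), dtype=np.float32)  # gate values in [0, 1)
r = rng.random((T, B, D), dtype=np.float32)
x = rng.random((T, B, D), dtype=np.float32)
c0 = np.zeros((B, D), dtype=np.float32)

h16, _ = elementwise_recurrence(u, f, r, x, c0, dtype=np.float16)
h32, _ = elementwise_recurrence(u, f, r, x, c0, dtype=np.float32)
```

Here `h16` occupies half the memory of `h32` and agrees with it to within float16 rounding, which is the intuition behind the memory savings and unchanged final results reported in #166.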