
Full fp16 training; SRUpp release

Pre-release
@taoleicn released this 05 Mar 17:33 · 49 commits to 3.0.0-dev since this release · 81a657b

Note that the future 3.0.0 release and future 3.0.0 dev releases might not be backwards compatible with this dev release.

Key features / changes:

  • #160: SRU++ is now available. Unit tests covering TorchScript compatibility and correctness are included, and example language model training code is provided. A minimal usage sketch follows this list.
  • #166: fp16 training improvement. The recurrence kernel now runs in float16 when AMP is enabled. On the language model training we tested, this gives an additional ~10% speedup and a ~20% reduction in GPU memory usage, with no regression in final results. A training-step sketch also follows this list.
  • #167: Code clean-up. The autocast block in sru.ops.elementwise_recurrence_gpu is no longer needed, which allows both native AMP and APEX AMP to work. (Credit: @visionscaper)
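
Since #160 ships the new SRU++ module, here is a minimal usage sketch. The constructor arguments (`input_size`, `hidden_size`, `proj_size`) and the SRU-like `(output, final state)` return convention are assumptions modeled on the plain SRU interface, not confirmed by these notes:

```python
import torch
from sru import SRUpp

# Input layout follows SRU: (sequence length, batch size, input size).
x = torch.randn(20, 32, 128)

# proj_size is assumed to be the attention/projection dimension; 32 is illustrative.
model = SRUpp(input_size=128, hidden_size=128, proj_size=32, num_layers=2)

ret = model(x)                  # assumed to return (output, final state, ...) like SRU
output, state = ret[0], ret[1]
print(output.shape)             # expected: torch.Size([20, 32, 128])
```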
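
And a training-step sketch for the fp16 path from #166/#167 under native AMP. The `torch.cuda.amp` calls are standard PyTorch; the SRUpp arguments and return layout are the same assumptions as above, and the data and loss are placeholders:

```python
import torch
import torch.nn.functional as F
from sru import SRUpp

model = SRUpp(input_size=128, hidden_size=128, proj_size=32, num_layers=2).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(20, 32, 128, device="cuda")
target = torch.randn(20, 32, 128, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():
    # Per #166, the recurrence kernel itself runs in float16 inside this block;
    # per #167, the library needs no autocast handling of its own.
    output = model(x)[0]
    loss = F.mse_loss(output, target)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```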

Other changes:

  • Fix a dtype error in adaptive embedding (#168)
  • Significant speed-up on Billion Word training (#169)
  • LICENSE update requested by IPC (#165)