Releases · asappresearch/sru
2.7.0-rc1 - Postpone CUDA initialization
Postponed CUDA initialization to the instantiation of SRUCells, so that it happens in the process where the model will actually run.
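For illustration, a minimal sketch of the pattern this enables, assuming a standard torch.multiprocessing spawn setup; the worker function, sizes, and tensors below are illustrative and not part of the release:

```python
import torch
import torch.multiprocessing as mp
from sru import SRU

def worker(rank: int):
    # CUDA is initialized only when the SRUCell is instantiated, so building the
    # model inside the spawned process keeps the CUDA context in the process
    # that actually runs it.
    device = torch.device(f"cuda:{rank}" if torch.cuda.is_available() else "cpu")
    model = SRU(input_size=128, hidden_size=128, num_layers=2).to(device)
    x = torch.randn(32, 4, 128, device=device)  # (seq_len, batch, input_size)
    output, state = model(x)
    print(rank, output.shape)

if __name__ == "__main__":
    # importing sru alone no longer touches CUDA; each worker sets up its own context
    mp.spawn(worker, nprocs=1)
```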
3.0.0.dev6
More layer norm options; more info in __repr__()
GPU inference in Torchscript model; post layer norm
- Support GPU/CUDA inference in Torchscript model
- Support post layer norm
- Support custom init value for weight_c
- Add unit tests for GPU inference
- Add unit tests for backward()
- Add more unit tests for Torchscript
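A hedged sketch of these features together: scripting an SRU model and running it on GPU. The normalize_after and weight_c_init keyword names are taken from the notes above and may differ in later releases; the sizes are illustrative.

```python
import torch
from sru import SRU

model = SRU(
    input_size=256,
    hidden_size=256,
    num_layers=4,
    layer_norm=True,
    normalize_after=True,  # post layer norm (name assumed from the notes above)
    weight_c_init=0.5,     # custom init value for weight_c
)

scripted = torch.jit.script(model)          # Torchscript-compatible module
scripted = scripted.cuda()                  # GPU/CUDA inference in the scripted model
x = torch.randn(50, 8, 256, device="cuda")  # (seq_len, batch, input_size)
with torch.no_grad():
    output, state = scripted(x)
```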
Support GPU/CUDA inference in torchscript model; fix an issue
Dev1:
- Support GPU/CUDA inference in torchscript model
- Support post layer norm
- Support custom init value for weight_c
Dev2:
- Fix an issue
Support GPU/CUDA inference in torchscript model
- Support GPU/CUDA inference in torchscript model
- Support post layer norm
- Support custom init value for weight_c
v3.0.0.dev3
Fix a typo. Add an option (attention_last_n_layers) to only use attention in the last n layers. Replace option normalize_after with normalization_type.
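A hedged sketch of the new option on the SRU++ module; attention_last_n_layers is named in the note above, while the other SRUpp arguments, their defaults, and the forward return arity are assumptions about the API.

```python
import torch
from sru import SRUpp

model = SRUpp(
    input_size=512,
    hidden_size=512,
    proj_size=128,               # argument name assumed, not stated in these notes
    num_layers=6,
    attention_last_n_layers=2,   # apply attention only in the last 2 layers
)
x = torch.randn(100, 16, 512)    # (seq_len, batch, input_size)
output, *rest = model(x)         # forward may return additional state tensors
```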
v3.0.0.dev2 Bug fixes
Changes:
- change weight_c_init from Optional[float] = None to float = 1.0
Bug fixes:
- fix a potential memory leak in custom op
- fix bug in cuda maskpad
- now torchscript-compatible with torch 1.5.1
Full fp16 training; SRUpp release
Note that the future 3.0.0 release and future 3.0.0 dev releases might not be backwards compatible with this dev release.
Key features / changes:
- #160: SRU++ is now available. Unit tests are included for torchscript compatibility and correctness. Example language model training code is available.
- #166: fp16 training improvement. The recurrence kernel will run in float16 now when amp is enabled. This gives an additional ~10% speedup on tested language model training, ~20% reduction on GPU memory usage and no regression on final results.
- #167: Code clean-up. No autocast block needed in sru.ops.elementwise_recurrence_gpu. This would allow both Native AMP and APEX AMP to work. (Credit: @visionscaper)
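As an illustration of the fp16 path, a minimal training-step sketch with native AMP; it uses the plain SRU module and toy language-model shapes for brevity, and everything except the autocast/GradScaler usage is illustrative.

```python
import torch
import torch.nn.functional as F
from sru import SRU

device = torch.device("cuda")
model = SRU(input_size=512, hidden_size=512, num_layers=4).to(device)
head = torch.nn.Linear(512, 10000).to(device)   # toy LM output head (illustrative)
params = list(model.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(100, 16, 512, device=device)    # (seq_len, batch, input_size)
targets = torch.randint(0, 10000, (100, 16), device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast():
    # with AMP enabled, the elementwise recurrence kernel also runs in float16 (#166)
    output, state = model(x)
    loss = F.cross_entropy(head(output).view(-1, 10000), targets.view(-1))
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```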
Other changes:
Dev release of v3
Note that future releases and dev releases of v3 might be backwards incompatible with this dev release.
This dev release:
- custom_m renamed to transform_module
- transform_module always used now (the weight and weight_proj parameters have been removed)
- projection_size can take in a sequence of projection sizes, one per layer
- n_proj in SRUCell renamed to projection_size, for consistency
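A hedged sketch of the renamed arguments; whether projection_size accepts a per-layer sequence exactly as written, and whether 0 means "no projection", are assumptions beyond what the notes state, and the sizes are illustrative.

```python
import torch
from sru import SRU, SRUCell

# projection_size can now be a sequence with one value per layer
# (0 is assumed to mean "no projection" for that layer)
rnn = SRU(input_size=256, hidden_size=256, num_layers=3,
          projection_size=[0, 64, 64])

# n_proj on SRUCell is now called projection_size, matching the SRU module
cell = SRUCell(input_size=256, hidden_size=256, projection_size=64)

x = torch.randn(20, 8, 256)   # (seq_len, batch, input_size)
output, state = rnn(x)        # output: (seq_len, batch, hidden_size)
h, c = cell(x)                # SRUCell also processes the full sequence
```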