CTranslate2 1.16.1

guillaumekln released this 23 Nov 11:52

129047e

Fixes and improvements

Fuse dequantization and bias addition on GPU for improved INT8 performance
Improve performance of masked softmax on GPU
Fix error when building the CentOS 7 GPU Docker image
The previous version listed "Pad size of INT8 matrices to a multiple of 16 when the GPU has INT8 Tensor Cores". However, a bug prevented the padding from being applied, and fixing that bug degraded performance, so this behavior is not implemented for now.
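
To illustrate what fusing dequantization and bias addition means, here is a minimal NumPy sketch. It is not the CTranslate2 kernel. The quantization scheme (symmetric, per-row scales of 127 / max|x|) and all function names are assumptions made for the example. The unfused path makes two passes over the INT32 GEMM output (dequantize, then add bias), while the fused path computes each output element in a single step, which is roughly what a single GPU kernel would do.

```python
import numpy as np

def quantize_rows(x):
    # Hypothetical symmetric per-row INT8 quantization: scale = 127 / max|x|.
    amax = np.abs(x).max(axis=1, keepdims=True)
    scale = (127.0 / np.maximum(amax, 1e-8)).astype(np.float32)
    q = np.clip(np.rint(x * scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_then_add_bias(acc, a_scale, w_scale, bias):
    # Unfused: two separate passes over the output matrix.
    y = acc.astype(np.float32) / (a_scale * w_scale.T)  # pass 1: dequantize
    return y + bias                                     # pass 2: add bias

def fused_dequantize_add_bias(acc, a_scale, w_scale, bias):
    # Fused: each INT32 accumulator is read once and the final value
    # is written once, as one kernel launch would do on a GPU.
    m, n = acc.shape
    out = np.empty((m, n), dtype=np.float32)
    for i in range(m):
        for j in range(n):
            out[i, j] = acc[i, j] / (a_scale[i, 0] * w_scale[j, 0]) + bias[j]
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal((4, 64)).astype(np.float32)   # input activations
w = rng.standard_normal((8, 64)).astype(np.float32)   # weights, one row per output
bias = rng.standard_normal(8).astype(np.float32)

a_q, a_scale = quantize_rows(a)
w_q, w_scale = quantize_rows(w)

# INT8 GEMM with INT32 accumulation.
acc = a_q.astype(np.int32) @ w_q.astype(np.int32).T

unfused = dequantize_then_add_bias(acc, a_scale, w_scale, bias)
fused = fused_dequantize_add_bias(acc, a_scale, w_scale, bias)
reference = a @ w.T + bias  # full-precision result for comparison
```

Both paths produce the same values up to floating-point rounding. The benefit of fusion comes from memory traffic and kernel launches rather than arithmetic, so the Python loop here only shows the data flow and is not a performance demonstration.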
