CTranslate2 1.16.1

@guillaumekln released this 23 Nov 11:52

Fixes and improvements

  • Fuse dequantization and bias addition on GPU for improved INT8 performance
  • Improve performance of masked softmax on GPU
  • Fix error when building the CentOS 7 GPU Docker image
  • The previous release notes listed "Pad size of INT8 matrices to a multiple of 16 when the GPU has INT8 Tensor Cores". That padding was never actually applied because of a bug, and fixing the bug degraded performance, so this behavior is not implemented for now.
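
To illustrate what fusing dequantization and bias addition means, here is a minimal pure-Python sketch. It is not CTranslate2's GPU kernel: the function names are hypothetical, and the scale convention (quantized value = real value × scale, so dequantization divides by the product of the input and weight scales) is an assumption made for the example. The unfused version walks the INT32 accumulator matrix twice, while the fused version produces each output element in a single pass. The `padded_size` helper only shows the rounding arithmetic behind "a multiple of 16"; as noted above, this padding is not applied in this release.

```python
def dequantize_then_add_bias(acc, input_scales, weight_scales, bias):
    """Unfused reference: one pass to dequantize, a second pass to add the bias."""
    rows, cols = len(acc), len(acc[0])
    # Pass 1: convert INT32 accumulators back to floats (assumed scale convention).
    out = [
        [acc[i][j] / (input_scales[i] * weight_scales[j]) for j in range(cols)]
        for i in range(rows)
    ]
    # Pass 2: add the per-column bias.
    for i in range(rows):
        for j in range(cols):
            out[i][j] += bias[j]
    return out


def fused_dequantize_add_bias(acc, input_scales, weight_scales, bias):
    """Fused version: dequantize and add the bias in a single pass over the matrix."""
    rows, cols = len(acc), len(acc[0])
    return [
        [acc[i][j] / (input_scales[i] * weight_scales[j]) + bias[j] for j in range(cols)]
        for i in range(rows)
    ]


def padded_size(n, multiple=16):
    """Round n up to the next multiple of `multiple` (illustration only)."""
    return ((n + multiple - 1) // multiple) * multiple


if __name__ == "__main__":
    acc = [[127, -64], [10, 20]]   # INT32 results of an INT8 matrix product
    input_scales = [2.0, 4.0]      # one scale per input row (hypothetical values)
    weight_scales = [8.0, 16.0]    # one scale per weight column (hypothetical values)
    bias = [0.5, -0.25]
    print(fused_dequantize_add_bias(acc, input_scales, weight_scales, bias))
    print(padded_size(30))
```

On a GPU, the benefit of the fused form comes from launching one kernel and reading and writing the output once instead of twice; the arithmetic itself is unchanged.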