
Releases: OpenNMT/CTranslate2

CTranslate2 1.19.0

31 Mar 13:33
486d49b

Changes

  • Rename CMake option WITH_TESTS to BUILD_TESTS

New features

  • Add "auto" compute type to automatically select the fastest compute type on the current system
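The new compute type can be passed when creating a translator. A minimal sketch (the model path is hypothetical; this requires a converted CTranslate2 model on disk):

```python
import ctranslate2

# compute_type="auto" selects the fastest compute type supported by the
# current system (e.g. int8, int16, or float) instead of a fixed choice.
# "ende_ctranslate2/" is an illustrative path to a converted model.
translator = ctranslate2.Translator("ende_ctranslate2/", compute_type="auto")
```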

Fixes and improvements

  • [Python] Clear memory allocator cache when calling unload_model
  • [Python] Make methods unload_model and load_model thread safe
  • Fix conversion of TensorFlow SavedModel with shared embeddings
  • Update Intel oneAPI to 2021.2
  • Compile core library with C++14 standard
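The thread-safe `unload_model`/`load_model` methods above can be sketched as follows (model path hypothetical; requires a converted model and, for this example, a CUDA device):

```python
import ctranslate2

# "ende_ctranslate2/" is an illustrative path to a converted model.
translator = ctranslate2.Translator("ende_ctranslate2/", device="cuda")

# unload_model frees the model from the device (and, as of 1.19.0, also
# clears the memory allocator cache); to_cpu=True keeps a CPU copy so the
# next load_model call is faster. Both calls are now thread safe.
translator.unload_model(to_cpu=True)
translator.load_model()
```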

CTranslate2 1.18.3

02 Mar 08:50

Fixes and improvements

  • Use Intel OpenMP instead of GNU OpenMP in the Docker images as a workaround for issue #409

CTranslate2 1.18.2

23 Feb 17:38

Fixes and improvements

  • Fix crash when enabling coverage penalty in GPU translation
  • Fix incorrect value of AVX2 flag in CT2_VERBOSE output

CTranslate2 1.18.1

01 Feb 15:09

Fixes and improvements

  • Fix conversion of models setting the attributes with_source_bos or with_source_eos

CTranslate2 1.18.0

28 Jan 11:38
3e9f226

Changes

  • Default values for some options of the translate client were changed to match the Python API:
    • batch_size = 32 (instead of 30)
    • beam_size = 2 (instead of 5)
    • intra_threads = 4 (instead of 0)
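Users who relied on the previous defaults can restore them explicitly. A hedged sketch of a client invocation (the flag spellings are assumed from the option names above and the `--model` flag mentioned in 1.17.0; the model path is illustrative):

```shell
# Restore the pre-1.18.0 defaults of the translate client.
translate --model ende_ctranslate2/ \
    --batch_size 30 --beam_size 5 --intra_threads 0 < input.txt
```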

New features

  • Support multi-GPU translation: the device_index argument can now be set to a list of GPU IDs
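A minimal sketch of the new multi-GPU configuration (model path and GPU IDs are illustrative; requires a converted model and at least two CUDA devices):

```python
import ctranslate2

# device_index now accepts a list of GPU IDs; translations submitted to
# this translator are dispatched across the listed devices.
translator = ctranslate2.Translator(
    "ende_ctranslate2/", device="cuda", device_index=[0, 1])
```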

Fixes and improvements

  • Improve performance when using multiple GPU translators concurrently in the same process
  • [Python] Do nothing when calling unload_model(to_cpu=True) on CPU translators
  • [Python] Set a default value for max_batch_size argument in method Translator.translate_file
  • Disable CT2_TRANSLATORS_CORE_OFFSET in OpenMP builds as setting thread affinity does not work when OpenMP is enabled

CTranslate2 1.17.1

15 Jan 13:50

Fixes and improvements

  • Fix Python wheel loading error on macOS

CTranslate2 1.17.0

11 Jan 17:02
d96a509

Changes

  • Linux Python wheels are now compiled under manylinux2014 and require pip version >= 19.3
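In practice this means pip may need to be upgraded before installing, since older pip versions do not recognize the manylinux2014 platform tag:

```shell
# manylinux2014 wheels require pip >= 19.3; older versions will not find
# a matching wheel for the ctranslate2 package.
pip install --upgrade "pip>=19.3"
pip install ctranslate2
```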

New features

  • Publish Python wheels for macOS (CPU only)
  • Support compilation for ARM 64-bit architecture and add NEON vectorization
  • Add new optional GEMM backends: Apple Accelerate and OpenBLAS
  • Add replace_unknowns translation option to replace unknown target tokens with the source token that received the highest attention
  • Add flags in the model specification to declare that BOS and/or EOS tokens should be added to the source sequences
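The new `replace_unknowns` option is passed at translation time. A sketch (model path and tokens are illustrative; requires a converted model):

```python
import ctranslate2

translator = ctranslate2.Translator("ende_ctranslate2/")  # hypothetical path

# replace_unknowns substitutes each unknown token in the output with the
# source token that received the highest attention weight.
results = translator.translate_batch(
    [["▁Hello", "▁world"]], replace_unknowns=True)
```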

Fixes and improvements

  • Fix segmentation fault when the model is converted with a wrong vocabulary and predicts an out-of-vocabulary index
  • Fix result of vectorized array reduction when the array length is not a multiple of the SIMD register width
  • Fix exit code when running cli/translate -h
  • Improve performance of vectorized vector math by inlining calls to intrinsic functions
  • Improve accuracy of LogSoftMax CUDA implementation
  • Improve error message when --model option is not set in cli/translate
  • Update oneMKL to 2020.1 in published binaries
  • Update oneDNN to 2.0 in published binaries
  • Update default search paths to support compilation with oneMKL and oneDNN installed from the oneAPI toolkit

CTranslate2 1.16.2

27 Nov 10:31

Fixes and improvements

  • Fix cuBLAS version included in the Python wheels published to PyPI. The included library was targeting CUDA 10.2 instead of CUDA 10.1.
  • Re-add Python 3.5 wheels on PyPI to give users more time to transition

CTranslate2 1.16.1

23 Nov 11:52

Fixes and improvements

  • Fuse dequantization and bias addition on GPU for improved INT8 performance
  • Improve performance of masked softmax on GPU
  • Fix error when building the CentOS 7 GPU Docker image
  • Revert the 1.16.0 change "Pad size of INT8 matrices to a multiple of 16 when the GPU has INT8 Tensor Cores": due to a bug the padding was never actually applied, and fixing it degraded performance, so this behavior is removed for now

CTranslate2 1.16.0

18 Nov 16:13

Changes

  • Drop support for Python 2.7 and 3.5

New features

  • Add Docker images using CUDA 11.0

Fixes and improvements

  • [Python] Enable parallel CPU translations from translate_batch when setting inter_threads > 1 and max_batch_size > 0
  • Improve GPU performance on Turing architecture when using a Docker image or the Python package
  • Pad size of INT8 matrices to a multiple of 16 when the GPU has INT8 Tensor Cores
  • Add information about detected GPU devices in CT2_VERBOSE output
  • Update oneDNN to 1.7
  • [Python] Improve type checking for some arguments
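The parallel CPU translation enabled by `inter_threads` and `max_batch_size` can be sketched as follows (model path and values are illustrative; requires a converted model):

```python
import ctranslate2

# inter_threads > 1 creates multiple translation workers. When
# max_batch_size > 0, translate_batch splits the input into sub-batches
# that the workers translate in parallel.
translator = ctranslate2.Translator("ende_ctranslate2/", inter_threads=2)

batch = [["▁Hello", "▁world"]] * 64  # illustrative input tokens
results = translator.translate_batch(batch, max_batch_size=32)
```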