Releases · OpenNMT/CTranslate2
CTranslate2 1.19.0
Changes
- Rename CMake option `WITH_TESTS` to `BUILD_TESTS`
New features
- Add "auto" compute type to automatically select the fastest compute type on the current system
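The "auto" compute type can be pictured as a preference list filtered by what both the model and the hardware support. The sketch below is purely illustrative (the function name, the type names as strings, and the selection logic are assumptions, not CTranslate2's implementation):

```python
# Illustrative sketch only, NOT CTranslate2's actual implementation:
# "auto" resolves to the fastest compute type supported by both the
# converted model and the current system, falling back to float32.

def resolve_auto_compute_type(model_types, supported_types):
    """Pick the fastest type present in both sets (hypothetical helper)."""
    # Fastest first: 8-bit integers, then 16-bit floats, then full precision.
    preference = ["int8", "float16", "float32"]
    for compute_type in preference:
        if compute_type in model_types and compute_type in supported_types:
            return compute_type
    return "float32"  # safe default when nothing faster is usable

# Example: an int8-quantized model on a system without int8 acceleration
# falls back to full precision.
print(resolve_auto_compute_type({"int8", "float32"}, {"float16", "float32"}))  # float32
```

The point of the feature is that users no longer need to know their hardware's capabilities to get the fastest supported precision.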
Fixes and improvements
- [Python] Clear memory allocator cache when calling `unload_model`
- [Python] Make methods `unload_model` and `load_model` thread safe
- Fix conversion of TensorFlow SavedModel with shared embeddings
- Update Intel oneAPI to 2021.2
- Compile core library with C++14 standard
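Making `load_model`/`unload_model` thread safe typically means serializing them behind a mutex so concurrent callers cannot observe a half-loaded model. A minimal pure-Python sketch of the pattern (the `ModelHolder` class is hypothetical, not CTranslate2's internals):

```python
# Minimal sketch of thread-safe load/unload using a lock.
# ModelHolder is a hypothetical stand-in, not CTranslate2 code.
import threading

class ModelHolder:
    def __init__(self):
        self._lock = threading.Lock()
        self._model = None

    def load_model(self):
        with self._lock:  # serialize concurrent load/unload calls
            if self._model is None:
                self._model = object()  # stand-in for real model loading

    def unload_model(self):
        with self._lock:
            self._model = None  # a real implementation would also free caches
```

With the lock, interleaved `load_model` and `unload_model` calls from multiple threads always leave the holder in a consistent state.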
CTranslate2 1.18.3
Fixes and improvements
- Use Intel OpenMP instead of GNU OpenMP in the Docker images as a workaround for issue #409
CTranslate2 1.18.2
Fixes and improvements
- Fix crash when enabling coverage penalty in GPU translation
- Fix incorrect value of the AVX2 flag in the `CT2_VERBOSE` output
CTranslate2 1.18.1
Fixes and improvements
- Fix conversion of models setting the attributes `with_source_bos` or `with_source_eos`
CTranslate2 1.18.0
Changes
- Some option default values in the `translate` client have been changed to match the Python API:
  - `batch_size` = 32 (instead of 30)
  - `beam_size` = 2 (instead of 5)
  - `intra_threads` = 4 (instead of 0)
New features
- Support multi-GPU translation: the `device_index` argument can now be set to a list of GPU IDs (see example)
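One simple way to think about a list-valued `device_index` is round-robin dispatch of incoming batches over the listed GPUs. The sketch below illustrates that scheduling idea with plain Python (the function and its behavior are an assumption for illustration, not the library's dispatch code):

```python
# Illustrative round-robin scheduling over a list of GPU IDs,
# as a list-valued device_index enables. Pure Python stand-in;
# this does not call CTranslate2.

def assign_batches(batches, device_index):
    """Map each batch to a GPU ID from device_index, round-robin."""
    return [(device_index[i % len(device_index)], batch)
            for i, batch in enumerate(batches)]

print(assign_batches(["b0", "b1", "b2"], [0, 1]))
# [(0, 'b0'), (1, 'b1'), (0, 'b2')]
```

Each GPU runs its own translator instance, so throughput scales with the number of devices listed.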
Fixes and improvements
- Improve performance when using multiple GPU translators concurrently in the same process
- [Python] Do nothing when calling `unload_model(to_cpu=True)` on CPU translators
- [Python] Set a default value for the `max_batch_size` argument in the method `Translator.translate_file`
- Disable `CT2_TRANSLATORS_CORE_OFFSET` in OpenMP builds, as setting thread affinity does not work when OpenMP is enabled
CTranslate2 1.17.1
Fixes and improvements
- Fix Python wheel loading error on macOS
CTranslate2 1.17.0
Changes
- Linux Python wheels are now compiled under `manylinux2014` and require `pip` version >= 19.3
New features
- Publish Python wheels for macOS (CPU only)
- Support compilation for ARM 64-bit architecture and add NEON vectorization
- Add new optional GEMM backends: Apple Accelerate and OpenBLAS
- Add `replace_unknowns` translation option to replace unknown target tokens by the source tokens with the highest attention
- Add flags in the model specification to declare that BOS and/or EOS tokens should be added to the source sequences
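The idea behind `replace_unknowns` is classic unknown-word replacement: for each `<unk>` produced in the target, look at the attention vector for that decoding step and copy the source token with the highest weight. A toy sketch of the mechanism (hypothetical helper on toy data, not the library's code):

```python
# Toy illustration of the replace_unknowns idea: each <unk> in the
# target is replaced by the source token receiving the highest
# attention weight at that step. Not CTranslate2's implementation.

def replace_unknowns(target_tokens, source_tokens, attention):
    """attention[t] is a list of weights over source positions for step t."""
    output = []
    for t, token in enumerate(target_tokens):
        if token == "<unk>":
            best = max(range(len(source_tokens)), key=lambda s: attention[t][s])
            token = source_tokens[best]
        output.append(token)
    return output

source = ["Hallo", "Welt"]
target = ["Hello", "<unk>"]
attn = [[0.9, 0.1], [0.2, 0.8]]
print(replace_unknowns(target, source, attn))  # ['Hello', 'Welt']
```

This is most useful for names and rare words that fall outside the target vocabulary but have an obvious source-side counterpart.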
Fixes and improvements
- Fix segmentation fault when the model is converted with a wrong vocabulary and predicts an out-of-vocabulary index
- Fix result of vectorized array reduction when the array length is not a multiple of the SIMD registers width
- Fix exit code when running `cli/translate -h`
- Improve performance of vectorized vector math by inlining calls to intrinsic functions
- Improve accuracy of LogSoftMax CUDA implementation
- Improve error message when the `--model` option is not set in `cli/translate`
- Update oneMKL to 2020.1 in published binaries
- Update oneDNN to 2.0 in published binaries
- Update default search paths to support compilation with oneMKL and oneDNN installed from the oneAPI toolkit
CTranslate2 1.16.2
Fixes and improvements
- Fix cuBLAS version included in the Python wheels published to PyPI. The included library was targeting CUDA 10.2 instead of CUDA 10.1.
- Re-add Python 3.5 wheels on PyPI to give users more time to transition
CTranslate2 1.16.1
Fixes and improvements
- Fuse dequantization and bias addition on GPU for improved INT8 performance
- Improve performance of masked softmax on GPU
- Fix error when building the CentOS 7 GPU Docker image
- The previous release notes listed "Pad size of INT8 matrices to a multiple of 16 when the GPU has INT8 Tensor Cores". However, the padding was not applied due to a bug, and fixing it degraded performance, so this behavior is disabled for now.
CTranslate2 1.16.0
Changes
- Drop support for Python 2.7 and 3.5
New features
- Add Docker images using CUDA 11.0
Fixes and improvements
- [Python] Enable parallel CPU translations from `translate_batch` when setting `inter_threads` > 1 and `max_batch_size` > 0
- Improve GPU performance on Turing architecture when using a Docker image or the Python package
- Pad size of INT8 matrices to a multiple of 16 when the GPU has INT8 Tensor Cores
- Add information about detected GPU devices in the `CT2_VERBOSE` output
- Update oneDNN to 1.7
- [Python] Improve type checking for some arguments
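The parallel-CPU behavior above can be pictured as: split the input into sub-batches of at most `max_batch_size` and process them with up to `inter_threads` workers. The sketch below illustrates that scheduling with a stand-in translate function (the wrapper and its signature are assumptions for illustration, not CTranslate2's API):

```python
# Illustrative sketch of parallel CPU translation: the batch is split
# into chunks of at most max_batch_size and processed by up to
# inter_threads workers. translate_fn is a stand-in, not the real API.
from concurrent.futures import ThreadPoolExecutor

def translate_batch(batch, translate_fn, inter_threads=1, max_batch_size=0):
    if inter_threads <= 1 or max_batch_size <= 0:
        return translate_fn(batch)  # sequential path
    chunks = [batch[i:i + max_batch_size]
              for i in range(0, len(batch), max_batch_size)]
    with ThreadPoolExecutor(max_workers=inter_threads) as pool:
        results = pool.map(translate_fn, chunks)  # preserves chunk order
    return [item for chunk in results for item in chunk]

# Toy "translation": uppercase each token string.
upper = lambda chunk: [s.upper() for s in chunk]
print(translate_batch(["a", "b", "c"], upper, inter_threads=2, max_batch_size=2))
# ['A', 'B', 'C']
```

Because `map` preserves chunk order, results come back in the same order as the input even though chunks run concurrently.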