Releases: OpenNMT/CTranslate2
CTranslate2 2.12.0
New features
- Support models using additional source features (a.k.a. factors)
Fixes and improvements
- Fix compilation with CUDA < 11.2
- Fix incorrect revision number reported in the error message for unsupported model revisions
- Improve quantization correctness by rounding the value instead of truncating (this change will only apply to newly converted models)
- Improve default value of `intra_threads` when the system has less than 4 logical cores
- Update oneDNN to 2.5.2
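The quantization change above (rounding instead of truncating) can be illustrated with a minimal INT8 quantization sketch. The values and the scale are hypothetical and chosen for illustration only; this is not CTranslate2's actual kernel code:

```python
import numpy as np

def quantize_int8(values, scale):
    """Quantize float values to INT8 with a given scale (illustrative only)."""
    products = values * scale
    truncated = products.astype(np.int8)         # old behavior: fractional part dropped
    rounded = np.rint(products).astype(np.int8)  # new behavior: round to nearest integer
    return truncated, rounded

weights = np.array([0.256, -0.748], dtype=np.float32)
truncated, rounded = quantize_int8(weights, scale=100.0)
# Truncation maps 25.6 -> 25 and -74.8 -> -74, while rounding gives 26 and -75,
# which stays closer to the original values on average.
```

Since rounding changes the quantized integers, the release notes point out that the improvement only applies to newly converted models.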
CTranslate2 2.11.0
Changes
- With CUDA >= 11.2, the environment variable `CT2_CUDA_ALLOCATOR` now defaults to `cuda_malloc_async`, which should improve performance on GPU.
New features
- Build Python wheels for AArch64 Linux
Fixes and improvements
- Improve performance of Gather CUDA kernel by using vectorized copy
- Update Intel oneAPI to 2022.1
- Update oneDNN to 2.5.1
- Log some additional information with `CT2_VERBOSE` >= 1:
  - Location and compute type of loaded models
  - Version of the dynamically loaded cuBLAS library
  - Selected CUDA memory allocator
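For example, both environment variables from this release can be set from Python. The variable names and values are the ones documented above; that they should be set before the model is loaded is an assumption here:

```python
import os

# Assumption: these variables are read when the library loads a model,
# so set them before creating a Translator.
os.environ["CT2_VERBOSE"] = "1"  # log model location/compute type, cuBLAS version, allocator
os.environ["CT2_CUDA_ALLOCATOR"] = "cuda_malloc_async"  # default with CUDA >= 11.2
```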
CTranslate2 2.10.1
Fixes and improvements
- Fix stuck execution when loading a model on a second GPU
- Fix numerical error in INT8 quantization on macOS
CTranslate2 2.10.0
Changes
- `inter_threads` now also applies to GPU translation, where each translation thread is using a different CUDA stream to allow some parts of the GPU execution to overlap
New features
- Add option `disable_unk` to disable the generation of unknown tokens
- Add function `set_random_seed` to fix the seed in random sampling
- [C++] Add constructors in `Translator` and `TranslatorPool` classes with a `ModelReader` parameter
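To see why fixing the seed matters for random sampling, here is a minimal top-k sampling sketch in plain NumPy. This illustrates the concept only, not CTranslate2's internal sampler; `set_random_seed` plays the role of building the generator with a fixed seed:

```python
import numpy as np

def sample_topk(logits, k, rng):
    # Keep the k highest-scoring token ids, renormalize, and sample one.
    top = np.argsort(logits)[-k:]
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

logits = np.array([0.1, 2.0, 1.5, -0.3, 0.8])
# A fixed seed makes repeated sampling runs reproducible, which is what
# set_random_seed enables for the library's own generator.
first = sample_topk(logits, k=3, rng=np.random.default_rng(42))
second = sample_topk(logits, k=3, rng=np.random.default_rng(42))
```

Without the fixed seed, two runs over the same logits may pick different tokens.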
Fixes and improvements
- Fix incorrect output from the Multinomial op when running on GPU with a small batch size
- Fix Thrust and CUB headers that were included from the CUDA installation instead of the submodule
- Fix static library compilation with the default build options (`cmake -DBUILD_SHARED_LIBS=OFF`)
- Compile the Docker image and the Linux Python wheels with SSE 4.1 (vectorized kernels are still compiled for AVX and AVX2 with automatic dispatch, but other source files are now compiled with SSE 4.1)
- Enable `/fp:fast` for MSVC to mirror `-ffast-math` that is enabled for GCC and Clang
- Statically link against oneDNN to reduce the size of published binaries:
  - Linux Python wheels: 43MB -> 17MB
  - Windows Python wheels: 41MB -> 11MB
  - Docker image: 733MB -> 600MB
CTranslate2 2.9.0
New features
- Add GPU support to the Windows Python wheels
- Support OpenNMT-py and Fairseq options `--alignment_layer` and `--alignment_heads`, which specify how the multi-head attention is reduced and returned by the Transformer decoder
- Support dynamic loading of CUDA libraries on Windows
Fixes and improvements
- Fix division by zero when normalizing the score of an empty target
- Fix error that was not raised when the input length is greater than the number of position encodings
- Improve performance of random sampling on GPU for large values of `sampling_topk` or when sampling over the full vocabulary
- Include `transformer_align` and `transformer_wmt_en_de_big_align` in the list of supported Fairseq architectures
- Add a CUDA kernel to prepare the length mask to avoid moving back to the CPU
CTranslate2 2.8.1
Fixes and improvements
- Fix dtype error when reading float16 scores in greedy search
- Fix usage of MSVC linker option `/nodefaultlib` that was not correctly passed to the linker
CTranslate2 2.8.0
Changes
- The Linux Python wheels now use Intel OpenMP instead of GNU OpenMP for consistency with other published binaries
New features
- Build Python wheels for Windows
Fixes and improvements
- Fix segmentation fault when calling `Translator.unload_model` while an asynchronous translation is running
- Fix implementation of the repetition penalty, which should be applied to all previously generated tokens and not just the tokens of the last step
- Fix missing application of repetition penalty in greedy search
- Fix incorrect token index when using a target prefix and a vocabulary mapping file
- Set the OpenMP flag when compiling on Windows with `-DOPENMP_RUNTIME=INTEL` or `-DOPENMP_RUNTIME=COMP`
CTranslate2 2.7.0
Changes
- Inputs are now truncated after 1024 tokens by default to limit the maximum memory usage (see translation option `max_input_length`)
New features
- Add translation option `max_input_length` to limit the model input length
- Add translation option `repetition_penalty` to apply an exponential penalty on repeated sequences
- Add scoring option `with_tokens_score` to also output token-level scores when scoring a file
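An exponential repetition penalty of this kind is commonly implemented in the style of Keskar et al.'s CTRL: positive scores of already-generated tokens are divided by the penalty and negative scores multiplied by it. A sketch of that scheme, which is an assumption here and not CTranslate2's actual code:

```python
import numpy as np

def apply_repetition_penalty(logits, previous_ids, penalty):
    # Penalize tokens that were already generated: with penalty > 1 this
    # always lowers a repeated token's score, whatever its sign.
    out = logits.copy()
    for token_id in set(previous_ids):
        if out[token_id] > 0:
            out[token_id] /= penalty
        else:
            out[token_id] *= penalty
    return out

logits = np.array([2.0, -1.0, 0.5])
penalized = apply_repetition_penalty(logits, previous_ids=[0, 1], penalty=2.0)
# token 0: 2.0 -> 1.0, token 1: -1.0 -> -2.0, token 2 unchanged
```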
Fixes and improvements
- Adapt the length penalty formula when using `normalize_scores` to match other implementations: the scores are divided by `pow(length, length_penalty)`
- Implement `LayerNorm` with a single CUDA kernel instead of 2
- Simplify the beam search implementation
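The length normalization above amounts to a one-liner. `cum_log_prob`, the hypothesis's cumulative log-probability, is a name chosen here for illustration:

```python
def normalize_score(cum_log_prob, length, length_penalty):
    # Scores are divided by pow(length, length_penalty); with
    # length_penalty = 1.0 this is a plain per-token average, and with
    # length_penalty = 0.0 the score is left unchanged.
    return cum_log_prob / (length ** length_penalty)

# e.g. a 4-token hypothesis with total log-prob -8.0:
normalize_score(-8.0, 4, 1.0)  # -2.0
```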
CTranslate2 2.6.0
New features
- Build wheels for Python 3.10
- Accept passing the vocabulary as an `opennmt.data.Vocab` object or a list of tokens in the OpenNMT-tf converter
Fixes and improvements
- Fix segmentation fault in greedy search when `normalize_scores` is enabled but not `return_scores`
- Fix segmentation fault when `min_decoding_length` and `max_decoding_length` are both set to 0
- Fix segmentation fault when `sampling_topk` is larger than the vocabulary size
- Fix incorrect score normalization in greedy search when `max_decoding_length` is reached
- Fix incorrect score normalization in the `return_alternatives` translation mode
- Improve error checking when reading the binary model file
- Apply `LogSoftMax` in-place during decoding and scoring
CTranslate2 2.5.1
Fixes and improvements
- Fix logic error in the in-place implementation of the `Gather` op that could lead to incorrect beam search outputs