
Releases: OpenNMT/CTranslate2

CTranslate2 2.12.0

01 Feb 17:18

New features

  • Support models using additional source features (a.k.a. factors)

Fixes and improvements

  • Fix compilation with CUDA < 11.2
  • Fix incorrect revision number reported in the error message for unsupported model revisions
  • Improve quantization correctness by rounding the value instead of truncating (this change will only apply to newly converted models)
  • Improve default value of intra_threads when the system has fewer than 4 logical cores
  • Update oneDNN to 2.5.2
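To see why rounding improves quantization correctness, here is a minimal Python sketch of symmetric INT8 quantization. It assumes a single scale per row computed as 127 / max|w|; the function names and the weight values are hypothetical, and this is an illustration rather than CTranslate2's converter code. Rounding maps each scaled value to the nearest integer, so the per-value error is at most half a quantization step, while truncation can lose almost a full step and always pulls values toward zero.

```python
def quantize_int8(values, use_rounding=True):
    """Symmetric INT8 quantization with one scale for the whole row (sketch)."""
    amax = max(abs(v) for v in values)
    # Assumed convention: scale so that the largest magnitude maps to 127.
    scale = 127.0 / amax if amax != 0 else 1.0
    if use_rounding:
        # Nearest integer (Python's round() resolves ties to even).
        quantized = [int(round(v * scale)) for v in values]
    else:
        # int() truncates toward zero, which biases every value toward 0.
        quantized = [int(v * scale) for v in values]
    return quantized, scale


def max_abs_error(values, use_rounding):
    """Largest absolute difference between the inputs and their dequantized values."""
    quantized, scale = quantize_int8(values, use_rounding)
    return max(abs(v - q / scale) for v, q in zip(values, quantized))


weights = [0.013, -0.027, 0.5, -0.4999, 0.0031]  # hypothetical weight row
print(max_abs_error(weights, use_rounding=True) < max_abs_error(weights, use_rounding=False))  # True
```

Since the rounded integer is by definition the closest one, the rounding error can never exceed the truncation error for any individual value.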

CTranslate2 2.11.0

11 Jan 12:44

Changes

  • With CUDA >= 11.2, the environment variable CT2_CUDA_ALLOCATOR now defaults to cuda_malloc_async, which should improve performance on GPU.

New features

  • Build Python wheels for AArch64 Linux

Fixes and improvements

  • Improve performance of Gather CUDA kernel by using vectorized copy
  • Update Intel oneAPI to 2022.1
  • Update oneDNN to 2.5.1
  • Log some additional information with CT2_VERBOSE >= 1:
    • Location and compute type of loaded models
    • Version of the dynamically loaded cuBLAS library
    • Selected CUDA memory allocator
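Both CT2_CUDA_ALLOCATOR and CT2_VERBOSE are environment variables, so one way to use them is to set them in the environment of the process that runs the translation. The minimal Python sketch below builds such an environment for a child process; the helper name ct2_env and the script name translate.py are hypothetical. Setting the variables before the process starts avoids depending on when exactly the library reads them.

```python
import os


def ct2_env(verbose_level=1, allocator=None):
    """Return a copy of the current environment with CTranslate2 settings applied."""
    env = dict(os.environ)
    # CT2_VERBOSE >= 1 logs the model location and compute type, the cuBLAS
    # version, and the selected CUDA memory allocator (per the 2.11.0 notes).
    env["CT2_VERBOSE"] = str(verbose_level)
    if allocator is not None:
        # With CUDA >= 11.2 the default is already cuda_malloc_async, so this
        # only makes the choice explicit.
        env["CT2_CUDA_ALLOCATOR"] = allocator
    return env


# Example (translate.py is a hypothetical script that uses CTranslate2):
# import subprocess, sys
# subprocess.run([sys.executable, "translate.py"], env=ct2_env(1, "cuda_malloc_async"))
```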

CTranslate2 2.10.1

15 Dec 17:36

Fixes and improvements

  • Fix stuck execution when loading a model on a second GPU
  • Fix numerical error in INT8 quantization on macOS

CTranslate2 2.10.0

13 Dec 13:55

Changes

  • inter_threads now also applies to GPU translation: each translation thread uses a different CUDA stream so that some parts of the GPU execution can overlap

New features

  • Add option disable_unk to disable the generation of unknown tokens
  • Add function set_random_seed to fix the seed in random sampling
  • [C++] Add constructors in Translator and TranslatorPool classes with ModelReader parameter
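The disable_unk option and set_random_seed both act on the decoding step. As an illustration only (this is not CTranslate2's sampler), the Python sketch below samples from the top-k logits with a seeded random generator, optionally masks an unknown-token id, and clamps k to the vocabulary size. The function names, the toy logits, and the choice of id 1 for the unknown token are hypothetical.

```python
import math
import random


def sample_topk(logits, k, rng, unk_id=None):
    """Sample one token id among the k highest logits (illustrative sketch)."""
    logits = list(logits)
    if unk_id is not None:
        # Same idea as disable_unk: the unknown token gets zero probability.
        logits[unk_id] = -math.inf
    k = min(k, len(logits))  # guard against k larger than the vocabulary
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)
    # Softmax weights over the kept candidates (shifted by the max for stability).
    weights = [math.exp(logits[i] - m) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]


def sample_sequence(logits, k, seed, steps, unk_id=None):
    """Draw several samples from a generator seeded once, like a fixed random seed."""
    rng = random.Random(seed)
    return [sample_topk(logits, k, rng, unk_id) for _ in range(steps)]


logits = [2.0, 0.5, 1.5, -1.0]  # toy vocabulary where id 1 plays the unknown token
first = sample_sequence(logits, 10, seed=1234, steps=8, unk_id=1)
second = sample_sequence(logits, 10, seed=1234, steps=8, unk_id=1)
print(first == second)  # True: same seed, same samples
print(1 in first)  # False: the unknown token is masked
```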

Fixes and improvements

  • Fix incorrect output from the Multinomial op when running on GPU with a small batch size
  • Fix Thrust and CUB headers that were included from the CUDA installation instead of the submodule
  • Fix static library compilation with the default build options (cmake -DBUILD_SHARED_LIBS=OFF)
  • Compile the Docker image and the Linux Python wheels with SSE 4.1 (vectorized kernels are still compiled for AVX and AVX2 with automatic dispatch, but other source files are now compiled with SSE 4.1)
  • Enable /fp:fast for MSVC to mirror -ffast-math that is enabled for GCC and Clang
  • Statically link against oneDNN to reduce the size of published binaries:
    • Linux Python wheels: 43MB -> 17MB
    • Windows Python wheels: 41MB -> 11MB
    • Docker image: 733MB -> 600MB

CTranslate2 2.9.0

01 Dec 15:50

New features

  • Add GPU support to the Windows Python wheels
  • Support OpenNMT-py and Fairseq options --alignment_layer and --alignment_heads which specify how the multi-head attention is reduced and returned by the Transformer decoder
  • Support dynamic loading of CUDA libraries on Windows
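The reduction controlled by --alignment_layer and --alignment_heads can be pictured as picking one decoder layer and averaging the attention of its first heads. The Python sketch below shows that idea on nested lists indexed [layer][head][target][source]; the layout, the function name, and the convention that None or 0 means "all heads" are assumptions made for this illustration, not CTranslate2's internals.

```python
def reduce_alignment(attention, alignment_layer, alignment_heads=None):
    """Average the attention of the first alignment_heads heads of one layer.

    attention is indexed as attention[layer][head][target][source].
    """
    # Python indexing lets a negative alignment_layer count from the top layer.
    heads = attention[alignment_layer]
    if alignment_heads:  # None or 0: keep every head (assumption)
        heads = heads[:alignment_heads]
    num_heads = len(heads)
    target_len = len(heads[0])
    source_len = len(heads[0][0])
    return [
        [sum(head[t][s] for head in heads) / num_heads for s in range(source_len)]
        for t in range(target_len)
    ]


# Two layers, two heads, one target position, two source positions.
attention = [
    [[[1.0, 0.0]], [[0.0, 1.0]]],
    [[[0.6, 0.4]], [[0.2, 0.8]]],
]
print(reduce_alignment(attention, alignment_layer=0))  # [[0.5, 0.5]]
print(reduce_alignment(attention, alignment_layer=-1, alignment_heads=1))  # [[0.6, 0.4]]
```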

Fixes and improvements

  • Fix division by zero when normalizing the score of an empty target
  • Fix missing error when the input length is greater than the number of position encodings
  • Improve performance of random sampling on GPU for large values of sampling_topk or when sampling over the full vocabulary
  • Include transformer_align and transformer_wmt_en_de_big_align in the list of supported Fairseq architectures
  • Add a CUDA kernel to prepare the length mask to avoid moving back to the CPU

CTranslate2 2.8.1

17 Nov 16:26

Fixes and improvements

  • Fix dtype error when reading float16 scores in greedy search
  • Fix the MSVC linker option /nodefaultlib, which was not correctly passed to the linker

CTranslate2 2.8.0

15 Nov 09:55

Changes

  • The Linux Python wheels now use Intel OpenMP instead of GNU OpenMP for consistency with other published binaries

New features

  • Build Python wheels for Windows

Fixes and improvements

  • Fix segmentation fault when calling Translator.unload_model while an asynchronous translation is running
  • Fix the repetition penalty implementation: the penalty should be applied to all previously generated tokens, not just the tokens of the last step
  • Fix missing application of repetition penalty in greedy search
  • Fix incorrect token index when using a target prefix and a vocabulary mapping file
  • Set the OpenMP flag when compiling on Windows with -DOPENMP_RUNTIME=INTEL or -DOPENMP_RUNTIME=COMP

CTranslate2 2.7.0

04 Nov 15:59

Changes

  • Inputs are now truncated after 1024 tokens by default to limit the maximum memory usage (see translation option max_input_length)

New features

  • Add translation option max_input_length to limit the model input length
  • Add translation option repetition_penalty to apply an exponential penalty on repeated sequences
  • Add scoring option with_tokens_score to also output token-level scores when scoring a file
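A common formulation of the repetition penalty (the one popularized by the CTRL paper) rescales the score of every token that already appears in the output: positive scores are divided by the penalty and negative scores are multiplied by it, so a penalty above 1 makes repetition less likely either way. The Python sketch below assumes that formulation and is not taken from CTranslate2. It penalizes every previously generated token, which matches the behavior described in the 2.8.0 fix.

```python
def apply_repetition_penalty(scores, previous_ids, penalty):
    """Return a copy of scores with every previously generated token penalized.

    Assumes the CTRL-style rule: divide positive scores, multiply negative ones.
    """
    penalized = list(scores)
    # Penalize each distinct token once, whatever step it was generated at.
    for token_id in set(previous_ids):
        score = penalized[token_id]
        penalized[token_id] = score / penalty if score > 0 else score * penalty
    return penalized


scores = [2.0, -1.0, 0.5]  # toy scores for a 3-token vocabulary
print(apply_repetition_penalty(scores, previous_ids=[0, 1, 0], penalty=2.0))  # [1.0, -2.0, 0.5]
```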

Fixes and improvements

  • Adapt the length penalty formula when using normalize_scores to match other implementations: the scores are divided by pow(length, length_penalty)
  • Implement LayerNorm with a single CUDA kernel instead of 2
  • Simplify the beam search implementation
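The length penalty formula above divides a hypothesis score by pow(length, length_penalty). The Python sketch below applies it to two toy hypotheses; the function name and the values are hypothetical. The explicit check for an empty target echoes the division by zero fixed in 2.9.0, but returning the raw score in that case is an assumption of this sketch, not necessarily what CTranslate2 does.

```python
def normalize_score(score, length, length_penalty=1.0):
    """Divide a cumulative log-probability by pow(length, length_penalty)."""
    if length == 0:
        # Avoid dividing by zero for an empty target (assumed: keep the raw score).
        return score
    return score / (length ** length_penalty)


# Toy hypotheses given as (cumulative log-probability, number of tokens).
short_hyp = (-3.0, 2)
long_hyp = (-4.0, 4)

# Raw scores favor the short hypothesis: -3.0 > -4.0.
# Normalized with length_penalty=1.0, the long hypothesis wins: -1.0 > -1.5.
print(normalize_score(*short_hyp), normalize_score(*long_hyp))  # -1.5 -1.0
```

A length_penalty of 0 leaves scores unchanged, and larger values favor longer outputs more strongly.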

CTranslate2 2.6.0

15 Oct 14:27

New features

  • Build wheels for Python 3.10
  • Accept passing the vocabulary as an opennmt.data.Vocab object or a list of tokens in the OpenNMT-tf converter

Fixes and improvements

  • Fix segmentation fault in greedy search when normalize_scores is enabled but return_scores is not
  • Fix segmentation fault when min_decoding_length and max_decoding_length are both set to 0
  • Fix segmentation fault when sampling_topk is larger than the vocabulary size
  • Fix incorrect score normalization in greedy search when max_decoding_length is reached
  • Fix incorrect score normalization in the return_alternatives translation mode
  • Improve error checking when reading the binary model file
  • Apply LogSoftMax in-place during decoding and scoring
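Applying LogSoftMax in place means the normalized log-probabilities overwrite the score buffer instead of being written to a new one, which avoids allocating a second buffer of the same size. A minimal Python sketch of the operation, using the usual log-sum-exp trick for numerical stability, is shown below; it illustrates the computation only and is not CTranslate2's kernel.

```python
import math


def log_softmax_inplace(x):
    """Overwrite the list x with log(softmax(x)); returns None like other in-place ops."""
    # Subtract the maximum before exponentiating so large scores do not overflow.
    m = max(x)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in x))
    for i in range(len(x)):
        x[i] -= log_sum_exp


scores = [1000.0, 1000.0]  # a naive exp() would overflow here
log_softmax_inplace(scores)
print(scores)  # both entries equal log(0.5), about -0.693
```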

CTranslate2 2.5.1

04 Oct 16:34

Fixes and improvements

  • Fix logic error in the in-place implementation of the Gather op that could lead to incorrect beam search outputs
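In beam search, a gather is typically used to reorder the per-hypothesis state, for example new_state[i] = state[indices[i]]. Doing this in place is error-prone because a row can be overwritten before a later index reads it. The Python sketch below reproduces that general class of aliasing bug on a toy list; the function names are hypothetical, and this is an illustration, not the specific defect fixed in CTranslate2.

```python
def gather_inplace_naive(rows, indices):
    """Buggy: row i may be overwritten before a later index reads it."""
    for i, src in enumerate(indices):
        rows[i] = rows[src]


def gather_rows_safe(rows, indices):
    """Read every source row first, then write the results back into rows."""
    gathered = [rows[src] for src in indices]  # temporary copy of the selected rows
    rows[:] = gathered


beams = ["a", "b", "c"]
gather_inplace_naive(beams, [1, 0, 0])
print(beams)  # ['b', 'b', 'b'] (wrong)

beams = ["a", "b", "c"]
gather_rows_safe(beams, [1, 0, 0])
print(beams)  # ['b', 'a', 'a']
```

The safe version trades a temporary buffer for correctness; a truly in-place kernel has to order its reads and writes so that no source element is overwritten before it is consumed.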