Releases: OpenNMT/CTranslate2
CTranslate2 2.12.0
New features
- Support models using additional source features (a.k.a. factors)
Fixes and improvements
- Fix compilation with CUDA < 11.2
- Fix incorrect revision number reported in the error message for unsupported model revisions
- Improve quantization correctness by rounding the value instead of truncating (this change will only apply to newly converted models)
- Improve default value of `intra_threads` when the system has less than 4 logical cores
- Update oneDNN to 2.5.2
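The quantization change above (rounding instead of truncating) can be illustrated with a minimal INT8 quantization sketch. The values and the scale are hypothetical and chosen for illustration only; this is not CTranslate2's actual kernel code:

```python
import numpy as np

def quantize_int8(values, scale):
    """Quantize float values to INT8 with a given scale (illustrative only)."""
    products = values * scale
    truncated = products.astype(np.int8)         # old behavior: fractional part dropped
    rounded = np.rint(products).astype(np.int8)  # new behavior: round to nearest integer
    return truncated, rounded

weights = np.array([0.256, -0.748], dtype=np.float32)
truncated, rounded = quantize_int8(weights, scale=100.0)
# Truncation maps 25.6 -> 25 and -74.8 -> -74, while rounding gives 26 and -75,
# which stays closer to the original values on average.
```

Since rounding changes the quantized integers, the release notes point out that the improvement only applies to newly converted models.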
CTranslate2 2.11.0
Changes
- With CUDA >= 11.2, the environment variable `CT2_CUDA_ALLOCATOR` now defaults to `cuda_malloc_async`, which should improve performance on GPU.
New features
- Build Python wheels for AArch64 Linux
Fixes and improvements
- Improve performance of Gather CUDA kernel by using vectorized copy
- Update Intel oneAPI to 2022.1
- Update oneDNN to 2.5.1
- Log some additional information with `CT2_VERBOSE` >= 1:
  - Location and compute type of loaded models
  - Version of the dynamically loaded cuBLAS library
  - Selected CUDA memory allocator
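For example, both environment variables from this release can be set from Python. The variable names and values are the ones documented above; that they should be set before the model is loaded is an assumption here:

```python
import os

# Assumption: these variables are read when the library loads a model,
# so set them before creating a Translator.
os.environ["CT2_VERBOSE"] = "1"  # log model location/compute type, cuBLAS version, allocator
os.environ["CT2_CUDA_ALLOCATOR"] = "cuda_malloc_async"  # default with CUDA >= 11.2
```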
CTranslate2 2.10.1
Fixes and improvements
- Fix stuck execution when loading a model on a second GPU
- Fix numerical error in INT8 quantization on macOS
CTranslate2 2.10.0
Changes
- `inter_threads` now also applies to GPU translation, where each translation thread is using a different CUDA stream to allow some parts of the GPU execution to overlap
New features
- Add option `disable_unk` to disable the generation of unknown tokens
- Add function `set_random_seed` to fix the seed in random sampling
- [C++] Add constructors in `Translator` and `TranslatorPool` classes with a `ModelReader` parameter
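To see why fixing the seed matters for random sampling, here is a minimal top-k sampling sketch in plain NumPy. This illustrates the concept only, not CTranslate2's internal sampler; `set_random_seed` plays the role of building the generator with a fixed seed:

```python
import numpy as np

def sample_topk(logits, k, rng):
    # Keep the k highest-scoring token ids, renormalize, and sample one.
    top = np.argsort(logits)[-k:]
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()
    return int(rng.choice(top, p=probs))

logits = np.array([0.1, 2.0, 1.5, -0.3, 0.8])
# A fixed seed makes repeated sampling runs reproducible, which is what
# set_random_seed enables for the library's own generator.
first = sample_topk(logits, k=3, rng=np.random.default_rng(42))
second = sample_topk(logits, k=3, rng=np.random.default_rng(42))
```

Without the fixed seed, two runs over the same logits may pick different tokens.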
Fixes and improvements
- Fix incorrect output from the Multinomial op when running on GPU with a small batch size
- Fix Thrust and CUB headers that were included from the CUDA installation instead of the submodule
- Fix static library compilation with the default build options (`cmake -DBUILD_SHARED_LIBS=OFF`)
- Compile the Docker image and the Linux Python wheels with SSE 4.1 (vectorized kernels are still compiled for AVX and AVX2 with automatic dispatch, but other source files are now compiled with SSE 4.1)
- Enable `/fp:fast` for MSVC to mirror `-ffast-math` that is enabled for GCC and Clang
- Statically link against oneDNN to reduce the size of published binaries:
  - Linux Python wheels: 43MB -> 17MB
  - Windows Python wheels: 41MB -> 11MB
  - Docker image: 733MB -> 600MB
CTranslate2 2.9.0
New features
- Add GPU support to the Windows Python wheels
- Support OpenNMT-py and Fairseq options `--alignment_layer` and `--alignment_heads`, which specify how the multi-head attention is reduced and returned by the Transformer decoder
- Support dynamic loading of CUDA libraries on Windows
Fixes and improvements
- Fix division by zero when normalizing the score of an empty target
- Fix error that was not raised when the input length is greater than the number of position encodings
- Improve performance of random sampling on GPU for large values of `sampling_topk` or when sampling over the full vocabulary
- Include `transformer_align` and `transformer_wmt_en_de_big_align` in the list of supported Fairseq architectures
- Add a CUDA kernel to prepare the length mask to avoid moving back to the CPU
CTranslate2 2.8.1
Fixes and improvements
- Fix dtype error when reading float16 scores in greedy search
- Fix usage of MSVC linker option `/nodefaultlib` that was not correctly passed to the linker
CTranslate2 2.8.0
Changes
- The Linux Python wheels now use Intel OpenMP instead of GNU OpenMP for consistency with other published binaries
New features
- Build Python wheels for Windows
Fixes and improvements
- Fix segmentation fault when calling `Translator.unload_model` while an asynchronous translation is running
- Fix implementation of the repetition penalty, which should be applied to all previously generated tokens and not just the tokens of the last step
- Fix missing application of repetition penalty in greedy search
- Fix incorrect token index when using a target prefix and a vocabulary mapping file
- Set the OpenMP flag when compiling on Windows with `-DOPENMP_RUNTIME=INTEL` or `-DOPENMP_RUNTIME=COMP`
CTranslate2 2.7.0
Changes
- Inputs are now truncated after 1024 tokens by default to limit the maximum memory usage (see translation option `max_input_length`)
New features
- Add translation option `max_input_length` to limit the model input length
- Add translation option `repetition_penalty` to apply an exponential penalty on repeated sequences
- Add scoring option `with_tokens_score` to also output token-level scores when scoring a file
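An exponential repetition penalty of this kind is commonly implemented in the style of Keskar et al.'s CTRL: positive scores of already-generated tokens are divided by the penalty and negative scores multiplied by it. A sketch of that scheme, which is an assumption here and not CTranslate2's actual code:

```python
import numpy as np

def apply_repetition_penalty(logits, previous_ids, penalty):
    # Penalize tokens that were already generated: with penalty > 1 this
    # always lowers a repeated token's score, whatever its sign.
    out = logits.copy()
    for token_id in set(previous_ids):
        if out[token_id] > 0:
            out[token_id] /= penalty
        else:
            out[token_id] *= penalty
    return out

logits = np.array([2.0, -1.0, 0.5])
penalized = apply_repetition_penalty(logits, previous_ids=[0, 1], penalty=2.0)
# token 0: 2.0 -> 1.0, token 1: -1.0 -> -2.0, token 2 unchanged
```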
Fixes and improvements
- Adapt the length penalty formula when using `normalize_scores` to match other implementations: the scores are divided by `pow(length, length_penalty)`
- Implement `LayerNorm` with a single CUDA kernel instead of 2
- Simplify the beam search implementation
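The length normalization above amounts to a one-liner. `cum_log_prob`, the hypothesis's cumulative log-probability, is a name chosen here for illustration:

```python
def normalize_score(cum_log_prob, length, length_penalty):
    # Scores are divided by pow(length, length_penalty); with
    # length_penalty = 1.0 this is a plain per-token average, and with
    # length_penalty = 0.0 the score is left unchanged.
    return cum_log_prob / (length ** length_penalty)

# e.g. a 4-token hypothesis with total log-prob -8.0:
normalize_score(-8.0, 4, 1.0)  # -2.0
```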
CTranslate2 2.6.0
New features
- Build wheels for Python 3.10
- Accept passing the vocabulary as an `opennmt.data.Vocab` object or a list of tokens in the OpenNMT-tf converter
Fixes and improvements
- Fix segmentation fault in greedy search when `normalize_scores` is enabled but not `return_scores`
- Fix segmentation fault when `min_decoding_length` and `max_decoding_length` are both set to 0
- Fix segmentation fault when `sampling_topk` is larger than the vocabulary size
- Fix incorrect score normalization in greedy search when `max_decoding_length` is reached
- Fix incorrect score normalization in the `return_alternatives` translation mode
- Improve error checking when reading the binary model file
- Apply `LogSoftMax` in-place during decoding and scoring
CTranslate2 2.5.1
Fixes and improvements
- Fix logic error in the in-place implementation of the `Gather` op that could lead to incorrect beam search outputs