From 9da12fa4b1c8054b8ee64946ab83ae429677ac8f Mon Sep 17 00:00:00 2001
From: Guillaume Klein
Date: Wed, 6 Nov 2019 11:20:34 +0100
Subject: [PATCH] Bump version and update benchmark numbers (#40)

---
 CHANGELOG.md |  9 +++++++++
 README.md    | 16 ++++++++--------
 2 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index c9856caf2..ea43db67a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -4,6 +4,15 @@
 
 ### Fixes and improvements
 
+## [v1.2.1](https://github.com/OpenNMT/CTranslate2/releases/tag/v1.2.1) (2019-11-06)
+
+### Fixes and improvements
+
+* Avoid copying decoder states when possible to improve decoding performance (10% to 20% faster)
+* Fix execution profiling on GPU (device was not synchronized before measuring the time)
+* Include `Mul` operation in profiling report
+* Add a Python 3 wheel in Ubuntu Docker images
+
 ## [v1.2.0](https://github.com/OpenNMT/CTranslate2/releases/tag/v1.2.0) (2019-10-28)
 
 ### New features
diff --git a/README.md b/README.md
index f7a24ad60..7ded359f6 100644
--- a/README.md
+++ b/README.md
@@ -17,7 +17,7 @@ CTranslate2 is a custom C++ inference engine for [OpenNMT-py](https://github.com
 
 ## Key features
 
-* **Fast execution**
The execution aims to be faster than a general purpose deep learning framework: on standard translation tasks, it is [up to 3x faster](#benchmarks) than OpenNMT-py.
+* **Fast execution**
The execution aims to be faster than a general purpose deep learning framework: on standard translation tasks, it is [up to 4x faster](#benchmarks) than OpenNMT-py.
 * **Model quantization**
Support INT16 quantization on CPU and INT8 quantization (experimental) on CPU and GPU.
 * **Parallel translation**
Translations can be run efficiently in parallel without duplicating the model data in memory.
 * **Dynamic memory usage**
The memory usage changes dynamically depending on the request size while still meeting performance requirements thanks to caching allocators on both CPU and GPU.
@@ -302,8 +302,8 @@ Configuration:
 
 | | Tokens/s | BLEU |
 | --- | --- | --- |
-| CTranslate2 1.0.1 | 3314.65 | 26.69 |
-| CTranslate2 1.0.1 (int8) | 2254.05 | 26.79 |
+| CTranslate2 1.2.1 | 3917.32 | 26.70 |
+| CTranslate2 1.2.1 (int8) | 2519.24 | 26.80 |
 | OpenNMT-tf 1.25.0 | 1338.26 | 26.90 |
 | OpenNMT-py 0.9.2 | 980.44 | 26.69 |
 
@@ -316,11 +316,11 @@ Configuration:
 
 | | Tokens/s | BLEU |
 | --- | --- | --- |
-| CTranslate2 1.0.1 (int8 + vmap) | 471.43 | 26.59 |
-| CTranslate2 1.0.1 (int16 + vmap) | 423.58 | 26.63 |
-| CTranslate2 1.0.1 (int8) | 383.91 | 26.84 |
-| CTranslate2 1.0.1 (int16) | 346.25 | 26.68 |
-| CTranslate2 1.0.1 (float) | 335.34 | 26.69 |
+| CTranslate2 1.2.1 (int8 + vmap) | 591.78 | 26.59 |
+| CTranslate2 1.2.1 (int16 + vmap) | 514.36 | 26.63 |
+| CTranslate2 1.2.1 (int8) | 448.21 | 26.84 |
+| CTranslate2 1.2.1 (int16) | 395.46 | 26.68 |
+| CTranslate2 1.2.1 (float) | 381.31 | 26.69 |
 | OpenNMT-py 0.9.2 | 241.92 | 26.69 |
 | OpenNMT-tf 1.25.0 | 119.34 | 26.90 |
 
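
Reviewer note (not part of the patch): the updated "up to 4x faster" claim can be checked by recomputing the throughput ratios from the two benchmark tables above. The following is a minimal sketch under stated assumptions. The `speedup` helper and the dictionary names are hypothetical, and the tables are labelled "first" and "second" because the diff context does not show which hardware each one covers.

```python
# Tokens/s figures copied from the two benchmark tables in this patch.
# Assumption: the tables are labelled by position only, since the diff
# context does not show the hardware configuration of each table.
FIRST_TABLE = {
    "CTranslate2 1.0.1": 3314.65,
    "CTranslate2 1.2.1": 3917.32,
    "OpenNMT-tf 1.25.0": 1338.26,
    "OpenNMT-py 0.9.2": 980.44,
}
SECOND_TABLE = {
    "CTranslate2 1.0.1 (int8 + vmap)": 471.43,
    "CTranslate2 1.2.1 (int8 + vmap)": 591.78,
    "OpenNMT-py 0.9.2": 241.92,
    "OpenNMT-tf 1.25.0": 119.34,
}


def speedup(tokens_per_s: float, baseline_tokens_per_s: float) -> float:
    """Return how many times faster a system is than a baseline (hypothetical helper)."""
    return tokens_per_s / baseline_tokens_per_s


if __name__ == "__main__":
    # Fastest CTranslate2 1.2.1 entry of each table against OpenNMT-py 0.9.2.
    for label, table, fastest in (
        ("first table", FIRST_TABLE, "CTranslate2 1.2.1"),
        ("second table", SECOND_TABLE, "CTranslate2 1.2.1 (int8 + vmap)"),
    ):
        ratio = speedup(table[fastest], table["OpenNMT-py 0.9.2"])
        print(f"{label}: {fastest} is {ratio:.2f}x OpenNMT-py 0.9.2")

    # Gain of 1.2.1 over 1.0.1 for the default model in the first table.
    gain = speedup(FIRST_TABLE["CTranslate2 1.2.1"], FIRST_TABLE["CTranslate2 1.0.1"])
    print(f"first table: 1.2.1 vs 1.0.1 gain is {(gain - 1) * 100:.1f}%")
```

With these figures, the first table gives roughly 4.0x over OpenNMT-py 0.9.2 and the second roughly 2.4x. The 1.0.1 to 1.2.1 gain for the default model in the first table is about 18%, in line with the "10% to 20% faster" changelog entry.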