|
1 | 1 | # Release Notes
|
2 | 2 |
|
3 |
| -- 2.4.2 (2024-04-12) |
| 3 | +## 2.5.0 (2025-01-03) |
| 4 | + |
| 5 | +### Added |
| 6 | + |
| 7 | +- WMT24 test sets |
| 8 | + |
| 9 | +### Fixed |
| 10 | + |
| 11 | +- Convert Changelog to markdown format |
| 12 | +- Add optimization for compute_bleu precision initialization (#257) |
| 13 | + Thanks to Ernests Lavrinovics for this contribution. |
| 14 | + |
| 15 | +## 2.4.2 (2024-04-12) |
4 | 16 | Added:
|
5 | 17 | - Add printing of domain if present (via --echo)
|
6 | 18 |
|
7 |
| -- 2.4.1 (2024-03-12) |
| 19 | +## 2.4.1 (2024-03-12) |
8 | 20 | Fixed:
|
9 | 21 | - Add exports to package __init__.py
|
10 | 22 |
|
11 |
| -- 2.4.0 (2023-12-11) |
| 23 | +## 2.4.0 (2023-12-11) |
12 | 24 | Added:
|
13 | 25 | - WMT23 test sets (test set `wmt23`)
|
14 | 26 |
|
15 |
| -- 2.3.3 (2023-11-28) |
| 27 | +## 2.3.3 (2023-11-28) |
16 | 28 | Fixed:
|
17 | 29 | - Typing issues (#249, #250)
|
18 | 30 | - Improved builds (#252)
|
19 | 31 |
|
20 |
| -- 2.3.2 (2023-11-06) |
| 32 | +## 2.3.2 (2023-11-06) |
21 | 33 | Fixed:
|
22 | 34 | - Special treatment of empty references in TER (#232)
|
23 | 35 | - Bump in mecab version for JA (#234)
|
24 | 36 |
|
25 | 37 | Added:
|
26 | 38 | - Warning if `-tok spm` is used (use explicit `flores101` instead) (#238)
|
27 | 39 |
|
28 |
| -- 2.3.1 (2022-10-18) |
| 40 | +## 2.3.1 (2022-10-18) |
29 | 41 | Bugfix:
|
30 | 42 | - Set lru_cache to 2^16 for SPM tokenizer (was set to infinite)
|
31 | 43 |
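The 2.3.1 note above concerns bounding a memoization cache. As a minimal sketch of the general idea, assuming a Python `functools.lru_cache` decorator (the `tokenize` function here is a hypothetical stand-in, not the library's SPM tokenizer), a cache capped at 2^16 entries might look like this:

```python
from functools import lru_cache

CACHE_SIZE = 2 ** 16  # 65,536 entries, the bound named in the note above


@lru_cache(maxsize=CACHE_SIZE)
def tokenize(line: str) -> str:
    """Hypothetical stand-in for an expensive tokenizer call."""
    return " ".join(line.split())


tokenize("Hello   world")
tokenize("Hello   world")  # the second call is served from the cache
info = tokenize.cache_info()
```

With `maxsize=None` the cache grows without limit, which is presumably what "infinite" refers to; a finite `maxsize` evicts the least recently used entries once the bound is reached.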
|
32 |
| -- 2.3.0 (2022-10-18) |
| 44 | +## 2.3.0 (2022-10-18) |
33 | 45 | Features:
|
34 | 46 | - (#203) Added `-tok flores101` and `-tok flores200`, a.k.a. `spbleu`.
|
35 | 47 | These are multilingual tokenizations that make use of the
|
|
44 | 56 | - System outputs: included with wmt22. Also added wmt21/systems, which will produce WMT21 submitted systems.
|
45 | 57 | To see available systems, give a dummy system to `--echo`, e.g., `sacrebleu -t wmt22 -l en-de --echo ?`
|
46 | 58 |
|
47 |
| -- 2.2.1 (2022-09-13) |
| 59 | +## 2.2.1 (2022-09-13) |
48 | 60 | Bugfix: Standard usage was returning (and using) each reference twice.
|
49 | 61 |
|
50 |
| -- 2.2.0 (2022-07-25) |
| 62 | +## 2.2.0 (2022-07-25) |
51 | 63 | Features:
|
52 | 64 | - Added WMT21 datasets (thanks to @BrightXiaoHan)
|
53 | 65 | - `--echo` now exposes document metadata where available (e.g., docid, genre, origlang)
|
|
65 | 77 | Many thanks to @BrightXiaoHan (https://github.com/BrightXiaoHan) for the bulk of
|
66 | 78 | the code contributions in this release.
|
67 | 79 |
|
68 |
| -- 2.1.0 (2022-05-19) |
| 80 | +## 2.1.0 (2022-05-19) |
69 | 81 | Features:
|
70 | 82 | - Added `-tok spm` for multilingual SPM tokenization (#168)
|
71 | 83 | (thanks to Naman Goyal and James Cross at Facebook)
|
|
75 | 87 | - Bugfix: BLEU.corpus_score() now using max_ngram_order (#173)
|
76 | 88 | - Upgraded ja-mecab to 1.0.5 (#196)
|
77 | 89 |
|
78 |
| -- 2.0.0 (2021-07-18) |
| 90 | +## 2.0.0 (2021-07-18) |
79 | 91 | - Build: Add Windows and OS X testing to Travis CI.
|
80 | 92 | - Improve documentation and type annotations.
|
81 | 93 | - Drop `Python < 3.6` support and migrate to f-strings.
|
|
137 | 149 | as well as paired bootstrap resampling (`--paired-bs`) and paired approximate
|
138 | 150 | randomization tests (`--paired-ar`) when evaluating multiple systems (#40 and #78).
|
139 | 151 |
|
140 |
| -- 1.5.1 (2021-03-05) |
| 152 | +## 1.5.1 (2021-03-05) |
141 | 153 | - Fix extraction error for WMT18 extra test sets (test-ts) (#142)
|
142 | 154 | - Validation and test datasets are added for multilingual TEDx
|
143 | 155 |
|
144 |
| -- 1.5.0 (2021-01-15) |
| 156 | +## 1.5.0 (2021-01-15) |
145 | 157 | - Fix an assertion error in chrF (#121)
|
146 | 158 | - Add missing `__repr__()` methods for BLEU and TER
|
147 | 159 | - TER: Fix exception when `--short` is used (#131)
|
|
155 | 167 | - Allow variable number of references for BLEU (only via API) (#130).
|
156 | 168 | Thanks to Ondrej Dusek (@tuetschek)
|
157 | 169 |
|
158 |
| -- 1.4.14 (2020-09-13) |
| 170 | +## 1.4.14 (2020-09-13) |
159 | 171 | - Added character-based tokenization (`-tok char`).
|
160 | 172 | Thanks to Christian Federmann.
|
161 | 173 | - Added TER (`-m ter`). Thanks to Ales Tamchyna! (fixes #90)
|
|
166 | 178 | - wmt20/robust/set2 (en-ja, ja-en)
|
167 | 179 | - wmt20/robust/set3 (de-en)
|
168 | 180 |
|
169 |
| -- 1.4.13 (2020-07-30) |
| 181 | +## 1.4.13 (2020-07-30) |
170 | 182 | - Added WMT20 newstest test sets (#103)
|
171 | 183 | - Make mecab3-python an extra dependency, adapt code to new mecab3-python
|
172 | 184 | This fixes the recent Windows installation issues as well (#104)
|
173 | 185 | Japanese support should now be explicitly installed through sacrebleu[ja] package.
|
174 | 186 | - Fix return type annotation of corpus_bleu()
|
175 | 187 | - Improve sentence_score's documentation, do not allow single ref string (#98)
|
176 | 188 |
|
177 |
| -- 1.4.12 (2020-07-03) |
| 189 | +## 1.4.12 (2020-07-03) |
178 | 190 | - Fix a deployment bug (#96)
|
179 | 191 |
|
180 |
| -- 1.4.11 (2020-07-03) |
| 192 | +## 1.4.11 (2020-07-03) |
181 | 193 | - Added Multi30k multimodal MT test set metadata
|
182 | 194 | - Refactored all tokenizers into respective classes (fixes #85)
|
183 | 195 | - Refactored all metrics into respective classes
|
|
193 | 205 | - Added score regression tests for chrF using reference chrF++ implementation
|
194 | 206 | - Added multi-reference & tokenizer & signature tests
|
195 | 207 |
|
196 |
| -- 1.4.10 (2020-05-30) |
| 208 | +## 1.4.10 (2020-05-30) |
197 | 209 | - Fixed bug in signature with mecab tokenizer
|
198 | 210 | - Cleaned up deprecation warnings (thanks to Karthikeyan Singaravelan @tirkarthi)
|
199 | 211 | - Now only lists the external [typing](https://pypi.org/project/typing/)
|
200 | 212 | module as a dependency for Python `<= 3.4`, as it was integrated in the standard
|
201 | 213 | library in Python 3.5 (thanks to Erwan de Lépinau @ErwanDL).
|
202 | 214 | - Added LICENSE to pypi (thanks to Mark Harfouche @hmaarrfk)
|
203 | 215 |
|
204 |
| -- 1.4.9 (2020-04-30) |
| 216 | +## 1.4.9 (2020-04-30) |
205 | 217 | - Changed `get_available_testsets()` to return a list
|
206 | 218 | - Remove Japanese MeCab tokenizer from requirements.
|
207 | 219 | (Must be installed manually to avoid Windows incompatibility).
|
208 | 220 | Many thanks to Makoto Morishita (@MorinoseiMorizo).
|
209 | 221 |
|
210 |
| -- 1.4.8 (2020-04-26) |
| 222 | +## 1.4.8 (2020-04-26) |
211 | 223 | - Added to API:
|
212 | 224 | - get_source_file()
|
213 | 225 | - get_reference_files()
|
|
217 | 229 | - Fixed descriptions of some WMT19/google test sets
|
218 | 230 | - Added API test case (test/test_api.py)
|
219 | 231 |
|
220 |
| -- 1.4.7 (2020-04-19) |
| 232 | +## 1.4.7 (2020-04-19) |
221 | 233 | - Added Google's extra wmt19/en-de refs (-t wmt19/google/{ar,arp,hqall,hqp,hqr,wmtp})
|
222 | 234 | (Freitag, Grangier, & Caswell
|
223 | 235 | BLEU might be Guilty but References are not Innocent
|
224 | 236 | https://arxiv.org/abs/2004.06063)
|
225 | 237 | - Restored SACREBLEU_DIR and smart_open to exports (thanks to Thomas Liao @tholiao)
|
226 | 238 |
|
227 |
| -- 1.4.6 (2020-03-28) |
| 239 | +## 1.4.6 (2020-03-28) |
228 | 240 | - Large internal reorganization as a module (thanks to Thamme Gowda @thammegowda)
|
229 | 241 |
|
230 |
| -- 1.4.5 (2020-03-28) |
| 242 | +## 1.4.5 (2020-03-28) |
231 | 243 | - Added Japanese MeCab tokenizer (`-tok ja-mecab`) (thanks to Makoto Morishita @MorinoseiMorizo)
|
232 | 244 | - Added wmt20/dev test sets (thanks to Martin Popel @martinpopel)
|
233 | 245 |
|
234 |
| -- 1.4.4 (2020-03-10) |
| 246 | +## 1.4.4 (2020-03-10) |
235 | 247 | - Smoothing changes (Sebastian Nickels @sn1c)
|
236 | 248 | - Fixed bug that only applied smoothing to n-grams for n > 2
|
237 | 249 | - Added default smoothing values for methods "floor" (0) and "add-k" (1)
|
|
240 | 252 | - added missing languages for IWSLT17
|
241 | 253 | - Minor code improvements (Thomas Liao @tholiao)
|
242 | 254 |
|
243 |
| -- 1.4.3 (2019-12-02) |
| 255 | +## 1.4.3 (2019-12-02) |
244 | 256 | - Bugfix: handling of result object for CHRF
|
245 | 257 | - Improved API example
|
246 | 258 |
|
247 |
| -- 1.4.2 (2019-10-11) |
| 259 | +## 1.4.2 (2019-10-11) |
248 | 260 | - Tokenization variant omitted from the chrF signature; it is relevant only for BLEU (thanks to Martin Popel)
|
249 | 261 | - Bugfix: call to sentence_bleu (thanks to Rachel Bawden)
|
250 | 262 | - Documentation example for Python API (thanks to Vlad Lyalin)
|
251 | 263 | - Calls to corpus_chrf and sentence_chrf now return an object instead of a float (use result.score)
|
252 | 264 |
|
253 |
| -- 1.4.1 (2019-09-11) |
| 265 | +## 1.4.1 (2019-09-11) |
254 | 266 | - Added sentence-level scoring via -sl (--sentence-level)
|
255 | 267 |
|
256 |
| -- 1.4.0 (2019-09-10) |
| 268 | +## 1.4.0 (2019-09-10) |
257 | 269 | - Many thanks to Martin Popel for all the changes below!
|
258 | 270 | - Added evaluation on concatenated test sets (e.g., `-t wmt17,wmt18`).
|
259 | 271 | Works as long as they all have the same language pair.
|
|
269 | 281 | - Documentation and tests updates
|
270 | 282 | - Fixed a race condition bug (`os.makedirs(outdir, exist_ok=True)` instead of `if os.path.exists`)
|
271 | 283 |
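The race condition mentioned in the 1.4.0 notes above is the usual check-then-create pattern. The sketch below contrasts the two forms; the function names and the `cache/wmt17` path are illustrative assumptions, not code from the project:

```python
import os
import tempfile


def ensure_dir_racy(outdir: str) -> None:
    # Check-then-create: another process can create the directory between
    # the exists() check and makedirs(), which then raises FileExistsError.
    if not os.path.exists(outdir):
        os.makedirs(outdir)


def ensure_dir(outdir: str) -> None:
    # Single call: an already existing directory is not treated as an error.
    os.makedirs(outdir, exist_ok=True)


base = tempfile.mkdtemp()
outdir = os.path.join(base, "cache", "wmt17")
ensure_dir(outdir)
ensure_dir(outdir)  # calling it again is harmless
```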
|
272 |
| -- 1.3.7 (2019-07-12) |
| 284 | +## 1.3.7 (2019-07-12) |
273 | 285 | - Lazy loading of regexes cuts import time from ~1s to nearly nothing (thanks, @louismartin!)
|
274 | 286 | - Added a simple (non-atomic) lock on downloading
|
275 | 287 | - Can now read multiple refs from a single tab-delimited file.
|
276 | 288 | You need to pass `--num-refs N` to tell it to run the split.
|
277 | 289 | Only works with a single reference file passed from the command line.
|
278 | 290 |
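To illustrate what reading multiple references from one tab-delimited file involves, here is a minimal sketch. The `split_references` helper is hypothetical and is not the tool's own implementation; it only shows how each line can be split into N parallel reference streams:

```python
from typing import List


def split_references(lines: List[str], num_refs: int) -> List[List[str]]:
    """Split tab-delimited lines into num_refs parallel reference streams."""
    streams: List[List[str]] = [[] for _ in range(num_refs)]
    for lineno, line in enumerate(lines, 1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) != num_refs:
            raise ValueError(
                f"line {lineno}: expected {num_refs} fields, found {len(fields)}"
            )
        for stream, field in zip(streams, fields):
            stream.append(field)
    return streams


refs = split_references(["the cat\ta cat\n", "the dog\ta dog\n"], num_refs=2)
```

Here `refs[0]` holds the first reference for every segment and `refs[1]` the second, which is the shape a multi-reference metric typically expects.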
|
279 |
| -- 1.3.6 (2019-06-10) |
| 291 | +## 1.3.6 (2019-06-10) |
280 | 292 | - Removed another f-string for Python 3.5 compatibility
|
281 | 293 |
|
282 |
| -- 1.3.5 (2019-06-07) |
| 294 | +## 1.3.5 (2019-06-07) |
283 | 295 | - Restored Python 3.5 compatibility
|
284 | 296 |
|
285 |
| -- 1.3.4 (2019-05-28) |
| 297 | +## 1.3.4 (2019-05-28) |
286 | 298 | - Added MTNT 2019 test sets
|
287 | 299 | - Added a BLEU object
|
288 | 300 |
|
289 |
| -- 1.3.3 (2019-05-08) |
| 301 | +## 1.3.3 (2019-05-08) |
290 | 302 | - Added WMT'19 test sets
|
291 | 303 |
|
292 |
| -- 1.3.2 (2019-04-24) |
| 304 | +## 1.3.2 (2019-04-24) |
293 | 305 | - Bugfix in test case (thanks to Adam Roberts, @adarob)
|
294 | 306 | - Passing smoothing method through `sentence_bleu`
|
295 | 307 |
|
296 |
| -- 1.3.1 (2019-03-20) |
| 308 | +## 1.3.1 (2019-03-20) |
297 | 309 | - Added another smoothing approach (add-k) and a command-line option for choosing the smoothing method
|
298 | 310 | (`--smooth exp|floor|add-n|none`) and the associated value (`--smooth-value`), when relevant.
|
299 | 311 | - Changed interface to some functions (backwards incompatible)
|
300 | 312 | - 'smooth' is now 'smooth_method'
|
301 | 313 | - 'smooth_floor' is now 'smooth_value'
|
302 | 314 |
|
303 |
| -- 1.2.21 (19 March 2019) |
| 315 | +## 1.2.21 (19 March 2019) |
304 | 316 | - Ctrl-M characters are now treated as normal characters (previously they were treated as newlines).
|
305 | 317 |
|
306 |
| -- 1.2.20 (28 February 2019) |
| 318 | +## 1.2.20 (28 February 2019) |
307 | 319 | - Tokenization now defaults to "zh" when language pair is known
|
308 | 320 |
|
309 |
| -- 1.2.19 (19 February 2019) |
| 321 | +## 1.2.19 (19 February 2019) |
310 | 322 | - Updated checksum for wmt19/dev (seems to have changed)
|
311 | 323 |
|
312 |
| -- 1.2.18 (19 February 2019) |
| 324 | +## 1.2.18 (19 February 2019) |
313 | 325 | - Fixed checksum for wmt17/dev (copy-paste error)
|
314 | 326 |
|
315 |
| -- 1.2.17 (6 February 2019) |
| 327 | +## 1.2.17 (6 February 2019) |
316 | 328 | - Added kk-en and en-kk to wmt19/dev
|
317 | 329 |
|
318 |
| -- 1.2.16 (4 February 2019) |
| 330 | +## 1.2.16 (4 February 2019) |
319 | 331 | - Added gu-en and en-gu to wmt19/dev
|
320 | 332 |
|
321 |
| -- 1.2.15 (30 January 2019) |
| 333 | +## 1.2.15 (30 January 2019) |
322 | 334 | - Added MD5 checksumming of downloaded files for all datasets.
|
323 | 335 |
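Checksumming a downloaded file, as described in the 1.2.15 note above, can be sketched with Python's standard `hashlib`. The helper names `md5_of_file` and `verify_download` are assumptions made for illustration and are not taken from the project:

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_md5: str) -> bool:
    """Compare a file's digest against the checksum recorded for the dataset."""
    return md5_of_file(path) == expected_md5.lower()
```

A caller would typically discard the file and retry, or raise an error, when `verify_download` returns `False`.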
|
324 |
| -- 1.2.14 (22 January 2019) |
| 336 | +## 1.2.14 (22 January 2019) |
325 | 337 | - Added mtnt1.1/train mtnt1.1/valid mtnt1.1/test data from [MTNT](http://www.cs.cmu.edu/~pmichel1/mtnt/)
|
326 | 338 |
|
327 |
| -- 1.2.13 (22 January 2019) |
| 339 | +## 1.2.13 (22 January 2019) |
328 | 340 | - Added 'wmt19/dev' task for 'lt-en' and 'en-lt' (development data for new tasks).
|
329 | 341 | - Added MD5 checksum for downloaded tarballs.
|
330 | 342 |
|
331 |
| -- 1.2.12 (8 November 2018) |
| 343 | +## 1.2.12 (8 November 2018) |
332 | 344 | - Now outputs only one digit after the decimal
|
333 | 345 |
|
334 |
| -- 1.2.11 (29 August 2018) |
| 346 | +## 1.2.11 (29 August 2018) |
335 | 347 | - Added a function for sentence-level, smoothed BLEU
|
336 | 348 |
|
337 |
| -- 1.2.10 (23 May 2018) |
| 349 | +## 1.2.10 (23 May 2018) |
338 | 350 | - Added wmt18 test set (with references)
|
339 | 351 |
|
340 |
| -- 1.2.9 (15 May 2018) |
| 352 | +## 1.2.9 (15 May 2018) |
341 | 353 | - Added zh-en, en-zh, tr-en, and en-tr datasets for wmt18/test-ts
|
342 | 354 |
|
343 |
| -- 1.2.8 (14 May 2018) |
| 355 | +## 1.2.8 (14 May 2018) |
344 | 356 | - Added wmt18/test-ts, the test sources (only) for [WMT18](http://statmt.org/wmt18/translation-task.html)
|
345 | 357 | - Moved README out of `sacrebleu.py` and the CHANGELOG into a separate file
|
346 | 358 |
|
347 |
| -- 1.2.7 (10 April 2018) |
| 359 | +## 1.2.7 (10 April 2018) |
348 | 360 | - fixed another locale issue (with --echo)
|
349 | 361 | - grudgingly enabled `-tok none` from the command line
|
350 | 362 |
|
351 |
| -- 1.2.6 (22 March 2018) |
| 363 | +## 1.2.6 (22 March 2018) |
352 | 364 | - added wmt17/ms (Microsoft's [additional ZH-EN references](https://github.com/MicrosoftTranslator/Translator-HumanParityData)).
|
353 | 365 | Try `sacrebleu -t wmt17/ms --cite`.
|
354 | 366 | - `--echo ref` now pastes together all references, if there is more than one
|
355 | 367 |
|
356 |
| -- 1.2.5 (13 March 2018) |
| 368 | +## 1.2.5 (13 March 2018) |
357 | 369 | - added wmt18/dev datasets (en-et and et-en)
|
358 | 370 | - fixed logic with --force
|
359 | 371 | - locale-independent installation
|
360 | 372 | - added "--echo both" (tab-delimited)
|
361 | 373 |
|
362 |
| -- 1.2.3 (28 January 2018) |
| 374 | +## 1.2.3 (28 January 2018) |
363 | 375 | - metrics (`-m`) are now printed in the order requested
|
364 | 376 | - chrF now prints a version string (including the beta parameter, importantly)
|
365 | 377 | - attempt to remove dependence on locale setting
|
366 | 378 |
|
367 |
| -- 1.2 (17 January 2018) |
| 379 | +## 1.2 (17 January 2018) |
368 | 380 | - added the chrF metric (`-m chrf` or `-m bleu chrf` for both)
|
369 | 381 | See 'CHRF: character n-gram F-score for automatic MT evaluation' by Maja Popovic (WMT 2015)
|
370 | 382 | [http://www.statmt.org/wmt15/pdf/WMT49.pdf]
|
|
374 | 386 | - added `--input` (`-i`) to set input to a file instead of STDIN
|
375 | 387 | - removed accent mark after objection from UN official
|
376 | 388 |
|
377 |
| -- 1.1.7 (27 November 2017) |
| 389 | +## 1.1.7 (27 November 2017) |
378 | 390 | - corpus_bleu() now raises an exception if input streams are different lengths
|
379 | 391 | - thanks to Martin Popel for:
|
380 | 392 | - small bugfix in tokenization_13a (not affecting WMT references)
|
381 | 393 | - adding `--tok intl` (international tokenization)
|
382 | 394 | - added wmt16/dev and wmt17/dev sets (for languages intro'd those years)
|
383 | 395 |
|
384 |
| -- 1.1.6 (15 November 2017) |
| 396 | +## 1.1.6 (15 November 2017) |
385 | 397 | - bugfix for tokenization warning
|
386 | 398 |
|
387 |
| -- 1.1.5 (12 November 2017) |
| 399 | +## 1.1.5 (12 November 2017) |
388 | 400 | - added -b option (only output the BLEU score)
|
389 | 401 | - removed fi-en from list of WMT16/17 systems with more than one reference
|
390 | 402 | - added WMT16/tworefs and WMT17/tworefs for scoring with both en-fi references
|
391 | 403 |
|
392 |
| -- 1.1.4 (10 November 2017) |
| 404 | +## 1.1.4 (10 November 2017) |
393 | 405 | - added effective order for sentence-level BLEU computation
|
394 | 406 | - added unit tests from sockeye
|
395 | 407 |
|
396 |
| -- 1.1.3 (8 November 2017). |
| 408 | +## 1.1.3 (8 November 2017). |
397 | 409 | - Factored code a bit to facilitate API:
|
398 | 410 | - compute_bleu: works from raw stats
|
399 | 411 | - corpus_bleu for use from the command line
|
|
402 | 414 | - Added 'floor' smoothing (adds 0.01 to 0 counts, more versatile via API), 'none' smoothing (via API)
|
403 | 415 | - Small bugfixes, windows compatibility (H/T Christian Federmann)
|
404 | 416 |
|
405 |
| -- 1.0.3 (4 November 2017). |
| 417 | +## 1.0.3 (4 November 2017). |
406 | 418 | - Contributions from Christian Federmann:
|
407 | 419 | - Added explicit support for encoding
|
408 | 420 | - Fixed Windows support
|
409 | 421 | - Bugfix in handling reference length with multiple refs
|
410 | 422 |
|
411 |
| -- version 1.0.1 (1 November 2017). |
| 423 | +## version 1.0.1 (1 November 2017). |
412 | 424 | - Small bugfix affecting some versions of Python.
|
413 | 425 | - Code reformatting due to Ozan Çağlayan.
|
414 | 426 |
|
415 |
| -- version 1.0 (23 October 2017). |
| 427 | +## version 1.0 (23 October 2017). |
416 | 428 | - Support for WMT 2008--2017.
|
417 | 429 | - Single tokenization (v13a) with lowercase fix (proper lower() instead of just A-Z).
|
418 | 430 | - Chinese tokenization.
|
|