# BLEU

Our BLEU implementation is a wrapper around [SacreBLEU](https://github.com/mjpost/sacrebleu). Although BLEU was intended to be a corpus-level metric, we have only implemented the sentence-level version. See `sacrebleu.BLEU` for details. The metric is registered under the name `sent-bleu`.
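For reference, here is a minimal sketch of scoring a single summary with SacreBLEU's sentence-level API. The class and method names below come from `sacrebleu` 2.x; the `sent-bleu` wrapper in this repository may expose a different interface, so treat this as an illustration of the underlying metric rather than of the wrapper itself.

```python
from sacrebleu.metrics import BLEU

# effective_order=True is the usual choice for sentence-level BLEU: short
# sentences often have zero matches for higher-order n-grams, which would
# otherwise drive the geometric mean to zero.
bleu = BLEU(effective_order=True)

hypothesis = "the cat sat on the mat"            # candidate summary sentence
references = ["the cat was sitting on the mat"]  # one or more references

score = bleu.sentence_score(hypothesis, references)
print(score.score)  # BLEU on a 0-100 scale
```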

## Setting Up

No setup is required.

## Correlations

Here are the correlations of SentBLEU to the overall responsiveness judgments on the TAC datasets. In each table, r, ρ, and τ denote the Pearson, Spearman, and Kendall correlation coefficients, respectively. Summary-level correlations are computed per input and then averaged across inputs; system-level correlations are computed between per-system average scores. A sketch of how these coefficients can be computed follows the first table.

Summary-level, peers only:

| Dataset | r | ρ | τ |
|---------|------|------|------|
| TAC2008 | 0.39 | 0.41 | 0.33 |
| TAC2009 | 0.41 | 0.47 | 0.37 |
| TAC2010 | 0.56 | 0.56 | 0.46 |
| TAC2011 | 0.44 | 0.43 | 0.35 |
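The coefficients in these tables can be reproduced with standard statistics libraries. Below is a small sketch using SciPy; the score and judgment lists are hypothetical placeholders, not values drawn from the TAC data.

```python
from scipy.stats import pearsonr, spearmanr, kendalltau

# Hypothetical SentBLEU scores and human responsiveness judgments for the
# summaries of a single input (summary-level setting).
sentbleu = [0.32, 0.18, 0.45, 0.27, 0.39]
responsiveness = [3.0, 2.0, 4.5, 2.5, 3.5]

r, _ = pearsonr(sentbleu, responsiveness)
rho, _ = spearmanr(sentbleu, responsiveness)
tau, _ = kendalltau(sentbleu, responsiveness)

print(f"r = {r:.2f}, rho = {rho:.2f}, tau = {tau:.2f}")
```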

Summary-level, peers + references:

| Dataset | r | ρ | τ |
|---------|------|------|------|
| TAC2008 | 0.45 | 0.45 | 0.36 |
| TAC2009 | 0.36 | 0.48 | 0.38 |
| TAC2010 | 0.57 | 0.59 | 0.47 |
| TAC2011 | 0.42 | 0.43 | 0.34 |

System-level, peers only:

| Dataset | r | ρ | τ |
|---------|------|------|------|
| TAC2008 | 0.85 | 0.87 | 0.68 |
| TAC2009 | 0.61 | 0.83 | 0.66 |
| TAC2010 | 0.96 | 0.92 | 0.78 |
| TAC2011 | 0.92 | 0.78 | 0.59 |

System-level, peers + references:

| Dataset | r | ρ | τ |
|---------|------|------|------|
| TAC2008 | 0.87 | 0.90 | 0.74 |
| TAC2009 | 0.41 | 0.88 | 0.71 |
| TAC2010 | 0.92 | 0.94 | 0.81 |
| TAC2011 | 0.71 | 0.77 | 0.60 |

Here are the correlations of SentBLEU to the annotations in Fabbri et al. (2020):

Summary-level, peers only:

| Dataset | r | ρ | τ |
|------------|------|------|------|
| Fabbri2020 | 0.16 | 0.15 | 0.11 |

System-level, peers only:

| Dataset | r | ρ | τ |
|------------|------|------|------|
| Fabbri2020 | 0.50 | 0.35 | 0.20 |

Here are the correlations of SentBLEU to the annotations collected by Bhandari et al. (2020):

Summary-level, peers only:

| Dataset | r | ρ | τ |
|------------------|------|------|------|
| Bhandari2020-Abs | 0.39 | 0.38 | 0.30 |
| Bhandari2020-Ext | 0.16 | 0.16 | 0.14 |
| Bhandari2020-Mix | 0.28 | 0.26 | 0.20 |

System-level, peers only:

| Dataset | r | ρ | τ |
|------------------|-------|-------|-------|
| Bhandari2020-Abs | 0.41 | 0.52 | 0.36 |
| Bhandari2020-Ext | -0.13 | -0.07 | -0.05 |
| Bhandari2020-Mix | 0.26 | 0.32 | 0.21 |