Our BLEU implementation is a wrapper around SacreBLEU. Although BLEU was intended to be a corpus-level metric, we have only implemented the sentence-level version. See `sacrebleu.BLEU` for details.
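Since the implementation delegates to SacreBLEU, the per-summary computation can be sketched by calling SacreBLEU directly. This is a minimal illustration, not the wrapper's actual code; the example strings are invented, and the wrapper's tokenization and smoothing settings may differ from the SacreBLEU defaults used here:

```python
# Sketch of sentence-level BLEU via SacreBLEU (the library this metric wraps).
from sacrebleu import corpus_bleu, sentence_bleu

summary = "The cat sat on the mat."
references = ["A cat was sitting on the mat."]

# Sentence-level: one hypothesis is scored against its references in isolation.
sent_score = sentence_bleu(summary, references)
print(sent_score.score)  # BLEU in [0, 100]

# Corpus-level (BLEU's original formulation): n-gram statistics are pooled
# over all hypothesis/reference pairs before the score is computed. Shown
# here only to highlight the distinction drawn above.
corp_score = corpus_bleu([summary], [[ref] for ref in references])
print(corp_score.score)
```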
The metric is registered under the name `sent-bleu`. No setup is required.
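For reference, here is a hypothetical sketch of invoking the metric through the Python interface. The class name `SentBleu` and the `score()` signature are assumptions inferred from the registered name and the library's usual Metric interface; consult `sacrerouge.metrics` for the actual API.

```python
# Hypothetical usage sketch; class name and signature are assumptions.
from sacrerouge.metrics import SentBleu

metric = SentBleu()
summary = "The cat sat on the mat."
references = ["A cat was sitting on the mat."]

# score() is expected to return a dict of metric values for one summary,
# keyed by the registered metric name.
print(metric.score(summary, references))
```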
Here are the correlations of SentBLEU to the overall responsiveness scores on the TAC datasets. In the tables below, r, p, and k denote the Pearson, Spearman, and Kendall correlation coefficients, respectively.
Summary-level, peers only:
| | TAC2008 | | | TAC2009 | | | TAC2010 | | | TAC2011 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | r | p | k | r | p | k | r | p | k | r | p | k |
| SentBLEU | 0.39 | 0.41 | 0.33 | 0.41 | 0.47 | 0.37 | 0.56 | 0.56 | 0.46 | 0.44 | 0.43 | 0.35 |
Summary-level, peers + references:
| | TAC2008 | | | TAC2009 | | | TAC2010 | | | TAC2011 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | r | p | k | r | p | k | r | p | k | r | p | k |
| SentBLEU | 0.45 | 0.45 | 0.36 | 0.36 | 0.48 | 0.38 | 0.57 | 0.59 | 0.47 | 0.42 | 0.43 | 0.34 |
System-level, peers only:
| | TAC2008 | | | TAC2009 | | | TAC2010 | | | TAC2011 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | r | p | k | r | p | k | r | p | k | r | p | k |
| SentBLEU | 0.85 | 0.87 | 0.68 | 0.61 | 0.83 | 0.66 | 0.96 | 0.92 | 0.78 | 0.92 | 0.78 | 0.59 |
System-level, peers + references:
| | TAC2008 | | | TAC2009 | | | TAC2010 | | | TAC2011 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | r | p | k | r | p | k | r | p | k | r | p | k |
| SentBLEU | 0.87 | 0.90 | 0.74 | 0.41 | 0.88 | 0.71 | 0.92 | 0.94 | 0.81 | 0.71 | 0.77 | 0.60 |
Here are the correlations of SentBLEU to the annotations in Fabbri et al. (2020):
Summary-level, peers only:
| | Fabbri2020 | | |
|---|---|---|---|
| | r | p | k |
| SentBLEU | 0.16 | 0.15 | 0.11 |
System-level, peers only:
| | Fabbri2020 | | |
|---|---|---|---|
| | r | p | k |
| SentBLEU | 0.50 | 0.35 | 0.20 |
Here are the correlations of SentBLEU to the annotations collected by Bhandari et al. (2020):
Summary-level, peers only:
| | Bhandari2020-Abs | | | Bhandari2020-Ext | | | Bhandari2020-Mix | | |
|---|---|---|---|---|---|---|---|---|---|
| | r | p | k | r | p | k | r | p | k |
| SentBLEU | 0.39 | 0.38 | 0.30 | 0.16 | 0.16 | 0.14 | 0.28 | 0.26 | 0.20 |
System-level, peers only:
| | Bhandari2020-Abs | | | Bhandari2020-Ext | | | Bhandari2020-Mix | | |
|---|---|---|---|---|---|---|---|---|---|
| | r | p | k | r | p | k | r | p | k |
| SentBLEU | 0.41 | 0.52 | 0.36 | -0.13 | -0.07 | -0.05 | 0.26 | 0.32 | 0.21 |