# BLEU

Our BLEU implementation is a wrapper around [SacreBLEU](https://github.com/mjpost/sacrebleu). Although BLEU was intended to be a corpus-level metric, we have only implemented the sentence-level version. See `sacrebleu.BLEU` for details. The metric is registered under the name `sent-bleu`.
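For reference, here is a minimal sketch of scoring a single summary with SacreBLEU's sentence-level API. The class and method names below come from `sacrebleu` 2.x; the `sent-bleu` wrapper in this repository may expose a different interface, so treat this as an illustration of the underlying metric rather than of the wrapper itself.

```python
from sacrebleu.metrics import BLEU

# effective_order=True is the usual choice for sentence-level BLEU: short
# sentences often have zero matches for higher-order n-grams, which would
# otherwise drive the geometric mean to zero.
bleu = BLEU(effective_order=True)

hypothesis = "the cat sat on the mat"            # candidate summary sentence
references = ["the cat was sitting on the mat"]  # one or more references

score = bleu.sentence_score(hypothesis, references)
print(score.score)  # BLEU on a 0-100 scale
```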

## Setting Up

No setup is required.

## Correlations

Here are the correlations of SentBLEU to the overall responsiveness judgments on the TAC datasets. In each table, r, ρ, and τ denote the Pearson, Spearman, and Kendall correlation coefficients, respectively. Summary-level correlations are computed per input and then averaged across inputs; system-level correlations are computed between per-system average scores. A sketch of how these coefficients can be computed follows the first table.

Summary-level, peers only:

| Dataset | r | ρ | τ |
|---------|------|------|------|
| TAC2008 | 0.39 | 0.41 | 0.33 |
| TAC2009 | 0.41 | 0.47 | 0.37 |
| TAC2010 | 0.56 | 0.56 | 0.46 |
| TAC2011 | 0.44 | 0.43 | 0.35 |
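The coefficients in these tables can be reproduced with standard statistics libraries. Below is a small sketch using SciPy; the score and judgment lists are hypothetical placeholders, not values drawn from the TAC data.

```python
from scipy.stats import pearsonr, spearmanr, kendalltau

# Hypothetical SentBLEU scores and human responsiveness judgments for the
# summaries of a single input (summary-level setting).
sentbleu = [0.32, 0.18, 0.45, 0.27, 0.39]
responsiveness = [3.0, 2.0, 4.5, 2.5, 3.5]

r, _ = pearsonr(sentbleu, responsiveness)
rho, _ = spearmanr(sentbleu, responsiveness)
tau, _ = kendalltau(sentbleu, responsiveness)

print(f"r = {r:.2f}, rho = {rho:.2f}, tau = {tau:.2f}")
```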

Summary-level, peers + references:

| Dataset | r | ρ | τ |
|---------|------|------|------|
| TAC2008 | 0.45 | 0.45 | 0.36 |
| TAC2009 | 0.36 | 0.48 | 0.38 |
| TAC2010 | 0.57 | 0.59 | 0.47 |
| TAC2011 | 0.42 | 0.43 | 0.34 |

System-level, peers only:

| Dataset | r | ρ | τ |
|---------|------|------|------|
| TAC2008 | 0.85 | 0.87 | 0.68 |
| TAC2009 | 0.61 | 0.83 | 0.66 |
| TAC2010 | 0.96 | 0.92 | 0.78 |
| TAC2011 | 0.92 | 0.78 | 0.59 |

System-level, peers + references:

| Dataset | r | ρ | τ |
|---------|------|------|------|
| TAC2008 | 0.87 | 0.90 | 0.74 |
| TAC2009 | 0.41 | 0.88 | 0.71 |
| TAC2010 | 0.92 | 0.94 | 0.81 |
| TAC2011 | 0.71 | 0.77 | 0.60 |

Here are the correlations of SentBLEU to the annotations in Fabbri et al. (2020):

Summary-level, peers only:

| Dataset | r | ρ | τ |
|------------|------|------|------|
| Fabbri2020 | 0.16 | 0.15 | 0.11 |

System-level, peers only:

| Dataset | r | ρ | τ |
|------------|------|------|------|
| Fabbri2020 | 0.50 | 0.35 | 0.20 |

Here are the correlations of SentBLEU to the annotations collected by Bhandari et al. (2020):

Summary-level, peers only:

| Dataset | r | ρ | τ |
|------------------|------|------|------|
| Bhandari2020-Abs | 0.39 | 0.38 | 0.30 |
| Bhandari2020-Ext | 0.16 | 0.16 | 0.14 |
| Bhandari2020-Mix | 0.28 | 0.26 | 0.20 |

System-level, peers only:

| Dataset | r | ρ | τ |
|------------------|-------|-------|-------|
| Bhandari2020-Abs | 0.41 | 0.52 | 0.36 |
| Bhandari2020-Ext | -0.13 | -0.07 | -0.05 |
| Bhandari2020-Mix | 0.26 | 0.32 | 0.21 |