Sum-QE [1] is a BERT-based model that estimates the linguistic quality of a summary. Our implementation wraps our fork of the original code, which provides a more compatible command-line interface. We also followed the steps in the original repository to retrain their multi-task-5 models, which we have saved on AWS:
- Model trained on DUC 2005 and 2006
- Model trained on DUC 2005 and 2007
- Model trained on DUC 2006 and 2007
The name for this metric is `sum-qe`.
Sum-QE has many Python dependencies. We recommend following the original repository's instructions for creating its conda environment. The path to that environment's Python binary can then be passed to the `SumQE` class.
The models can be set up with the following command:

```bash
sacrerouge setup-metric sum-qe \
    --download-2005-2006-model \
    --download-2005-2007-model \
    --download-2006-2007-model
```
Each of the `--download-*` arguments is optional.
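
Each retrained model saw two of the three DUC years during training. If you want to score a DUC year only with the model that did not see it (a plausible reason three models are offered, but an assumption rather than something documented here), one download per evaluation year is enough. The sketch below maps evaluation years to the corresponding `--download-*` flag and builds the `setup-metric` command; `heldout_flag` and `setup_command` are hypothetical helper names, and only the three flags listed above are used.

```python
import shlex

# The three retrained models, keyed by the DUC years each was trained on.
MODEL_FLAGS = {
    frozenset({2005, 2006}): "--download-2005-2006-model",
    frozenset({2005, 2007}): "--download-2005-2007-model",
    frozenset({2006, 2007}): "--download-2006-2007-model",
}


def heldout_flag(eval_year: int) -> str:
    """Return the flag of the only model whose training data excludes eval_year."""
    matches = [flag for years, flag in MODEL_FLAGS.items() if eval_year not in years]
    if len(matches) != 1:
        raise ValueError(f"no unique held-out model for DUC {eval_year}")
    return matches[0]


def setup_command(eval_years):
    """Build the argv that downloads only the models needed for eval_years."""
    flags = sorted({heldout_flag(year) for year in eval_years})
    return ["sacrerouge", "setup-metric", "sum-qe", *flags]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run(..., check=True)
    # to actually download the model.
    print(shlex.join(setup_command([2006])))
```

For DUC 2006 this prints `sacrerouge setup-metric sum-qe --download-2005-2007-model`.
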
To verify your installation, run:

```bash
pytest sacrerouge/tests/metrics/sumqe_test.py
```
This requires setting the environment variable `SUMQE_PYTHON_BINARY` to the path of the Python binary that has the Sum-QE dependencies installed.
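
Before running the test, it can help to confirm that the interpreter named by `SUMQE_PYTHON_BINARY` starts and can import the packages Sum-QE needs. The following is a minimal, standard-library-only sketch; `resolve_sumqe_python` and `check_python_binary` are hypothetical helpers, not part of SacreROUGE, and the modules to check are passed as command-line arguments because the exact dependency list lives in the Sum-QE repository.

```python
import os
import subprocess
import sys


def resolve_sumqe_python(default: str = sys.executable) -> str:
    """Hypothetical helper: read SUMQE_PYTHON_BINARY, falling back to `default`."""
    return os.environ.get("SUMQE_PYTHON_BINARY", default)


def check_python_binary(binary: str, modules=("sys",)) -> bool:
    """Hypothetical helper: True if `binary` starts and imports every module."""
    code = "; ".join(f"import {name}" for name in modules)
    try:
        result = subprocess.run([binary, "-c", code], capture_output=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        # The path does not exist, is not executable, or the interpreter hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Which modules Sum-QE actually needs is an assumption left to the reader;
    # pass them as arguments, e.g. the packages from its conda environment.
    modules = sys.argv[1:] or ["sys"]
    binary = resolve_sumqe_python()
    status = "ok" if check_python_binary(binary, modules) else "FAILED"
    print(f"{binary}: {status}")
```

Running the import in a subprocess checks the other environment's interpreter rather than the one running SacreROUGE.
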
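Here are the correlations of Sum-QE, as implemented in SacreROUGE, to the "overall responsiveness" human judgments on several datasets. In each table, r, p, and k are the Pearson, Spearman, and Kendall correlation coefficients, respectively.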
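
Pearson's r, Spearman's rho (p), and Kendall's tau (k) each capture a different notion of agreement: Pearson measures linear agreement between raw scores, Spearman applies Pearson to ranks, and Kendall counts pairs of summaries that the metric and the humans order the same way. The dependency-free sketch below illustrates them; it is not SacreROUGE's implementation and the function names are illustrative. It uses tie-averaged ranks for Spearman and the tau-b tie correction for Kendall, which are also the defaults of `scipy.stats.spearmanr` and `scipy.stats.kendalltau`.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson's r: covariance over the product of standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / norm  # undefined (ZeroDivisionError) for constant input


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            result[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, adjusted for ties."""
    concordant = discordant = tied_x = tied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both: excluded from numerator and denominator
        if dx == 0:
            tied_x += 1
        elif dy == 0:
            tied_y += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    n_cd = concordant + discordant
    return (concordant - discordant) / math.sqrt((n_cd + tied_x) * (n_cd + tied_y))


if __name__ == "__main__":
    metric = [0.2, 0.5, 0.4, 0.9]  # toy metric scores, not DUC data
    human = [1, 3, 3, 4]           # toy human judgments with a tie
    print(pearson(metric, human), spearman(metric, human), kendall(metric, human))
```

Because Spearman and Kendall depend only on orderings, they are less sensitive than Pearson to a few outlying scores, which is one reason tables like the ones below report all three.
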
Summary-level, peers only:

Dataset | r | p | k
---|---|---|---
DUC2005-Q1 | 0.46 | 0.42 | 0.33
DUC2006-Q1 | 0.60 | 0.55 | 0.44
DUC2007-Q1 | 0.65 | 0.58 | 0.47
DUC2005-Q2 | 0.25 | 0.21 | 0.17
DUC2006-Q2 | 0.29 | 0.23 | 0.19
DUC2007-Q2 | 0.32 | 0.30 | 0.24
DUC2005-Q3 | 0.40 | 0.40 | 0.31
DUC2006-Q3 | 0.46 | 0.44 | 0.35
DUC2007-Q3 | 0.49 | 0.46 | 0.37
DUC2005-Q4 | 0.35 | 0.36 | 0.28
DUC2006-Q4 | 0.36 | 0.32 | 0.25
DUC2007-Q4 | 0.43 | 0.38 | 0.31
DUC2005-Q5 | 0.27 | 0.26 | 0.21
DUC2006-Q5 | 0.43 | 0.38 | 0.30
DUC2007-Q5 | 0.49 | 0.45 | 0.36

Summary-level, peers + references:

Dataset | r | p | k
---|---|---|---
DUC2005-Q1 | 0.45 | 0.40 | 0.32
DUC2006-Q1 | 0.64 | 0.60 | 0.49
DUC2007-Q1 | 0.67 | 0.62 | 0.50
DUC2005-Q2 | 0.16 | 0.12 | 0.10
DUC2006-Q2 | 0.30 | 0.25 | 0.20
DUC2007-Q2 | 0.34 | 0.33 | 0.26
DUC2005-Q3 | 0.30 | 0.30 | 0.23
DUC2006-Q3 | 0.43 | 0.42 | 0.33
DUC2007-Q3 | 0.44 | 0.40 | 0.31
DUC2005-Q4 | 0.24 | 0.26 | 0.20
DUC2006-Q4 | 0.41 | 0.37 | 0.29
DUC2007-Q4 | 0.39 | 0.34 | 0.27
DUC2005-Q5 | 0.03 | 0.06 | 0.05
DUC2006-Q5 | 0.41 | 0.39 | 0.31
DUC2007-Q5 | 0.44 | 0.41 | 0.32

System-level, peers only:

Dataset | r | p | k
---|---|---|---
DUC2005-Q1 | 0.81 | 0.75 | 0.58
DUC2006-Q1 | 0.92 | 0.88 | 0.73
DUC2007-Q1 | 0.92 | 0.83 | 0.65
DUC2005-Q2 | 0.41 | 0.29 | 0.22
DUC2006-Q2 | 0.69 | 0.62 | 0.46
DUC2007-Q2 | 0.71 | 0.76 | 0.56
DUC2005-Q3 | 0.68 | 0.79 | 0.58
DUC2006-Q3 | 0.92 | 0.87 | 0.72
DUC2007-Q3 | 0.92 | 0.92 | 0.77
DUC2005-Q4 | 0.71 | 0.75 | 0.56
DUC2006-Q4 | 0.89 | 0.89 | 0.73
DUC2007-Q4 | 0.90 | 0.84 | 0.66
DUC2005-Q5 | 0.54 | 0.62 | 0.43
DUC2006-Q5 | 0.86 | 0.81 | 0.64
DUC2007-Q5 | 0.87 | 0.88 | 0.74

System-level, peers + references:

Dataset | r | p | k
---|---|---|---
DUC2005-Q1 | 0.66 | 0.63 | 0.47
DUC2006-Q1 | 0.93 | 0.91 | 0.77
DUC2007-Q1 | 0.93 | 0.88 | 0.72
DUC2005-Q2 | -0.30 | -0.31 | -0.20
DUC2006-Q2 | 0.73 | 0.72 | 0.52
DUC2007-Q2 | 0.67 | 0.76 | 0.57
DUC2005-Q3 | 0.30 | 0.49 | 0.35
DUC2006-Q3 | 0.67 | 0.74 | 0.59
DUC2007-Q3 | 0.63 | 0.55 | 0.42
DUC2005-Q4 | 0.24 | 0.44 | 0.33
DUC2006-Q4 | 0.83 | 0.88 | 0.73
DUC2007-Q4 | 0.60 | 0.58 | 0.43
DUC2005-Q5 | -0.20 | 0.05 | 0.06
DUC2006-Q5 | 0.64 | 0.77 | 0.58
DUC2007-Q5 | 0.57 | 0.66 | 0.51

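
The summary-level and system-level tables aggregate the same scores differently. A minimal sketch, assuming the standard definitions: a system-level correlation first averages each system's scores over all inputs and then correlates those averages across systems, while a summary-level correlation correlates across systems separately for each input and then averages the per-input coefficients. The sketch assumes scores are stored as nested dicts `scores[system][input]`, that every system has a score for every input, and uses Pearson's r; the function names are hypothetical.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson's r between two equal-length lists (undefined for constant input)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    norm = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return cov / norm


def system_level(metric, human, corr=pearson):
    """Average each system's scores over inputs, then correlate across systems."""
    systems = sorted(metric)
    x = [mean(list(metric[s].values())) for s in systems]
    y = [mean(list(human[s].values())) for s in systems]
    return corr(x, y)


def summary_level(metric, human, corr=pearson):
    """Correlate across systems separately for each input, then average."""
    systems = sorted(metric)
    inputs = sorted(metric[systems[0]])
    per_input = [
        corr([metric[s][i] for s in systems], [human[s][i] for s in systems])
        for i in inputs
    ]
    return mean(per_input)


if __name__ == "__main__":
    # Toy scores indexed as scores[system][input]; not real DUC data.
    metric = {"A": {"d1": 1, "d2": 1}, "B": {"d1": 2, "d2": 2}, "C": {"d1": 3, "d2": 3}}
    human = {"A": {"d1": 3, "d2": 2}, "B": {"d1": 2, "d2": 6}, "C": {"d1": 1, "d2": 10}}
    print(f"summary-level r = {summary_level(metric, human):.2f}")  # summary-level r = 0.00
    print(f"system-level r = {system_level(metric, human):.2f}")    # system-level r = 1.00
```

In the toy data the metric reverses the human ranking on `d1` and matches it on `d2`, so the per-input correlations (-1 and 1) average to 0 at the summary level, while the per-system averages line up perfectly at the system level. This is one way system-level correlations can be much higher than summary-level ones.
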
[1] Stratos Xenouleas, Prodromos Malakasiotis, Marianna Apidianaki and Ion Androutsopoulos. Sum-QE: a BERT-based Summary Quality Estimation Model. EMNLP 2019.