Sum-QE

Sum-QE [1] is a BERT-based model for estimating the linguistic quality of a summary. Our implementation wraps our fork of the original code, which adds a command-line interface that is easier to integrate with SacreROUGE. We additionally followed the steps in the repository to retrain their multi-task-5 models, which we have saved on AWS.

The name for this metric is sum-qe.

Setting Up

Sum-QE has many Python dependencies, so we recommend following the repository's instructions for creating its conda environment. The path to that environment's Python binary can then be passed to the SumQE class.
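
For illustration, a minimal sketch of that usage is below. The import path, the python_binary argument name, and the score call are assumptions based on the general shape of SacreROUGE's metric interface rather than the verified signature of the SumQE class, and the binary path is only a placeholder.

from sacrerouge.metrics import SumQE

# Placeholder path: point the (assumed) "python_binary" argument at the Python
# executable of the conda environment with the Sum-QE dependencies installed.
# The class will also need one of the models downloaded by the setup command below.
sum_qe = SumQE(python_binary='/path/to/envs/sumqe/bin/python')

# Sum-QE is reference-free, so scoring should only require the summary text.
# The exact scoring method name and its return format may differ in practice.
scores = sum_qe.score('The summary whose linguistic quality should be estimated.')
print(scores)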

The models can be set up with the following command:

sacrerouge setup-metric sum-qe \
    --download-2005-2006-model \
    --download-2005-2007-model \
    --download-2006-2007-model

Each of the --download arguments is optional.

To verify your installation, run:

pytest sacrerouge/tests/metrics/sumqe_test.py

This requires setting the environment variable SUMQE_PYTHON_BINARY to the Python binary with the Sum-QE dependencies installed.

Correlations

Here are the correlations of Sum-QE, as implemented in SacreROUGE, to the "overall responsiveness" human judgments on several datasets. In the tables below, r, p, and k denote the Pearson, Spearman, and Kendall correlation coefficients, respectively.

Summary-level, peers only:

| SumQE | DUC2005 r | DUC2005 p | DUC2005 k | DUC2006 r | DUC2006 p | DUC2006 k | DUC2007 r | DUC2007 p | DUC2007 k |
|-------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| Q1 | 0.46 | 0.42 | 0.33 | 0.60 | 0.55 | 0.44 | 0.65 | 0.58 | 0.47 |
| Q2 | 0.25 | 0.21 | 0.17 | 0.29 | 0.23 | 0.19 | 0.32 | 0.30 | 0.24 |
| Q3 | 0.40 | 0.40 | 0.31 | 0.46 | 0.44 | 0.35 | 0.49 | 0.46 | 0.37 |
| Q4 | 0.35 | 0.36 | 0.28 | 0.36 | 0.32 | 0.25 | 0.43 | 0.38 | 0.31 |
| Q5 | 0.27 | 0.26 | 0.21 | 0.43 | 0.38 | 0.30 | 0.49 | 0.45 | 0.36 |

Summary-level, peers + references:

| SumQE | DUC2005 r | DUC2005 p | DUC2005 k | DUC2006 r | DUC2006 p | DUC2006 k | DUC2007 r | DUC2007 p | DUC2007 k |
|-------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| Q1 | 0.45 | 0.40 | 0.32 | 0.64 | 0.60 | 0.49 | 0.67 | 0.62 | 0.50 |
| Q2 | 0.16 | 0.12 | 0.10 | 0.30 | 0.25 | 0.20 | 0.34 | 0.33 | 0.26 |
| Q3 | 0.30 | 0.30 | 0.23 | 0.43 | 0.42 | 0.33 | 0.44 | 0.40 | 0.31 |
| Q4 | 0.24 | 0.26 | 0.20 | 0.41 | 0.37 | 0.29 | 0.39 | 0.34 | 0.27 |
| Q5 | 0.03 | 0.06 | 0.05 | 0.41 | 0.39 | 0.31 | 0.44 | 0.41 | 0.32 |

System-level, peers only:

| SumQE | DUC2005 r | DUC2005 p | DUC2005 k | DUC2006 r | DUC2006 p | DUC2006 k | DUC2007 r | DUC2007 p | DUC2007 k |
|-------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| Q1 | 0.81 | 0.75 | 0.58 | 0.92 | 0.88 | 0.73 | 0.92 | 0.83 | 0.65 |
| Q2 | 0.41 | 0.29 | 0.22 | 0.69 | 0.62 | 0.46 | 0.71 | 0.76 | 0.56 |
| Q3 | 0.68 | 0.79 | 0.58 | 0.92 | 0.87 | 0.72 | 0.92 | 0.92 | 0.77 |
| Q4 | 0.71 | 0.75 | 0.56 | 0.89 | 0.89 | 0.73 | 0.90 | 0.84 | 0.66 |
| Q5 | 0.54 | 0.62 | 0.43 | 0.86 | 0.81 | 0.64 | 0.87 | 0.88 | 0.74 |

System-level, peers + references:

| SumQE | DUC2005 r | DUC2005 p | DUC2005 k | DUC2006 r | DUC2006 p | DUC2006 k | DUC2007 r | DUC2007 p | DUC2007 k |
|-------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|-----------|
| Q1 | 0.66 | 0.63 | 0.47 | 0.93 | 0.91 | 0.77 | 0.93 | 0.88 | 0.72 |
| Q2 | -0.30 | -0.31 | -0.20 | 0.73 | 0.72 | 0.52 | 0.67 | 0.76 | 0.57 |
| Q3 | 0.30 | 0.49 | 0.35 | 0.67 | 0.74 | 0.59 | 0.63 | 0.55 | 0.42 |
| Q4 | 0.24 | 0.44 | 0.33 | 0.83 | 0.88 | 0.73 | 0.60 | 0.58 | 0.43 |
| Q5 | -0.20 | 0.05 | 0.06 | 0.64 | 0.77 | 0.58 | 0.57 | 0.66 | 0.51 |

References

[1] Stratos Xenouleas, Prodromos Malakasiotis, Marianna Apidianaki and Ion Androutsopoulos. Sum-QE: a BERT-based Summary Quality Estimation Model. EMNLP 2019.