Reference order changes output scoring #7

Open
danieldeutsch opened this issue Aug 8, 2020 · 2 comments

@danieldeutsch

Hi,

I think there is a bug in which the order of the references changes the final score of the peer summary. Here is a concrete example of input documents:

Raw/peers/0: Dan bought scones today.
Raw/model/A: Dan went to buy scones at the store.
Raw/model/B: Dan bought scones today. He also went to the bakery.

The output from this is [{'0': 2}, {'0': 1}, {'0': 0}, {'0': 0.5}]. However, if I change Raw/model/A to be called Raw/model/C, the output is [{'0': 2}, {'0': 1.0}, {'0': 1.0}, {'0': 1.0}]. My expectation is that the order of the reference summaries shouldn't matter.
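To make the expectation concrete, something like the sketch below is what I had in mind; `score_fn` here is only a stand-in for however the PyrEval pipeline gets invoked, not an actual PyrEval function:

```python
import itertools

def scores_by_reference_order(references, peer, score_fn):
    """Score `peer` under every ordering of `references`.

    `score_fn` is a hypothetical stand-in for however PyrEval is run on a
    list of reference summaries plus a peer summary (not PyrEval's real API).
    If scoring were order-invariant, every returned score list would be equal.
    """
    return {order: score_fn(list(order), peer)
            for order in itertools.permutations(references)}
```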

Thanks!

@serenayj
Owner

serenayj commented Aug 8, 2020

Hi Daniel,

Thanks for raising this issue. The order of the reference summaries in the Raw/model folder does not affect the overall quality of the resulting pyramid (in terms of how it ranks the target summaries or systems), but it does affect what the pyramid looks like and what its summary content units are. PyrEval reads the reference summaries in a default order, then builds and traverses a content graph, so the allocation of propositions to summary content units can differ when the order changes, depending on which propositions are seen first. This is an emergent process by nature: when building a pyramid, human annotators read a reference, find its propositions, decide which summary content unit each one belongs to, and then move on to the next reference. Annotators who read and pick out propositions in a different order will produce a different pyramid and different scores, but the resulting ranking stays the same, as shown in many previous empirical studies. For PyrEval, this is reflected in the high correlation of its rankings with manual pyramids.
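To illustrate what I mean by the allocation being order-dependent, here is a toy sketch (only an illustration, not PyrEval's actual algorithm): a greedy first-match grouping of propositions into content units yields different units when the same references are read in a different order.

```python
import re

def jaccard(a, b):
    """Crude word-overlap similarity, just for the illustration."""
    wa = set(re.findall(r"\w+", a.lower()))
    wb = set(re.findall(r"\w+", b.lower()))
    return len(wa & wb) / len(wa | wb)

def build_content_units(references, threshold=0.3):
    """Greedy pass: each proposition joins the first unit whose seed it overlaps."""
    units = []  # each unit is a list of propositions; the first one is its seed
    for ref in references:
        for prop in ref:
            for unit in units:
                if jaccard(prop, unit[0]) >= threshold:
                    unit.append(prop)
                    break
            else:
                units.append([prop])
    return units

ref_1 = ["Dan bought scones."]
ref_2 = ["Dan bought scones at the store."]
ref_3 = ["Dan went to the store."]

# Two orderings of the same references produce differently shaped "pyramids".
print(build_content_units([ref_1, ref_2, ref_3]))  # two units, of sizes 2 and 1
print(build_content_units([ref_2, ref_3, ref_1]))  # one unit of size 3
```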

From what you describe, I think we could improve PyrEval by adding an option to shuffle the order of the reference summaries, build a pyramid for each ordering, and report an average score across those pyramids.
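A minimal sketch of that averaging step, with `build_pyramid` and `score_peer` as hypothetical stand-ins for PyrEval's pipeline rather than its actual API:

```python
import random
from statistics import mean

def averaged_score(references, peer, build_pyramid, score_peer,
                   n_shuffles=5, seed=0):
    """Average the peer's score over several shuffled reference orders."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_shuffles):
        order = references[:]           # copy so the caller's list is untouched
        rng.shuffle(order)
        pyramid = build_pyramid(order)  # hypothetical: pyramid from this ordering
        scores.append(score_peer(pyramid, peer))  # hypothetical scoring call
    return mean(scores)
```

For more than a handful of references, sampling a few random shuffles like this would be cheaper than enumerating every permutation.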

@rpassonneau
Collaborator

rpassonneau commented Aug 9, 2020 via email
