Reference order changes output scoring #7
Comments
Technically, order of input should not matter. Human annotators are not
instructed to go by the propositions they see first, they are instructed
to go by the groupings that make the most sense, and to iterate and
change their minds as they construct the pyramid. Humans are potentially
biased by the order in which they read the summaries, but the annotation
goal is for annotators to overcome this bias.
Possible order bias is a separate issue from individual differences
between annotators, and from the fact that different human-constructed
pyramids tend to produce the same groupings of words (as shown by
interannotator agreement) and the same correlations of scores.
Yanjun, is there a way to make order not matter in PyrEval?
Becky
On 8/8/20 5:00 PM, Serena Gao wrote:
Hi Daniel,
Thanks for raising this issue. The order of reference summaries in the
Raw/model folder does not affect the overall quality of the resulting
pyramid (in terms of ranking the target summaries or systems), but it
will affect what the pyramid looks like and its summary content units.
PyrEval reads in reference summaries in a default ordering, then builds
and traverses a content graph, so there are some differences in
allocating propositions to summary content units if the order
changes, depending on which propositions it sees first. It is an
emergent process by nature: when building the pyramid, human
annotators read and find the propositions in a given reference,
decide which summary content unit each belongs to, and move on to the
next reference. Annotators might read and pick out propositions in a
different order, so the resulting pyramid and scores will look
different, but the ranking will be the same, as shown in many previous
empirical studies. For PyrEval, this is shown by the high correlation
of rankings with manual pyramids.
From what you mention, I think we could improve PyrEval by adding an
option to shuffle the order of reference summaries, build different
pyramids, and provide an average score from these pyramids.
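The shuffle-and-average idea could be sketched roughly as follows. This is only an illustration, not PyrEval code: `score_fn` is a hypothetical callable standing in for "build a pyramid from the references in this order and score every peer", and `max_orderings` and `seed` are assumed parameters.

```python
import itertools
import math
import random
from statistics import mean
from typing import Callable, Dict, Sequence


def average_over_orderings(
    references: Sequence[str],
    score_fn: Callable[[Sequence[str]], Dict[str, float]],
    max_orderings: int = 10,
    seed: int = 0,
) -> Dict[str, float]:
    """Average per-peer scores over several orderings of the references.

    score_fn is a hypothetical stand-in for the pyramid build + scoring step;
    it is NOT a PyrEval function.
    """
    refs = list(references)
    if math.factorial(len(refs)) <= max_orderings:
        # Few references: try every ordering exactly once.
        orderings = list(itertools.permutations(refs))
    else:
        # Many references: draw random shuffles (repeats are possible).
        rng = random.Random(seed)
        orderings = [tuple(rng.sample(refs, len(refs))) for _ in range(max_orderings)]

    runs = [score_fn(order) for order in orderings]
    peers = runs[0].keys()
    return {peer: mean(run[peer] for run in runs) for peer in peers}
```

When every ordering is enumerated, the averaged score no longer depends on the order in which the references were supplied. With random sampling it only approximates that, so a fixed seed keeps runs repeatable.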
Hi,
I think there is a bug in which the order of the references changes the final score of the peer summary. Here is a concrete example of input documents:
Raw/peers/0: Dan bought scones today.
Raw/model/A: Dan went to buy scones at the store.
Raw/model/B: Dan bought scones today. He also went to the bakery.
The output from this is [{'0': 2}, {'0': 1}, {'0': 0}, {'0': 0.5}]. However, if I change Raw/model/A to be called Raw/model/C, the output is [{'0': 2}, {'0': 1.0}, {'0': 1.0}, {'0': 1.0}]. My expectation is that the order of the reference summaries shouldn't matter.
Thanks!
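To make the two setups easy to recreate, here is a small helper that writes the example files to disk. It is a sketch only: `write_inputs` and its `root` argument are names invented for illustration, and it assumes the renamed file changes the order in which the references are read. It does not invoke PyrEval itself.

```python
from pathlib import Path

# Texts copied from the example above.
PEER = "Dan bought scones today."
MODEL_FIRST = "Dan went to buy scones at the store."
MODEL_B = "Dan bought scones today. He also went to the bakery."


def write_inputs(root: Path, first_model_name: str) -> None:
    """Create Raw/peers and Raw/model under root with the example texts.

    first_model_name is "A" for the first run and "C" for the second, so the
    same reference sorts before or after Raw/model/B.
    """
    peers = root / "Raw" / "peers"
    model = root / "Raw" / "model"
    peers.mkdir(parents=True, exist_ok=True)
    model.mkdir(parents=True, exist_ok=True)
    (peers / "0").write_text(PEER + "\n", encoding="utf-8")
    (model / first_model_name).write_text(MODEL_FIRST + "\n", encoding="utf-8")
    (model / "B").write_text(MODEL_B + "\n", encoding="utf-8")
```

Calling `write_inputs(Path("run_A"), "A")` and `write_inputs(Path("run_C"), "C")` gives two directories that differ only in the name of one reference file. Running PyrEval on each should then show the two outputs quoted above.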