BERT score: maximum at self-comparison, symmetry, invariance to additional items #2728
Labels: bug / fix, good first issue, help wanted, v1.4.x
🐛 Bug
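I would expect BERTScore to have the following properties: the score is maximal when a sentence is compared with itself; the score is symmetric in pred and target; and, with idf=False, extending the list of pred and the list of target does not affect the scores of the earlier inputs. There are counterexamples for all of the properties above.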
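The three properties can be written as generic checks over any scoring function. The following is only a minimal sketch, not the test suite proposed in this issue: the helper names (`check_self_maximum`, `check_symmetry`, `check_invariance`) are hypothetical, and `toy_f1` is a token-overlap stand-in used so the example runs without downloading a model. Symmetry is assumed here to be checked on the F1 score, since swapping pred and target swaps precision and recall.

```python
from typing import Callable, List, Sequence

# Any function mapping paired sentences to one score per pair.
ScoreFn = Callable[[Sequence[str], Sequence[str]], List[float]]


def toy_f1(preds: Sequence[str], targets: Sequence[str]) -> List[float]:
    """Stand-in metric (NOT BERTScore): F1 over the sets of whitespace tokens."""
    scores = []
    for pred, target in zip(preds, targets):
        pred_tokens, target_tokens = set(pred.split()), set(target.split())
        overlap = len(pred_tokens & target_tokens)
        if overlap == 0:
            scores.append(0.0)
            continue
        precision = overlap / len(pred_tokens)
        recall = overlap / len(target_tokens)
        scores.append(2 * precision * recall / (precision + recall))
    return scores


def check_self_maximum(score_fn: ScoreFn, preds, targets, tol: float = 1e-6) -> bool:
    """score(x, x) should not be lower than score(x, y)."""
    self_scores = score_fn(preds, preds)
    cross_scores = score_fn(preds, targets)
    return all(s + tol >= c for s, c in zip(self_scores, cross_scores))


def check_symmetry(score_fn: ScoreFn, preds, targets, tol: float = 1e-6) -> bool:
    """score(x, y) should equal score(y, x)."""
    forward = score_fn(preds, targets)
    backward = score_fn(targets, preds)
    return all(abs(f - b) <= tol for f, b in zip(forward, backward))


def check_invariance(score_fn: ScoreFn, preds, targets,
                     extra_preds, extra_targets, tol: float = 1e-6) -> bool:
    """Appending more pairs should leave the scores of the earlier pairs unchanged."""
    base = score_fn(preds, targets)
    extended = score_fn(list(preds) + list(extra_preds),
                        list(targets) + list(extra_targets))
    return all(abs(b - e) <= tol for b, e in zip(base, extended[:len(base)]))


preds = ["hello there", "general kenobi"]
targets = ["hello there", "master kenobi"]

print(check_self_maximum(toy_f1, preds, targets))
print(check_symmetry(toy_f1, preds, targets))
print(check_invariance(toy_f1, preds, targets, ["extra"], ["more words"]))
```

To run the same checks against the real metric, `score_fn` could wrap the library's BERTScore call with idf=False and return its per-sentence F1 values. That wrapper is left out here because it requires a model download.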
To Reproduce
To reproduce the behavior, run the test suite with the following tests added to test_bertscore.py.
Proposed test suite
Test results
Expected behavior
All tests above should pass.
Environment
Additional context
Maybe this is somehow related to tokenisation or the encoding, but I have not confirmed that. Against this hypothesis is the fact that this still happens for batch_size=1. It seems related to PR #2347. Perhaps the sorting is still done incorrectly?
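To make the sorting hypothesis concrete, here is a generic illustration of how length-sorting for batching can go wrong. It is not taken from the library's code, which I have not traced; `sort_by_length` and `restore_order` are hypothetical helpers. The point is that if inputs are sorted by length, per-item results must be mapped back through the same permutation, and pred and target must not be permuted independently.

```python
from typing import List, Sequence, Tuple


def sort_by_length(texts: Sequence[str]) -> Tuple[List[str], List[int]]:
    """Sort texts by token count; also return the original index of each sorted item."""
    order = sorted(range(len(texts)), key=lambda i: len(texts[i].split()))
    return [texts[i] for i in order], order


def restore_order(values: Sequence, order: Sequence[int]) -> list:
    """Undo sort_by_length: put the value at sorted position k back at index order[k]."""
    restored = [None] * len(values)
    for sorted_pos, original_idx in enumerate(order):
        restored[original_idx] = values[sorted_pos]
    return restored


preds = ["a b c", "d"]
targets = ["e", "f g h"]

sorted_preds, pred_order = sort_by_length(preds)
sorted_targets, target_order = sort_by_length(targets)

# Pitfall: the two lists were permuted differently, so zipping them
# pairs each pred with the wrong target.
misaligned_pairs = list(zip(sorted_preds, sorted_targets))

# Restoring each list with its own permutation recovers the original pairing.
aligned_pairs = list(zip(restore_order(sorted_preds, pred_order),
                         restore_order(sorted_targets, target_order)))

print(misaligned_pairs)
print(aligned_pairs)
```

A bug of this kind would break invariance to additional items, because adding sentences of different lengths changes the permutation. It would not by itself explain failures with a single pair.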
I have also checked that some of those tests fail on the original implementation of BERTScore. I have considered whether those properties are simply not expected to hold, but I have found nothing in either the paper or the documentation suggesting that, when idf=False and there is no baseline correction.
I am happy to submit a PR with the above tests, which currently all fail.