@lihebi lihebi commented Jan 17, 2023

This is the up-to-date code & results for PairFact, including the factcc, qags, and frank datasets. It is an all-in-one repo ready for ACL code review. Changes in this PR:

  1. add the factcc, qags, and frank datasets (from PR "(WIP) add factcc, frank, qags datasets" #6)
  2. use pooled-level instead of summary-level correlation for factcc & qags, because there is only one system per docID, so summary-level correlation cannot be computed.
  3. run on both cnndm & xsum
  4. move the bertscore & mnli metrics into DocAsRef_0
  5. add the generated results to this repo
  6. skip experiments whose corresponding result files already exist.
  7. add some descriptions to the README.

TODO: anonymize it.

The important files are factcc.py and eval.py (the entry point). The other files are mostly copied from the DocAsRef_0 repo.

Current results:

|               | frank-cnndm | frank-xsum | qags-cnndm | qags-xsum | factcc |
|---------------|-------------|------------|------------|-----------|--------|
| system-level  | done        | done       | done       | done      | done   |
| summary-level | done        | done       | -          | -         | -      |
| pooled-level  | ?           | ?          | done       | done      | done   |

Note: the pooled-level setting collects the results for all docIDs and computes correlations over the entire vectors. The reason is that qags and factcc have only one system per docID, so summary-level correlation cannot be computed (the scipy.stats correlation functions throw an error on vectors of size 1).
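The pooled- vs summary-level distinction can be sketched as below. This is an illustrative toy, not the repo's actual evaluation code: the group structure and scores are made up, and plain NumPy Pearson correlation stands in for whatever scipy.stats call eval.py uses.

```python
# Hypothetical sketch: summary-level vs pooled-level correlation.
# Each group is (metric_scores, human_scores) for one docID.
import numpy as np

def pearson(x, y):
    x, y = np.asarray(x, float), np.asarray(y, float)
    if len(x) < 2:
        # This is why summary-level fails on qags/factcc:
        # one system per docID means vectors of size 1.
        raise ValueError("correlation undefined for vectors of size < 2")
    return float(np.corrcoef(x, y)[0, 1])

def summary_level(groups):
    # One correlation per docID, then averaged over docIDs.
    return float(np.mean([pearson(m, h) for m, h in groups]))

def pooled_level(groups):
    # Concatenate all scores across docIDs, then one correlation
    # over the entire vectors.
    metric = [v for m, _ in groups for v in m]
    human = [v for _, h in groups for v in h]
    return pearson(metric, human)

# Three docIDs, one system each: summary-level is impossible,
# pooled-level still works.
groups = [([0.9], [1.0]), ([0.2], [0.0]), ([0.6], [1.0])]
print(pooled_level(groups))
```

Pooling trades per-document granularity for computability: it yields a single correlation even when individual docIDs have too few systems to correlate on their own.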

Additional missing results (currently running):

  • qags-xsum-pooled Summary-Level
  • frank-xsum Summary-Level

lihebi commented Jan 21, 2023

Update: added classification results in the results-classification folder, using the following thresholds: 0.3, 0.4, 0.5, 0.6, 0.7.
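The thresholding step can be sketched as follows. This is a hypothetical illustration of turning continuous metric scores into binary factuality labels at each threshold; the function name and example scores are made up.

```python
# Hypothetical sketch: binarize continuous metric scores at a threshold,
# as done for the results-classification experiments.
def classify(scores, threshold):
    """Label a summary factual (1) if its score meets the threshold, else 0."""
    return [1 if s >= threshold else 0 for s in scores]

scores = [0.25, 0.45, 0.65, 0.8]
for t in (0.3, 0.4, 0.5, 0.6, 0.7):
    print(t, classify(scores, t))
```

Sweeping several thresholds shows how sensitive the classification results are to the cutoff, since metrics are not calibrated to a single natural decision boundary.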
