@lihebi lihebi commented Jan 14, 2023

For FRANK, each document has summaries from multiple systems, so we report the usual correlation results at two levels:

  • system-level
  • summary-level
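As a rough illustration of the two levels, here is a minimal sketch. It assumes the common convention that system-level correlates per-system mean scores, while summary-level correlates scores within each document and then averages over documents. The row fields (`doc_id`, `system`, `metric`, `human`), the toy values, and the choice of Pearson correlation are assumptions made for this example; they are not taken from g2/env.py.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if undefined)."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def system_level(rows):
    """Average both scores per system, then correlate the per-system means."""
    by_system = defaultdict(list)
    for r in rows:
        by_system[r["system"]].append(r)
    metric_means = [sum(r["metric"] for r in g) / len(g) for g in by_system.values()]
    human_means = [sum(r["human"] for r in g) / len(g) for g in by_system.values()]
    return pearson(metric_means, human_means)


def summary_level(rows):
    """Correlate within each document across systems, then average over documents."""
    by_doc = defaultdict(list)
    for r in rows:
        by_doc[r["doc_id"]].append(r)
    corrs = []
    for g in by_doc.values():
        c = pearson([r["metric"] for r in g], [r["human"] for r in g])
        if c == c:  # skip documents where the correlation is undefined (nan)
            corrs.append(c)
    return sum(corrs) / len(corrs) if corrs else float("nan")


# Made-up toy data: two documents, three hypothetical systems each.
rows = [
    {"doc_id": "d1", "system": "A", "metric": 0.1, "human": 1},
    {"doc_id": "d1", "system": "B", "metric": 0.5, "human": 2},
    {"doc_id": "d1", "system": "C", "metric": 0.9, "human": 3},
    {"doc_id": "d2", "system": "A", "metric": 0.2, "human": 1},
    {"doc_id": "d2", "system": "B", "metric": 0.4, "human": 3},
    {"doc_id": "d2", "system": "C", "metric": 0.3, "human": 2},
]

sys_corr = system_level(rows)
sum_corr = summary_level(rows)
```

Summary-level grouping needs a document ID, which is why it only applies when several systems summarize the same document.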

For the qags-cnndm and factCC datasets, the input data has no docID; each row is only (doc, sum, human_score). We therefore report:

  • system-level
  • pool-level: all rows are assigned the same ID, so they form one large batch over which the correlation is computed. The output file is still named summary-level. True summary-level results are not available because there is only one system.
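The pool-level idea can be sketched as follows: giving every row the same ID makes a grouped correlation routine see a single group, so its result equals the plain correlation over all rows. The helper names, the `pool_id` field, the extra `metric_score` column, and the toy values are hypothetical, and Pearson correlation is an assumed choice; none of this is taken from the PR's code.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if undefined)."""
    n = len(xs)
    if n < 2:
        return float("nan")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / sqrt(vx * vy)


def grouped_correlation(rows, id_key):
    """Correlate metric vs. human within each ID group, then average the groups."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[id_key]].append(r)
    corrs = [
        pearson([r["metric"] for r in g], [r["human"] for r in g])
        for g in groups.values()
    ]
    corrs = [c for c in corrs if c == c]  # drop undefined (nan) groups
    return sum(corrs) / len(corrs) if corrs else float("nan")


def pool_level(samples):
    """Assign one shared ID to every row so all rows fall into a single batch."""
    rows = [
        {"pool_id": 0, "metric": metric_score, "human": human_score}
        for (_doc, _summary, human_score, metric_score) in samples
    ]
    return grouped_correlation(rows, "pool_id")


# Made-up toy rows: (doc, sum, human_score) plus a hypothetical metric score.
samples = [
    ("doc text 1", "summary 1", 1, 0.9),
    ("doc text 2", "summary 2", 0, 0.2),
    ("doc text 3", "summary 3", 1, 0.7),
    ("doc text 4", "summary 4", 0, 0.4),
]

pooled = pool_level(samples)
flat = pearson([s[3] for s in samples], [s[2] for s in samples])
```

Reusing the grouped routine with a constant ID is presumably also why the output file keeps the summary-level name even though the number is a pooled one.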

The data is obtained with g2/env.py, so it includes both the BERTScore and MNLI models as metrics.

The sample results use only the first 10 rows of the dataframe (df[:10]), for demo purposes. The full experiments are running.

@lihebi lihebi mentioned this pull request Jan 14, 2023
5 tasks
@lihebi lihebi changed the title from "FactCC & Frank & QAGS results on pairwise MNLI model metrics (sample results using df[:10])" to "(demo) FactCC & Frank & QAGS results on pairwise MNLI model metrics (sample results using df[:10])" Jan 14, 2023