We present a large-scale dialogue corpus of pragmatic paraphrases, intended to advance technology for understanding the intentions that users express indirectly.
The corpus contains 71,498 indirect-direct utterance pairs, each accompanied by the multi-turn dialogue history extracted from the MultiWOZ dataset.
train.csv has five columns, not counting the index column:
| column name | description |
|---|---|
| dialogue_id | The ID of the corresponding dialogue in the MultiWOZ data |
| turn_index | The position of the utterance in the dialogue (0-based index) |
| target_utterance | The original utterance extracted from the MultiWOZ data |
| direct_utterance | The collected direct response |
| indirect_utterance | The collected indirect response |
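
As a rough illustration of the layout above, the following sketch reads rows with Python's standard `csv` module and keeps only the five documented columns. The helper name `read_pairs`, the unnamed index column in the header, and the sample row are assumptions made for this example; the sample values are placeholders, not corpus data.

```python
import csv
import io

# The five documented columns of train.csv (the index column is ignored).
COLUMNS = [
    "dialogue_id",
    "turn_index",
    "target_utterance",
    "direct_utterance",
    "indirect_utterance",
]


def read_pairs(csv_file):
    """Return a list of dicts, one per row, restricted to COLUMNS.

    Hypothetical helper for illustration; not part of the corpus release.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        record = {name: row[name] for name in COLUMNS}
        # turn_index is documented as a 0-based position, so store it as int.
        record["turn_index"] = int(record["turn_index"])
        rows.append(record)
    return rows


# Placeholder row with an unnamed index column (an assumption about the file).
SAMPLE = io.StringIO(
    ",dialogue_id,turn_index,target_utterance,direct_utterance,indirect_utterance\n"
    "0,EXAMPLE0001,0,placeholder target,placeholder direct,placeholder indirect\n"
)
pairs = read_pairs(SAMPLE)
print(pairs[0]["direct_utterance"], "|", pairs[0]["indirect_utterance"])
```

For the real file, the same function could be called as `read_pairs(open("train.csv", newline="", encoding="utf-8"))`, assuming the file is UTF-8 encoded.
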
In addition to the above columns, test.csv contains three columns with the quality evaluation results.
| column name | description |
|---|---|
| isacceptable_direct | Whether direct_utterance is judged to convey the same intention as target_utterance (TRUE / FALSE) |
| isacceptable_indirect | Whether indirect_utterance is judged to convey the same intention as target_utterance (TRUE / FALSE) |
| quality | Whether direct_utterance is more direct than indirect_utterance (Good / Neutral / Bad) |
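
One possible use of the evaluation columns is to keep only the test pairs that passed every check. The sketch below is an assumption about how a user might filter the data, not a procedure prescribed by the corpus; the helper names and the sample rows are hypothetical placeholders, and the label spellings (TRUE / FALSE, Good / Neutral / Bad) are taken from the table above.

```python
import csv
import io


def is_high_quality(row):
    """True if both paraphrases were accepted and quality is rated Good."""
    direct_ok = row["isacceptable_direct"].strip().upper() == "TRUE"
    indirect_ok = row["isacceptable_indirect"].strip().upper() == "TRUE"
    good = row["quality"].strip().lower() == "good"
    return direct_ok and indirect_ok and good


def filter_high_quality(csv_file):
    """Return the rows of a test.csv-style file that pass is_high_quality."""
    return [row for row in csv.DictReader(csv_file) if is_high_quality(row)]


# Placeholder rows, not corpus data; the unnamed index column is an assumption.
SAMPLE = io.StringIO(
    ",dialogue_id,turn_index,target_utterance,direct_utterance,"
    "indirect_utterance,isacceptable_direct,isacceptable_indirect,quality\n"
    "0,EX1,0,target a,direct a,indirect a,TRUE,TRUE,Good\n"
    "1,EX2,2,target b,direct b,indirect b,TRUE,FALSE,Good\n"
    "2,EX3,4,target c,direct c,indirect c,TRUE,TRUE,Bad\n"
)
kept = filter_high_quality(SAMPLE)
print(len(kept), "of 3 placeholder rows kept")
```

Whether to also keep Neutral pairs depends on the downstream task; the strict filter here is only one reasonable choice.
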