Hi,
I tried running the Seq2Seq and HRED models on the DailyDialog dataset. Here are the results I got:
Seq2Seq results:
BLEU-1: 0.215
BLEU-2: 0.0986
BLEU-3: 0.057
BLEU-4: 0.0366
ROUGE: 0.0492
Distinct-1: 0.0268; Distinct-2: 0.131
Ref distinct-1: 0.0599; Ref distinct-2: 0.3644
BERTScore: 0.1414
HRED results:
BLEU-1: 0.2121
BLEU-2: 0.0961
BLEU-3: 0.0542
BLEU-4: 0.0331
ROUGE: 0.0502
Distinct-1: 0.0208; Distinct-2: 0.0992
Ref distinct-1: 0.0588; Ref distinct-2: 0.3619
BERTScore: 0.1436
These results seem to be much lower than the ones reported in the DailyDialog paper: https://www.aclweb.org/anthology/I17-1099.pdf
Do you have any clues as to why that is the case?
Thanks!
Hi, thanks for your interest in this repo. Compared with the results in the original DailyDialog paper, the BLEU-1/2 scores are indeed lower, but the BLEU-3/4 scores are much better. In my opinion, BLEU-3/4 is a more suitable measure than BLEU-1/2, as it indicates that the model generates more fluently, so I think the results are fine. If you are still confused about this, feel free to contact me.
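For reference, below is a minimal sketch of how BLEU-n and distinct-n are often computed for dialogue models. The function names (bleu_n, distinct_n), the whitespace tokenization, and the NLTK smoothing choice are illustrative assumptions rather than the evaluation code actually used in this repo; differences in tokenization, smoothing, and sentence- vs. corpus-level averaging are a common reason reproduced scores differ from a paper's.

```python
# Minimal sketch of BLEU-1..4 and distinct-n for dialogue evaluation.
# The smoothing method and tokenization here are assumptions and may
# differ from the repo's own evaluation script.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction


def bleu_n(hypotheses, references, max_n=4):
    """Average sentence-level BLEU-1..BLEU-max_n over the test set.

    hypotheses: list of token lists (model outputs)
    references: list of token lists (one gold response per example)
    """
    smooth = SmoothingFunction().method1  # assumption: smoothing choice may differ
    scores = {n: 0.0 for n in range(1, max_n + 1)}
    for hyp, ref in zip(hypotheses, references):
        for n in range(1, max_n + 1):
            weights = tuple(1.0 / n for _ in range(n))  # uniform weights -> cumulative BLEU-n
            scores[n] += sentence_bleu([ref], hyp, weights=weights,
                                       smoothing_function=smooth)
    return {n: s / len(hypotheses) for n, s in scores.items()}


def distinct_n(hypotheses, n):
    """Distinct-n: unique n-grams divided by total n-grams over all outputs."""
    ngrams = []
    for hyp in hypotheses:
        ngrams.extend(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    return len(set(ngrams)) / max(len(ngrams), 1)


# Example usage with simple whitespace tokenization (another choice that shifts scores):
hyps = [h.split() for h in ["i am fine thank you", "that sounds great"]]
refs = [r.split() for r in ["i am fine , thanks", "sounds good to me"]]
print(bleu_n(hyps, refs))
print(distinct_n(hyps, 1), distinct_n(hyps, 2))
```

Note that smoothing mostly affects BLEU-3/4 when higher-order n-gram matches are sparse, so it is worth confirming that the same BLEU implementation and settings are used before comparing these numbers directly with the DailyDialog paper.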