Not being able to replicate coherence scores from paper #13
Comments
Thanks for using Palmetto and pointing out this important difference.
Great, I will be patiently waiting! :)
I can confirm that I am encountering the same problem; I couldn't replicate the results from Table 8 either. I also tried to reproduce the correlation values for the NYT topics, using the C_V coherence and Wikipedia as reference corpus, but I got 0.781 instead of 0.803. So far, I haven't found a reason why the results are different. Sorry that I couldn't shed more light on this problem yet. On which OS are you running the program?
I'm using Windows 7 (64-bit). I got a much lower coherence when trying to reproduce the correlation values for the NYT topics, but I probably did something wrong there myself.
Feel free to reuse/check my correlation implementations. You can find them at
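The Pearson correlation discussed above (e.g. the 0.781 vs. 0.803 comparison) is straightforward to compute from scratch. A minimal sketch follows; the score lists are made-up illustrative numbers, not values from the paper or this thread.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical numbers, NOT from the paper: per-topic coherence scores
# and the corresponding averaged human ratings.
coherences = [0.52, 0.75, 0.73, 0.31]
ratings = [2.1, 2.8, 2.7, 1.4]
print(pearson(coherences, ratings))
```

Comparing a hand-rolled implementation like this against a library version (e.g. `scipy.stats.pearsonr`) is one way to rule out the correlation step as the source of the discrepancy.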
Is there any more information available on what is causing the difference between the values in the paper and the ones calculated using the provided source? I get similar, but not exactly the same, values as the ones reported in the original issue. This is using the Wikipedia index provided at http://139.18.2.164/mroeder/palmetto/ and compiling the library locally as a jar (I get the same values using the provided jar as well). I am on a Linux system with Fedora 27. The values I get are:
I also checked out an older version (aa8b650) and compiled it; that version returned exactly the same values as the current one. Is there a known commit of the library that returns the same values as the paper? That would greatly help in troubleshooting what is causing these differences. Or could the difference depend on the Wikipedia_db version? But the version provided is dated May 2014, so it should not have changed, right?
Sorry, I still couldn't figure out where the difference comes from. The implementation itself does not seem to cause the problem. I also made sure that for the examples posted above there is no influence from a lemmatizer in the preprocessing. So there are two possible sources left:
In general, it seems like C_V also has a drawback, described in #12: it does not behave well when used on randomly generated word sets. So, finally, I would suggest using C_P, NPMI, or UCI for evaluating topics.
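For readers unfamiliar with the suggested alternatives: NPMI-based coherence averages the normalized pointwise mutual information over all word pairs of a topic, with probabilities estimated from document (co-)occurrence counts. A minimal sketch, not Palmetto's actual implementation (Palmetto uses a sliding-window estimate over a Wikipedia index, among other details):

```python
import math
from itertools import combinations

def npmi_coherence(topic_words, documents, eps=1e-12):
    """Average NPMI over all word pairs of a topic.

    Probabilities are estimated as the fraction of documents containing
    the word(s) -- a simplification of Palmetto's windowed estimation.
    """
    docs = [set(d) for d in documents]
    n = len(docs)

    def p(*words):
        return sum(all(w in d for w in words) for d in docs) / n

    scores = []
    for wi, wj in combinations(topic_words, 2):
        p_ij = p(wi, wj)
        pmi = math.log((p_ij + eps) / (p(wi) * p(wj) + eps))
        scores.append(pmi / -math.log(p_ij + eps))  # normalize to [-1, 1]
    return sum(scores) / len(scores)
```

A pair that always co-occurs scores near 1, an independent pair near 0, which is what makes NPMI easier to interpret than raw PMI.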
So is the stance that we should not use C_V to evaluate LDA topic models, or only if the corpus size is small?
The main issue with respect to our implementation is fixed with #81. It turned out that a parameter was implemented incorrectly and, hence, C_V showed strange behavior. After the fix, tests showed that C_V works as it should, i.e., although the exact values described in this issue are still not reproduced, the Pearson correlation values of C_V fit the values reported in the paper. Does that answer your question?
Dear all,
For my research I want to evaluate a new semantic coherence measure with the ones available in Palmetto, especially C_V and C_A. I'm trying to replicate some results described in your paper ("Exploring the Space of Topic Coherence Measures"), by using the topics and human ratings that you have published. However, I'm not able to replicate the same Pearson correlation scores as in Table 2. Having a closer look, I found that I'm also not able to replicate the coherence scores as shown in Table 8 of the paper. In this table the following coherence scores, using the measure C_V, are displayed:
0.94 company sell corporation own acquire purchase buy business sale owner
0.91 age population household female family census live average median income
0.86 jewish israel jew israeli jerusalem rabbi hebrew palestinian palestine holocaust
Running Palmetto as a jar, using the wikipedia_db downloaded from the link on GitHub and the above topics in a .txt file, I get the following scores:
0.52072 company sell corporation own acquire purchase buy business sale owner
0.75174 age population household female family census live average median income
0.73356 jewish israel jew israeli jerusalem rabbi hebrew palestinian palestine holocaust
Using the web service I also get different scores:
http://palmetto.aksw.org/palmetto-webapp/service/cv?words=company%20sell%20corporation%20own%20acquire%20purchase%20buy%20business%20sale%20owner
http://palmetto.aksw.org/palmetto-webapp/service/cv?words=age%20population%20household%20female%20family%20census%20live%20average%20median%20income
http://palmetto.aksw.org/palmetto-webapp/service/cv?words=jewish%20israel%20jew%20israeli%20jerusalem%20rabbi%20hebrew%20palestinian%20palestine%20holocaust
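For anyone reproducing these web-service checks, the request URLs above follow a simple pattern: measure name as the path segment, topic words joined by URL-encoded spaces. A small helper to build them (the base URL and `cv` endpoint are taken from the links above; actually fetching the score requires network access and is omitted here):

```python
from urllib.parse import quote

BASE = "http://palmetto.aksw.org/palmetto-webapp/service"

def palmetto_url(measure, words):
    """Build a Palmetto web-service request URL for the given coherence measure."""
    return f"{BASE}/{measure}?words={quote(' '.join(words))}"

topic = ["company", "sell", "corporation", "own", "acquire",
         "purchase", "buy", "business", "sale", "owner"]
print(palmetto_url("cv", topic))
```

Note that `urllib.parse.quote` encodes spaces as `%20`, matching the URLs in this thread, whereas `quote_plus` would produce `+`.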
Am I making a mistake somewhere? How can the scores here be different from the C_V scores displayed in the paper?