[Question] Anchoring multiple times #48

Closed
pat266 opened this issue Mar 31, 2021 · 5 comments

@pat266

pat266 commented Mar 31, 2021

In the examples from the README, there are 3 different anchoring strategies. I'm interested in 2 of them: Anchoring single sets of words to multiple topics, and Anchoring different sets of words to multiple topics. I'm wondering whether I should combine two (or more) of the strategies to get a better result. For example, using the examples from the README:

Anchor the specific list of words for every individual document

topic_model.fit(X, words=words, anchors=[['bernese', 'mountain', 'dog'], ['mountain', 'rocky', 'colorado']], anchor_strength=2)

Anchor general words throughout all of the documents

topic_model.fit(X, words=words, anchors=['protest', 'protest', 'protest', 'riot', 'riot', 'riot'], anchor_strength=2)

Will fitting the model with two different anchor word lists improve the results in general (or change anything at all), or will it decrease the quality of the results?

Also, does repeating words in the anchors list change how the model views those words (i.e., increase their strength)? In the second snippet, the words 'protest' and 'riot' are each repeated three times.

@ryanjgallagher
Collaborator

Whether anchor words improve the quality of your results depends on your data and on what kind of results you are looking for. If you have two sets of anchor words that make sense to anchor, because you want topics around those words or know there is good reason for such topics to exist, then I would suggest trying it and comparing the results to those without the anchor words. If you're interested in the topics themselves, then you'll want to investigate which topics appear and don't appear with and without the anchor words. If you're using the topics as input features to some other model, then you'll want to see how the anchoring affects that model's quantitative output.
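One rough way to do the "with vs. without anchors" comparison is to match each topic from the anchored run to its most similar topic from an unanchored run by top-word overlap. The sketch below assumes you have already pulled each topic's top words out of both fitted models as plain lists of strings; the helper names `jaccard` and `match_topics` are hypothetical and are not part of the CorEx package.

```python
# Hypothetical helpers (not part of CorEx): compare the topics from an
# anchored run against the topics from an unanchored run by top-word overlap.
from typing import List, Tuple


def jaccard(a: List[str], b: List[str]) -> float:
    """Jaccard similarity between two word lists, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def match_topics(
    anchored: List[List[str]], baseline: List[List[str]]
) -> List[Tuple[int, int, float]]:
    """For each anchored topic, return (anchored_idx, best_baseline_idx, score).

    Ties go to the lowest baseline index, because max() keeps the first maximum.
    """
    matches = []
    for i, words in enumerate(anchored):
        scores = [jaccard(words, other) for other in baseline]
        best = max(range(len(scores)), key=scores.__getitem__)
        matches.append((i, best, scores[best]))
    return matches


# Toy top words, made up for illustration only.
anchored = [["bernese", "mountain", "dog", "puppy"], ["protest", "march", "police", "crowd"]]
baseline = [["mountain", "hiking", "trail", "snow"], ["dog", "puppy", "bernese", "vet"]]
print(match_topics(anchored, baseline))
```

A score near 0 for an anchored topic suggests it only shows up because of the anchors; a high score suggests the unanchored model already found something similar.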

How much the results change depends on how high you set the anchor_strength. The anchor strength is how much weight to assign to the anchor words relative to all the other words. For example, anchor_strength=2 means giving the anchor words twice the weight of the other words.
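As a purely conceptual toy (this is not how CorEx computes its internal objective), the "relative weight" reading of anchor_strength can be pictured as every ordinary word getting weight 1 and every anchor word getting weight anchor_strength. The function name `relative_weights` is hypothetical.

```python
# Conceptual toy only: illustrates "anchor words get anchor_strength times the
# weight of other words". It does NOT reproduce CorEx's actual computation.
from typing import Dict, Iterable, List


def relative_weights(
    vocab: List[str], anchor_words: Iterable[str], anchor_strength: float
) -> Dict[str, float]:
    """Weight 1.0 for ordinary words, anchor_strength for anchor words."""
    anchors = set(anchor_words)
    return {w: (float(anchor_strength) if w in anchors else 1.0) for w in vocab}


weights = relative_weights(["bernese", "dog", "park", "leash"], ["bernese", "dog"], 2)
print(weights)  # anchor words carry twice the weight of the rest
```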

Yes, repeating the words in that example makes a difference. In that example, 'protest' is anchored to topics 1, 2, and 3, while 'riot' is anchored to topics 4, 5, and 6, so the model will find a different topic for each of those anchors. If you wanted to anchor multiple sets of words multiple times, then you'd do something like:

anchors=[['mountain', 'dog'], ['mountain', 'dog'], ['rocky', 'mountain'], ['rocky', 'mountain']]
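The repetition behaviour described above follows from a position-based convention: the j-th entry of `anchors` anchors topic j, and a bare string is a single-word anchor. The pure-Python sketch below only spells out that mapping for the two example lists; it assumes that convention and uses 0-indexed topics (so the answer's "topics 1, 2, and 3" become 0, 1, 2). The helpers `anchors_by_topic` and `topics_for_word` are hypothetical, not CorEx functions.

```python
# Hypothetical helpers: show which topic each entry of an `anchors` list
# targets, assuming anchors[j] anchors topic j (0-indexed).
from typing import Dict, List, Sequence, Union

Anchor = Union[str, Sequence[str]]


def anchors_by_topic(anchors: Sequence[Anchor]) -> Dict[int, List[str]]:
    """Map each topic index to the anchor words applied to it."""
    mapping: Dict[int, List[str]] = {}
    for topic, entry in enumerate(anchors):
        # A bare string is one single-word anchor; a list is a set of words.
        mapping[topic] = [entry] if isinstance(entry, str) else list(entry)
    return mapping


def topics_for_word(mapping: Dict[int, List[str]], word: str) -> List[int]:
    """All topics that a given word is anchored to."""
    return sorted(t for t, words in mapping.items() if word in words)


single = anchors_by_topic(['protest', 'protest', 'protest', 'riot', 'riot', 'riot'])
multi = anchors_by_topic([['mountain', 'dog'], ['mountain', 'dog'],
                          ['rocky', 'mountain'], ['rocky', 'mountain']])
print(topics_for_word(single, 'riot'))     # topics 3, 4, 5
print(topics_for_word(multi, 'mountain'))  # anchored to every topic
```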

@pat266
Author

pat266 commented Apr 2, 2021

@ryanjgallagher If I have 4 topics but only 3 anchor word lists, can I leave one of the anchor word lists empty, or will that mess up the algorithm?

For example:

anchors=[['mountain', 'dog'], [''], ['rocky', 'mountain'], ['rocky', 'mountain']]

@ryanjgallagher
Collaborator

You should only include lists for the topics you want to anchor, and leave out anything for the topics you don't anchor. So it would be like:

anchors=[['mountain', 'dog'], ['rocky', 'mountain'], ['rocky', 'mountain']]
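Under the same position-based convention (anchors[j] anchors topic j), three anchor lists with 4 topics would anchor topics 0 to 2 and leave the last topic free for the model to learn unsupervised. The sketch below is a hypothetical pre-fit sanity check, `unanchored_topics`, that rejects placeholder entries like `['']` rather than relying on however the library happens to treat them; it is not part of CorEx.

```python
# Hypothetical pre-fit check (not part of CorEx). Assumes anchors[j] anchors
# topic j, so topics beyond len(anchors) are left unanchored.
from typing import List, Sequence


def unanchored_topics(anchors: Sequence[Sequence[str]], n_topics: int) -> List[int]:
    """Validate anchor lists and return the indices of the free topics."""
    if len(anchors) > n_topics:
        raise ValueError(f"{len(anchors)} anchor lists but only {n_topics} topics")
    for j, words in enumerate(anchors):
        # Reject empty placeholders such as [] or [''] instead of passing them on.
        if not words or any(not w.strip() for w in words):
            raise ValueError(f"anchor list {j} is empty or has a blank word: {words!r}")
    return list(range(len(anchors), n_topics))


ok = [['mountain', 'dog'], ['rocky', 'mountain'], ['rocky', 'mountain']]
print(unanchored_topics(ok, 4))  # only the last topic is left free

try:
    unanchored_topics([['mountain', 'dog'], [''], ['rocky', 'mountain'], ['rocky', 'mountain']], 4)
except ValueError as err:
    print("rejected:", err)
```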

@pat266
Author

pat266 commented Apr 2, 2021

@ryanjgallagher I guess a better way to ask my first question is: is there a way to assign different anchor_strength values to different sets of words for the same topic? For example, if I have 2 topics total, can I do something like this?

topic_model.fit(X, words=words, anchors=[['bernese', 'mountain', 'dog'], ['mountain', 'rocky', 'colorado']], anchor_strength=4)
topic_model.fit(X, words=words, anchors=[['protest'], ['riot']], anchor_strength=2)

Would this work?

@ryanjgallagher
Collaborator

Currently, no, you can't set different anchor strengths for different words. Pull request #40 proposes adding that feature, but unfortunately we haven't had the capacity to verify it yet.
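Until per-word strengths are supported, one possible workaround (a compromise, not a replacement for that feature) is to merge both anchor sets topic by topic and call fit once, accepting that every anchor word then shares a single anchor_strength. This assumes, as with scikit-learn-style estimators, that a second fit call would simply start over rather than stack on the first. The helper `merge_anchor_sets` is hypothetical.

```python
# Hypothetical helper (not part of CorEx): merge several per-topic anchor
# lists into one list so they can be passed in a single fit call.
from typing import List, Sequence


def merge_anchor_sets(*anchor_sets: Sequence[Sequence[str]]) -> List[List[str]]:
    """anchor_sets[k][j] is the k-th set of anchor words for topic j.

    Words are de-duplicated per topic, keeping first-seen order.
    """
    n_topics = max((len(s) for s in anchor_sets), default=0)
    merged: List[List[str]] = [[] for _ in range(n_topics)]
    for anchor_set in anchor_sets:
        for j, words in enumerate(anchor_set):
            for w in words:
                if w not in merged[j]:
                    merged[j].append(w)
    return merged


merged = merge_anchor_sets(
    [['bernese', 'mountain', 'dog'], ['mountain', 'rocky', 'colorado']],
    [['protest'], ['riot']],
)
print(merged)
```

The merged list would then go into one call such as topic_model.fit(X, words=words, anchors=merged, anchor_strength=...), with one strength chosen for all anchor words.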
