[Question] Anchoring multiple times #48
Comments
Whether anchor words improve the quality of your results depends on your data and on what kind of results you are looking for. If you have two sets of anchor words that it makes sense to anchor, because you want topics around those words or know there is a good reason such topics should exist, then I would suggest trying it and comparing the results to those without the anchor words. If you're interested in the topics themselves, then you'll want to investigate which topics appear and don't appear with and without the anchor words. If you're using the topics as input features to some other model, then you'll want to see how that affects the quantitative output. How much the results will change depends on how high you set the

Yes, repeating the words in that example makes a difference. In that example
@ryanjgallagher If I have 4 topics and I only have 3 anchor words lists, can I leave the 4th anchor words list empty, or will it mess up the algorithm? For example:
You should just put lists for the topics you want to anchor. You shouldn't put anything for topics you don't anchor. So it would be like
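A minimal sketch of that layout (not the maintainer's original example), assuming, as the README examples suggest, that the i-th entry of `anchors` is applied to the i-th topic, so any topic past the end of the list is simply left unanchored. The helper `describe_anchors` and the example words are hypothetical and only illustrate the shape of the argument; the commented-out `fit` call mirrors the README usage.

```python
def describe_anchors(anchors, n_hidden):
    """Map each topic index to its anchor words; unanchored topics get an empty list."""
    if len(anchors) > n_hidden:
        raise ValueError("more anchor entries than topics")
    layout = {}
    for topic in range(n_hidden):
        if topic < len(anchors):
            entry = anchors[topic]
            # A bare string anchors a single word; a list anchors several words.
            layout[topic] = [entry] if isinstance(entry, str) else list(entry)
        else:
            # No entry is supplied for this topic, so it stays unanchored.
            layout[topic] = []
    return layout


# Three anchor lists for a four-topic model: no placeholder for the 4th topic.
anchors = [['dog', 'cat'], ['apple'], ['mountain', 'rocky']]
layout = describe_anchors(anchors, n_hidden=4)
print(layout)

# Assumed usage with the library, following the README pattern:
# topic_model = ct.Corex(n_hidden=4)
# topic_model.fit(X, words=words, anchors=anchors, anchor_strength=2)
```

Passing only three entries, rather than a fourth empty list, keeps the argument in the form the answer above recommends.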
@ryanjgallagher I guess a better way to ask the first question is: is there a way to assign a different anchor_strength to different sets of words for the same topic? For example, if I have 2 topics total, can I do something like this?
Would this work?
No, you currently can't set different anchor weights for different words. Pull request #40 proposes to add that feature, but unfortunately we haven't had the capacity to verify it yet.
In the README, there are 3 different anchoring strategies. I'm interested in 2 of them: anchoring single sets of words to multiple topics and anchoring different sets of words to multiple topics. I'm wondering whether I should combine two (or more) of the strategies to get a better result. For example, using the examples from the README:
Anchor the specific list of words for every individual document
topic_model.fit(X, words=words, anchors=[['bernese', 'mountain', 'dog'], ['mountain', 'rocky', 'colorado']], anchor_strength=2)
Anchor general words throughout all of the documents
topic_model.fit(X, words=words, anchors=['protest', 'protest', 'protest', 'riot', 'riot', 'riot'], anchor_strength=2)
Will fitting the model with two different anchor words lists improve the result in general (or change anything at all), or will it decrease the quality of the result?
Also, does repeating words in the anchors list change how the model views those words (that is, increase their strength)? In the second call, the words 'protest' and 'riot' are each repeated three times.
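One way to read the repeated list, sketched under the assumption (taken from the README examples, not verified against the library source) that each entry of `anchors` is anchored to its own topic: repeating a word would then spread it across several topics rather than raise its weight within one. The helper `topics_per_word` is hypothetical and only counts entries.

```python
from collections import Counter


def topics_per_word(anchors):
    """Count how many anchor entries (assumed: one per topic) contain each word."""
    counts = Counter()
    for entry in anchors:
        # A bare string is a one-word entry; a list is a multi-word entry.
        words_in_entry = [entry] if isinstance(entry, str) else entry
        for word in set(words_in_entry):
            counts[word] += 1
    return dict(counts)


repeated = ['protest', 'protest', 'protest', 'riot', 'riot', 'riot']
print(topics_per_word(repeated))  # {'protest': 3, 'riot': 3}

nested = [['bernese', 'mountain', 'dog'], ['mountain', 'rocky', 'colorado']]
print(topics_per_word(nested)['mountain'])  # 2
```

Under this reading, the second call would need a model with at least six topics, and the per-word weight would still be controlled only by the single anchor_strength value.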