Question: reproducible results #65
Comments
Hi @ivan-marroquin, the results are reproducible if you use only one thread during index construction.
Hi @yurymalkov, thanks for the clarification. Do you think it is possible to ensure reproducible results when using several CPUs? It would be a great addition to a great package! Thanks for all, Ivan
@ivan-marroquin it is possible, but it is complicated. It also prevents incremental one-vector updates: you will likely need to process everything in batches. There is also no obvious benefit to this. If the indexing parameters M and efConstruction are sufficiently large, we usually see only very small variations in performance between runs.
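One way to put a number on the "small variations" mentioned above is to compare the neighbor lists returned by two separately built indexes for the same queries. The following is only a sketch: `mean_overlap` is a hypothetical helper, not part of the library, and the id lists are made-up placeholders for real query results.

```python
def mean_overlap(run_a, run_b):
    """Average fraction of neighbor ids shared per query between two runs."""
    if not run_a or len(run_a) != len(run_b):
        raise ValueError("both runs must cover the same non-empty query set")
    total = 0.0
    for ids_a, ids_b in zip(run_a, run_b):
        set_a, set_b = set(ids_a), set(ids_b)
        # Order within a result list is ignored; only membership counts.
        total += len(set_a & set_b) / max(len(set_a), 1)
    return total / len(run_a)


# Placeholder neighbor ids for two queries, from two hypothetical index builds.
run_1 = [[3, 7, 9], [1, 4, 8]]
run_2 = [[3, 9, 7], [1, 4, 5]]
overlap = mean_overlap(run_1, run_2)
```

An overlap close to 1.0 across a representative query set suggests the run-to-run differences are negligible for that data and parameter choice.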
Hi @searchivarius, I will try using multiple batches. Since my input file has N >>> D (N: observations, D: features), I have to use M = 5 and efConstruction = 10 to deal with this large amount of data. Is it possible that I will get relatively large variations?
Hi @ivan-marroquin, it is possible. It is also possible that the method will not work well at all; I have seen some data sets where this happened. HNSW is relatively robust, but there are no guarantees. It is best to build the index and test. BTW, for large amounts of data, you can easily trade some efficiency for indexing time if you index in chunks. Say, index chunks of no more than 10M records each, then combine the results. If you have, say, 1B records, the difference in indexing time can be an order of magnitude, so you would be able to trade it for increased accuracy (i.e., by setting larger M and efConstruction).
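The "index in chunks, then combine results" idea can be sketched as follows. This is a minimal illustration, not the library's API: an exact brute-force search stands in for a per-chunk HNSW index, and the function names, toy data, and chunk size are all assumptions made for the example.

```python
import heapq
import math


def knn_in_chunk(chunk, offset, query, k):
    """Exact k-NN inside one chunk; a stand-in for querying one per-chunk index.

    Returns (distance, global_id) pairs, where global_id = offset + local position.
    """
    scored = ((math.dist(vec, query), offset + i) for i, vec in enumerate(chunk))
    return heapq.nsmallest(k, scored)


def merge_chunk_results(per_chunk, k):
    """Combine per-chunk (distance, global_id) lists into one global top-k."""
    return heapq.nsmallest(k, (hit for hits in per_chunk for hit in hits))


# Toy data set split into chunks of 2 vectors (real chunks would be far larger).
data = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.2, 0.1), (9.0, 9.0), (0.9, 1.2)]
chunk_size = 2
query = (1.0, 1.0)
k = 3

per_chunk = [
    knn_in_chunk(data[start:start + chunk_size], start, query, k)
    for start in range(0, len(data), chunk_size)
]
top_k = merge_chunk_results(per_chunk, k)
ids = [global_id for _, global_id in top_k]
```

Each chunk must return at least k candidates (or all of its records if it holds fewer) so that the merged list can still contain the true global top-k. Every query then touches every chunk, which is the efficiency cost being traded for the shorter indexing time.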
Hi,
Is there a way to fix a seed in order to get reproducible results?
Many thanks,
Ivan