I think you brought up a great point. The hyper-parameter optimization code needs work (it takes the weights and assigns b, the number of bands, and r, the number of rows per band, for the index; see https://github.com/ekzhu/datasketch/blob/master/datasketch/lsh.py#L22). The current one is both data-agnostic and not tuned toward any specific recall requirement.
I prepared 10 synthetic example pairs.
The value and query in each pair have Jaccard similarity > 0.9.
I insert all values into MinHashLSH with the default settings. For every query I expect exactly one value back, but in 4 of 10 cases I get no results.
When I change the weights, I get correct results.
Is this expected behavior? Maybe the default threshold or weights should be changed?
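
To illustrate why the weights matter here, below is a rough, standard-library-only sketch of the usual LSH banding model. It is not the code in `lsh.py`. It assumes a pair with Jaccard similarity `s` becomes a candidate with probability `1 - (1 - s^r)^b`, and that `b` and `r` are chosen by minimizing a weighted sum of false-positive and false-negative areas around the threshold. All function names (`candidate_probability`, `choose_params`, and so on) are hypothetical, and the midpoint-rule integration is a simplification.

```python
def candidate_probability(s, b, r):
    """Probability that two sets with Jaccard similarity s share at least one band."""
    return 1.0 - (1.0 - s ** r) ** b


def _integrate(f, lo, hi, steps=100):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


def false_positive_area(threshold, b, r):
    """Area under the candidate curve below the threshold (unwanted matches)."""
    return _integrate(lambda s: candidate_probability(s, b, r), 0.0, threshold)


def false_negative_area(threshold, b, r):
    """Area above the candidate curve at or beyond the threshold (missed matches)."""
    return _integrate(lambda s: 1.0 - candidate_probability(s, b, r), threshold, 1.0)


def choose_params(threshold, num_perm, fp_weight, fn_weight):
    """Exhaustively pick (b, r) with b * r <= num_perm minimizing the weighted error."""
    best, best_error = None, float("inf")
    for b in range(1, num_perm + 1):
        for r in range(1, num_perm // b + 1):
            error = (fp_weight * false_positive_area(threshold, b, r)
                     + fn_weight * false_negative_area(threshold, b, r))
            if error < best_error:
                best, best_error = (b, r), error
    return best


if __name__ == "__main__":
    # Compare equal weights with weights that penalize false negatives more.
    for weights in [(0.5, 0.5), (0.1, 0.9)]:
        b, r = choose_params(0.9, 128, *weights)
        recall_at_092 = candidate_probability(0.92, b, r)
        print(weights, (b, r), round(recall_at_092, 3))
```

Under this model, a pair whose similarity is only slightly above the threshold sits on the steep part of the curve, so a noticeable fraction of such pairs can be missed. Putting more weight on false negatives moves the curve toward higher recall at the cost of more false positives. In datasketch this corresponds to passing a different `weights` tuple to `MinHashLSH`, for example `weights=(0.1, 0.9)`.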