Doc's example fails #16
Comments
Thanks for the issue! I can't reproduce this. I'm guessing this is a problem with the DataStructures version compatibility restriction. Could you share which version you have installed?
Sorry about this, I should have saved and posted the environment so it could be reproduced later. With my current environment it works. It does seem to be a missing lower/upper compat bound for DataStructures, though.
By the way, I was trying to build some large trees, but that took too long. I ended up breaking them into smaller pieces, and in the process I modified the construction algorithm to use threads to parallelize the distance computations, instead of spawning tasks for child branches. I got a 2-3x speed-up, so you might consider switching to that approach. I apologize for not offering to make a pull request in case you'd be interested; I had to do this under time pressure, and for the sake of simplicity I just removed all the logic that handles the case where no threads are available.
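To make the idea concrete, here is a minimal sketch of threading the distance computations for a single node, using only Base. It is not the code from this package or from my modified version: `edit_distance` and `threaded_distances` are hypothetical names, and plain Levenshtein distance stands in for the real (expensive) metric.

```julia
using Base.Threads: @threads

# Hypothetical stand-in for an expensive metric: plain Levenshtein edit distance.
function edit_distance(a::AbstractString, b::AbstractString)
    ca, cb = collect(a), collect(b)
    prev = collect(0:length(cb))   # distances from the empty prefix of `a`
    cur = similar(prev)
    for i in eachindex(ca)
        cur[1] = i
        for j in eachindex(cb)
            cost = ca[i] == cb[j] ? 0 : 1
            cur[j + 1] = min(prev[j + 1] + 1, cur[j] + 1, prev[j] + cost)
        end
        prev, cur = cur, prev      # reuse the two rows
    end
    return prev[end]
end

# Distances from one pivot to every point, with the loop split across threads.
# Each iteration writes to its own slot, so no locking is needed.
function threaded_distances(dist, pivot, points)
    out = Vector{Float64}(undef, length(points))
    @threads for i in eachindex(points)
        out[i] = dist(pivot, points[i])
    end
    return out
end

d = threaded_distances(edit_distance, "kitten", ["sitting", "kitten", ""])
```

Start Julia with `julia -t auto` (or set `JULIA_NUM_THREADS`) to see any benefit. With a single thread the `@threads` loop simply runs serially.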
I'll keep this issue open for now and try to fix the version problem when I get around to it. Thanks for the idea for a performance improvement. I'm guessing it depends on the expense of the distance metric used. Perhaps you could share something I could use as a benchmark?
Let me ask if I can share it (it's a client's dataset). The metric is indeed expensive: a custom distance for dataframe rows with some text columns, maybe similar to TokenMax in StringDistances.
Sorry for the delay. I asked, and this particular dataset is under NDA. However, I don't think it will make much of a difference to just use a random string dataset and TokenMax.
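A random string dataset along those lines could be generated roughly as follows. This is only a sketch using the Random standard library; `random_string_dataset` and its keyword arguments are hypothetical, and the sizes are arbitrary rather than matched to the real data. Pairing it with TokenMax is left out, since the exact call depends on the installed StringDistances version.

```julia
using Random

# Hypothetical helper: `n` strings, each made of `words` random lowercase
# tokens of length `wordlen`, separated by single spaces.
function random_string_dataset(n; words = 3, wordlen = 8, rng = Random.default_rng())
    return [join((randstring(rng, 'a':'z', wordlen) for _ in 1:words), " ") for _ in 1:n]
end

# A fixed seed keeps benchmark runs comparable.
data = random_string_dataset(1000; rng = MersenneTwister(1))
```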
From the docs, one could do:
However, that fails with
MethodError: no method matching iterate(::DataStructures.BinaryMaxHeap{Tuple{Int64, Int64}})