-
Notifications
You must be signed in to change notification settings - Fork 29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Breaks immediately on very large problems #7
Comments
I am afraid that even if the lib did not crash, you would have to wait literally ages until you got the result. The time complexity is cubic, even after applying all the algorithmic smart tricks and computational optimizations. 55k^3 is a big number :) Now that you are hopeless, may I ask you two things:
I can debug and tell the exact reason why it did not work. |
Hey, darn that's okay. Thanks anyway. Here's the example. # imaginary data
data = np.empty([55000,2]) from lapjv import lapjv
import numpy as np
def dists(a, b):
return np.linalg.norm(a-b,axis=1)
def mycdist(grid,positions):
"""Because scipy.spatial.distance.cdist was giving memory errors; sadly this version is slower"""
out = np.array([]).astype(np.float16).reshape([0,len(positions)])
tmp = []
for i in range(len(positions)):
tmp.append(np.square(dists(grid,positions[i])).astype(np.float16))
if i%1000 == 0:
print(i//1000, "/", len(positions)//1000)
out = np.concatenate((out,np.nan_to_num(np.array(tmp).astype(np.float16))),axis=0)
del tmp
tmp = []
out = np.concatenate((out,np.nan_to_num(np.array(tmp).astype(np.float16))),axis=0)
del tmp
#out[i] = np.square(dists(grid,positions[0])).astype(np.float16)
return out
n_x, n_y = (200, 275) # two numbers that multiply together to give 55000
grid = np.dstack(np.meshgrid(np.linspace(0, 1,n_y), np.linspace(0, 1, n_x))).reshape(-1, 2)
cost_matrix = mycdist(grid, data)
cost_matrix = cost_matrix * (1000 / cost_matrix.max())
row_asses, col_asses, _ = lapjv(cost_matrix) Unfortunately there's no error message through jupyter, just a popup that says the kernel died. |
Hold on, I think there was a separate problem where lapjv wouldn't accept float16s but even using the standard scipy cdist function I think there's an error. from lapjv import lapjv
import numpy as np
from scipy.spatial.distance import cdist
data = np.empty([55000,2])
n_x, n_y = (200, 275) # two numbers that multiply together to give 55000
grid = np.dstack(np.meshgrid(np.linspace(0, 1,n_y), np.linspace(0, 1, n_x))).reshape(-1, 2)
cost_matrix = cdist(grid[0:40000], data[0:40000])
cost_matrix = cost_matrix * (1000 / cost_matrix.max())
row_asses, col_asses, _ = lapjv(cost_matrix) I'll need to boot the digital ocean box back up to get a stack trace. |
Hey great library!
I'm curious about why this function crashes on large assignment problems.
It worked for me for 5000 and 10,000 position problems but crashed for 55,000 immediately.
At first I thought it was a memory issue but it seems to be crashing long before exhausting all the memory on the machines I tried.
Do you have any idea why?
The text was updated successfully, but these errors were encountered: