Deadlock on parallel call to spatial.addNodes #355
Comments
The original spatial library was written within the context of low-concurrency embedded applications. This means that several parts, including the RTree, are not thread safe. It is not recommended to run parallel bulk imports into the RTree. The particular issue you are seeing is likely related to the way the total counts are maintained, which is not a good design and something we would like to fix. But even once that is fixed, the overall lack of thread safety in the RTree will remain, and so will the risks with parallel imports, which would still need to be addressed.

If you are only importing Point data, you could use a different index, such as a Hilbert curve or geohash over Lucene. However, Lucene is known to perform badly for concurrent reads and writes, so you could face a different set of performance problems, depending on your usage scenario. If you work with points, the best option by far would be to use only Neo4j's built-in spatial index and avoid this library's indexing. If you have points in one layer and complex geometries like polygons in another, you could use the native Neo4j point index for the points and the spatial library for the polygons. The main consequence would be that you would have two quite different spatial models in place, but it could be an option to avoid the concurrency problems if the high-volume data are the points.
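To illustrate the point about keeping parallel writers away from the RTree, here is a minimal sketch of one way an import could stay partly parallel: worker threads prepare batches concurrently, but a single writer thread is the only one that touches the index. This is an assumption about how a client might be structured, not something the library provides. `run_import`, `prepare` and `add_nodes_batch` are hypothetical names; `add_nodes_batch` stands in for whatever code runs the query that calls spatial.addNodes.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

_STOP = object()  # sentinel telling the writer thread to finish


def run_import(raw_chunks, prepare, add_nodes_batch, workers=4):
    """Prepare chunks in parallel, but apply index writes from one thread only.

    prepare and add_nodes_batch are hypothetical callables supplied by the
    caller; add_nodes_batch would wrap the query that calls spatial.addNodes.
    """
    batches = queue.Queue(maxsize=workers * 2)  # bounded so batches do not pile up
    written = []
    errors = []

    def writer():
        # The only thread that ever calls add_nodes_batch.
        while True:
            batch = batches.get()
            if batch is _STOP:
                break
            if errors:
                continue  # keep draining so the producer never blocks
            try:
                written.append(add_nodes_batch(batch))
            except Exception as exc:
                errors.append(exc)

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()
    try:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # pool.map yields prepared batches in the original chunk order.
            for batch in pool.map(prepare, raw_chunks):
                batches.put(batch)
    finally:
        batches.put(_STOP)
        writer_thread.join()
    if errors:
        raise errors[0]
    return written
```

The trade-off is that index writes become the serial bottleneck, so this only helps when preparing the data (parsing, geometry conversion, creating the nodes themselves) is a significant share of the work.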
Hi Craig! I already use the native point index, and also the brand new NativePointEncoder to reference the points with complex polygons. So what I'm hearing is that parallel execution is unsupported and can't be? Any thoughts on why things didn't get completely solved by holding the apoc lock on the spatial root? Perhaps that doesn't behave quite like I expect?

Perhaps related: when I moved my Neo4j storage from a traditional HDD to an M.2 SSD (so the write speed increased by about 10x), I noticed that the startup of my app on a fresh database started to sometimes fail on the second call to spatial.addLayerWithEncoder. The calls happen one after the other, not at the same time, and my "solution" was to add a sleep of one second in between them. After the error, there was more than one ReferenceNode with the name "spatial_root". Perhaps there's something not holding a file system lock properly?
If two separate queries both end up doing an addNodes call in parallel, I get this:
Neo4jError: Failed to invoke procedure spatial.addNodes: Caused by: org.neo4j.kernel.DeadlockDetectedException: ForsetiClient[1] can't acquire ExclusiveLock{owner=ForsetiClient[2]} on NODE(200100), because holders of that lock are waiting for ForsetiClient[1]. Wait list:ExclusiveLock[Client[2] waits for [1]]
That's the node that has the RTREE_METADATA relationship to the layer node. I altered the query to hold a lock on the spatial_root ReferenceNode.
I see the deadlock exception much less as a result, but still see it sometimes. Parallel processing is important for doing very large imports into the graph.
Any thoughts? Thanks,
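Since the deadlock still shows up occasionally, one common client-side mitigation is to retry the failed call after a short, randomised delay, so that the two competing transactions stop colliding. The sketch below is only an illustration under assumptions: `DeadlockDetected` is a hypothetical stand-in for whatever error your driver raises when it wraps DeadlockDetectedException, and `work` is a hypothetical callable that runs the whole transaction containing the spatial.addNodes call. It does not make the RTree thread safe; it only makes the remaining deadlocks less disruptive.

```python
import random
import time


class DeadlockDetected(Exception):
    """Hypothetical stand-in for the driver error wrapping DeadlockDetectedException."""


def with_deadlock_retry(work, max_attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call work() and retry with jittered exponential backoff if it deadlocks."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except DeadlockDetected:
            if attempt == max_attempts:
                raise  # give up and surface the last deadlock
            # Base delay doubles each attempt; jitter desynchronises the clients.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

Usage would look like `with_deadlock_retry(lambda: run_add_nodes(batch))`, where `run_add_nodes` is again a hypothetical function of your own. The retried unit needs to be the whole transaction, because a deadlocked transaction is rolled back.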