Update RDF4J, speed up nanopub indexing #53
This requires this PR to be merged: eclipse-rdf4j/rdf4j#5446
Thanks, looking really good! I did some tests and it seems to work nicely overall. There is only one odd behavior, which I observed with the test I wrote for the deadlock issue a while back (https://github.com/tkuhn/rdf4j-timeout-test). Following the instructions, both queries (query1 and query2) now work and are fast. So this is good. But with both of them, I noticed that their results seem to terminate early. query1 should go up to line [...] I was suspecting there is some issue in these tests and not in RDF4J, but I couldn't find anything...
Oh! Good catch, I haven't seen that. I will investigate that in detail then. Marking this as draft until I find the root cause.
Great, thanks!
@tkuhn apologies, I should have tested that better... In any case, the issue is now solved. Queries should now complete without deadlocks and return complete results.
This was making RDF4J use 40–50% of a CPU core on idle. I will also optimize this on RDF4J's side, but this already slashes the idle CPU overhead by 3x.
While testing, we found that Query's RDF4J instance would use 40–50% of a CPU core on idle, which is way too much. It turns out the cause was the calls that fetch the list of all repos, made in the metrics code. I reduced this overhead by 3x with a simple fix, but retrieving this list is unusually slow anyway. I will see what I can do to speed it up in RDF4J.
I had a look, and speeding this up in RDF4J would require a pretty big refactor... but thankfully we don't need to fetch the full list of repos every time. The list is mostly static, and the Query service itself is the only one that can modify it. So I added a simple caching mechanism that requests the list only when needed. This solves the issue entirely.
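For readers following along, here is a minimal sketch of the kind of cache described above. It is not the code from this PR: the class name `RepositoryListCache`, the `Supplier`-based loader, and the repo names in `main` are all hypothetical, and it assumes the service calls `invalidate()` whenever it creates or removes a repo itself.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical helper: caches the result of an expensive "list all repos" call. */
public class RepositoryListCache {
    // Assumption: the loader wraps the slow call that fetches the repo list.
    private final Supplier<Set<String>> loader;
    // null means the list has to be (re)loaded on the next get().
    private Set<String> cached;

    public RepositoryListCache(Supplier<Set<String>> loader) {
        this.loader = loader;
    }

    /** Returns the cached list, calling the loader only if the cache is empty. */
    public synchronized Set<String> get() {
        if (cached == null) {
            cached = Set.copyOf(loader.get());
        }
        return cached;
    }

    /** To be called by the service right after it adds or removes a repo. */
    public synchronized void invalidate() {
        cached = null;
    }

    public static void main(String[] args) {
        AtomicInteger slowCalls = new AtomicInteger();
        RepositoryListCache cache = new RepositoryListCache(() -> {
            slowCalls.incrementAndGet();      // stands in for the slow lookup
            return Set.of("meta", "full");    // made-up repo names
        });

        cache.get();
        cache.get();                          // served from the cache
        System.out.println(slowCalls.get());  // prints 1

        cache.invalidate();                   // e.g. a new repo was just created
        cache.get();
        System.out.println(slowCalls.get());  // prints 2
    }
}
```

With this shape, metrics code can call `get()` as often as it likes, and the slow lookup only runs again after the list has actually changed. It relies on the point made above that no other component modifies the list.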

This switches the RDF4J Docker image to a recent version that brings massive performance improvements in queries (especially for concurrent workloads). I've also applied some yet-unmerged patches on top: