@Ostrzyciel
Collaborator

This switches the RDF4J Docker image to a recent version that brings massive performance improvements in queries (especially for concurrent workloads). I've also applied some yet-unmerged patches on top:

Ostrzyciel requested a review from tkuhn on December 12, 2025, 12:45
@tkuhn
Contributor

tkuhn commented Dec 12, 2025

Thanks, looking really good!

I did some tests and it seems to work nicely overall.

There is only one weird behavior, which I observed with this test I wrote for the deadlock issue a while back: https://github.com/tkuhn/rdf4j-timeout-test

Following the instructions, both queries (query1 and query2) now work and are fast. So this is good.

But with both of them, I noticed that their results seem to terminate early. query1 should go up to line 29 but it ends at 14, and query2 should go up to line 9,9,9,999 but it stops at 0,2,9,29 (or in some cases at 0,1,4,14). Any idea what's going on here?

I was suspecting there is some issue in these tests and not in RDF4J, but I couldn't find anything...

@Ostrzyciel
Collaborator Author

> But with both of them, I noticed that their results seem to terminate early. query1 should go up to line 29 but it ends at 14, and query2 should go up to line 9,9,9,999 but it stops at 0,2,9,29 (or in some cases at 0,1,4,14). Any idea what's going on here?
>
> I was suspecting there is some issue in these tests and not in RDF4J, but I couldn't find anything...

Oh! Good catch, I haven't seen that. I will investigate that in detail then. Marking this as draft until I find the root cause.

Ostrzyciel marked this pull request as draft on December 12, 2025, 15:17
@tkuhn
Contributor

tkuhn commented Dec 12, 2025

Great, thanks!

Ostrzyciel marked this pull request as ready for review on December 12, 2025, 21:52
@Ostrzyciel
Collaborator Author

@tkuhn apologies, I should have tested that better... in any case, now the issue is solved. Queries should now complete without deadlocks and return complete results.

This was making RDF4J use 40–50% of a CPU core on idle. I will also optimize this on RDF4J's side, but this already slashes the idle CPU overhead by 3x.
@Ostrzyciel
Collaborator Author

While testing, we found that Query's RDF4J instance would use 40–50% of a CPU core on idle, which is way too much. It turns out the cause was the calls to get the list of all repos, made in the metrics code. I reduced this overhead by 3x with a simple fix, but retrieving this list is unusually slow anyway. I will see what I can do to speed it up in RDF4J.

@Ostrzyciel
Collaborator Author

> While testing, we found that Query's RDF4J instance would use 40–50% of a CPU core on idle, which is way too much. It turns out the cause was the calls to get the list of all repos, made in the metrics code. I reduced this overhead by 3x with a simple fix, but retrieving this list is unusually slow anyway. I will see what I can do to speed it up in RDF4J.

I had a look, and speeding it up in RDF4J would require a pretty big refactor... but thankfully we don't need to check the full list of repos every time. The list is mostly static, and the Query service itself is the only one that can modify it. So I added a simple caching mechanism that requests the list only when needed. This solves the issue entirely.
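
For readers who want to picture the idea, here is a minimal sketch of such a cache. It is not the actual patch: it is written in Python for brevity, and the names `RepoListCache`, `fetch`, and `invalidate` are hypothetical. It assumes, as described above, that the service holding the cache is the only writer, so it can invalidate the cache itself whenever it changes the list.

```python
import threading
from typing import Callable, List, Optional


class RepoListCache:
    """Caches a slow-to-fetch repository list (hypothetical illustration).

    The list is fetched again only after invalidate() has been called,
    which the owning service would do whenever it creates or removes a repo.
    """

    def __init__(self, fetch: Callable[[], List[str]]) -> None:
        self._fetch = fetch                      # the expensive call
        self._lock = threading.Lock()            # metrics may run on another thread
        self._cached: Optional[List[str]] = None

    def get(self) -> List[str]:
        with self._lock:
            if self._cached is None:             # first call, or after invalidate()
                self._cached = list(self._fetch())
            return list(self._cached)            # copy, so callers cannot mutate the cache

    def invalidate(self) -> None:
        with self._lock:
            self._cached = None


# Usage: a stand-in for the slow "list all repos" call that records each invocation.
repos = ["repo-a"]
fetch_calls = []

def slow_fetch() -> List[str]:
    fetch_calls.append(1)
    return list(repos)

cache = RepoListCache(slow_fetch)
for _ in range(1000):        # e.g. periodic metrics collection
    cache.get()

repos.append("repo-b")       # the service modifies the list...
cache.invalidate()           # ...and invalidates the cache itself
latest = cache.get()
```

In this sketch, a thousand metric reads trigger a single fetch, and a second fetch happens only after the list has actually changed. The trade-off is that the cache would go stale if anything other than the owning service modified the list.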

@Ostrzyciel
Collaborator Author

Applying this last patch led to a sharp drop in CPU usage on the test cluster:

image
