Description
Current Behavior
LuceneSail runs the commit() method on the IndexWriter every time a transaction is committed:
rdf4j/core/sail/lucene/src/main/java/org/eclipse/rdf4j/sail/lucene/impl/LuceneIndex.java
Lines 704 to 714 in 782ffd8
/**
 * Commits any changes done to the LuceneIndex since the last commit. The semantics is synchronous to
 * SailConnection.commit(), i.e. the LuceneIndex should be committed/rolled back whenever the LuceneSailConnection
 * is committed/rolled back.
 */
@Override
public synchronized void commit() throws IOException {
    getIndexWriter().commit();
    // the old IndexReaders/Searchers are not outdated
    invalidateReaders();
}
On the Lucene side, this amounts to more than a transaction commit: it results in a large number of fsyncs. Lucene documents this as "This may be a costly operation, so you should test the cost in your application and do it only when really necessary": https://lucene.apache.org/core/7_4_0/core/org/apache/lucene/index/IndexWriter.html#commit--
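To illustrate why an fsync per commit is expensive, here is a minimal sketch that uses only the JDK, not Lucene. It appends small records to a file and optionally calls FileChannel.force(true) after each one, which is the JDK's way of forcing data to the storage device. The class name, method name, and record contents are made up for this illustration, and the timings will vary a lot between disks and filesystems.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustration only: this is not how Lucene writes its index files.
public class FsyncCostSketch {

    /** Appends `count` small records, optionally fsyncing after each; returns elapsed nanoseconds. */
    static long writeRecords(Path file, int count, boolean forceSync) throws IOException {
        byte[] record = "record\n".getBytes(StandardCharsets.UTF_8);
        long start = System.nanoTime();
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < count; i++) {
                ByteBuffer buf = ByteBuffer.wrap(record);
                while (buf.hasRemaining()) {
                    channel.write(buf); // hands the data to the OS, no durability guarantee yet
                }
                if (forceSync) {
                    channel.force(true); // fsync: block until the data has reached the device
                }
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("fsync-sketch");
        long noSync = writeRecords(dir.resolve("nosync.log"), 200, false);
        long sync = writeRecords(dir.resolve("sync.log"), 200, true);
        System.out.printf("without fsync: %d us, with fsync: %d us%n", noSync / 1000, sync / 1000);
    }
}
```

Both runs produce identical files; the only difference is how long each write waits for the device.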
I came across this while investigating very poor insert performance when using LuceneSail. When any writes are made to Lucene, even trivial ones, performance of the entire app drops more than tenfold.
Here is the flame graph of CPU time for RDF4J Server running this workload + a few other repositories in parallel:
Expected Behavior
NativeStore has a parameter called forceSync that decides whether the store should fsync after each transaction commit. It's off by default. I suggest adding an identical parameter to LuceneSail. My suggestion is to also make it false by default, to save the poor users some hair. I imagine there are relatively few use cases that actually require an fsync to a text index after each commit. But I'm also fine with it being true by default; this is up to the maintainers, of course.
When forceSync is enabled, we would call commit() as is done currently. Otherwise, we would call flush(), which does the same thing (moves written data from memory to the filesystem) but without fsyncing the files.
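A rough sketch of the proposed branching is below. It is not the actual patch: Writer is a hypothetical stand-in for the commit() and flush() methods of Lucene's IndexWriter, and Index is a hypothetical stand-in for LuceneIndex, so that the example runs without Lucene on the classpath. How forceSync would be read from the sail's parameters is left out.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ForceSyncSketch {

    /** Hypothetical stand-in for the two IndexWriter calls discussed above. */
    interface Writer {
        void commit() throws IOException; // write pending changes and fsync the index files

        void flush() throws IOException; // write pending changes to the filesystem, no fsync
    }

    /** Hypothetical index wrapper; not the real LuceneIndex class. */
    static class Index {
        private final Writer writer;
        private final boolean forceSync;

        Index(Writer writer, boolean forceSync) {
            this.writer = writer;
            this.forceSync = forceSync;
        }

        synchronized void commit() throws IOException {
            if (forceSync) {
                writer.commit(); // current behavior: durable, but pays for the fsyncs
            } else {
                writer.flush(); // proposed default: skip the fsyncs
            }
            // In LuceneIndex, invalidateReaders() would presumably still run in both cases.
        }
    }

    /** Test double that records which method was called. */
    static class RecordingWriter implements Writer {
        final List<String> calls = new ArrayList<>();

        @Override
        public void commit() {
            calls.add("commit");
        }

        @Override
        public void flush() {
            calls.add("flush");
        }
    }

    public static void main(String[] args) throws IOException {
        RecordingWriter relaxed = new RecordingWriter();
        new Index(relaxed, false).commit();
        System.out.println(relaxed.calls); // prints [flush]

        RecordingWriter durable = new RecordingWriter();
        new Index(durable, true).commit();
        System.out.println(durable.calls); // prints [commit]
    }
}
```

With forceSync disabled, changes flushed but not yet fsynced could be lost if the machine crashes, which is the same trade-off NativeStore makes with its own default.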
Steps To Reproduce
Set up a LuceneSail and do some writes to it, with triples involving literals.
Version
latest main (5.3.0?)
Are you interested in contributing a solution yourself?
Yes
Anything else?
I will submit a PR to resolve it.