LuceneSail fsyncs after each TX, leading to very poor performance #5291

@Ostrzyciel

Current Behavior

LuceneSail runs the commit() method on the IndexWriter every time a transaction is committed:

/**
 * Commits any changes done to the LuceneIndex since the last commit. The semantics is synchronous to
 * SailConnection.commit(), i.e. the LuceneIndex should be committed/rolled back whenever the LuceneSailConnection
 * is committed/rolled back.
 */
@Override
public synchronized void commit() throws IOException {
    getIndexWriter().commit();
    // the old IndexReaders/Searchers are now outdated
    invalidateReaders();
}

On the Lucene side, this is more than a logical transaction commit: it results in a whole lot of fsyncs. The Lucene documentation for IndexWriter.commit() warns about this: "This may be a costly operation, so you should test the cost in your application and do it only when really necessary". See https://lucene.apache.org/core/7_4_0/core/org/apache/lucene/index/IndexWriter.html#commit--
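To give a feel for why an fsync per transaction hurts, here is a minimal sketch that uses only the JDK, not Lucene or RDF4J. It appends the same small records to a file twice: once calling FileChannel.force(true) (an fsync) after every write, and once never forcing. The class name, record contents, and record count are made up for illustration, and the timings depend heavily on the disk and filesystem, so no particular ratio should be expected.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FsyncCostSketch {

    /**
     * Writes `records` small records to `file`. If syncEach is true, the
     * channel is forced to disk (fsync) after every record, which mimics a
     * durable commit per transaction. Returns the elapsed time in nanoseconds.
     */
    static long writeRecords(Path file, int records, boolean syncEach) throws IOException {
        byte[] payload = "one small record\n".getBytes(StandardCharsets.UTF_8);
        long start = System.nanoTime();
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int i = 0; i < records; i++) {
                ByteBuffer buf = ByteBuffer.wrap(payload);
                while (buf.hasRemaining()) {
                    ch.write(buf);
                }
                if (syncEach) {
                    ch.force(true); // fsync: block until data and metadata reach the device
                }
            }
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        Path synced = Files.createTempFile("synced", ".bin");
        Path unsynced = Files.createTempFile("unsynced", ".bin");
        try {
            long withSync = writeRecords(synced, 500, true);
            long withoutSync = writeRecords(unsynced, 500, false);
            System.out.printf("fsync after each write: %d ms%n", withSync / 1_000_000);
            System.out.printf("no fsync:               %d ms%n", withoutSync / 1_000_000);
        } finally {
            Files.deleteIfExists(synced);
            Files.deleteIfExists(unsynced);
        }
    }
}
```

Both runs leave identical file contents behind; the only difference is how often the process waits for the storage device.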

I came across this while investigating very poor insert performance when using LuceneSail. When any writes are made to Lucene, even trivial ones, the performance of the entire application drops more than tenfold.

Here is the flame graph of CPU time for RDF4J Server running this workload + a few other repositories in parallel:

[Image: flame graph of CPU time]

Expected Behavior

NativeStore has a parameter called forceSync that decides whether the store fsyncs after each transaction commit. It is off by default. I suggest adding an identical parameter to LuceneSail, also false by default, to save the poor users some hair. I imagine relatively few use cases actually require an fsync of a text index after each commit. That said, I'm also fine with it being true by default; this is up to the maintainers, of course.

When forceSync is enabled, we would call commit() as is done currently. Otherwise, we would call flush(), which does the same thing (moves written data from memory to the filesystem) but without fsyncing the files.
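A rough sketch of the proposed switch, not the actual patch. To keep it runnable without Lucene on the classpath, the Writer interface below is a hypothetical stand-in for the two IndexWriter methods discussed here (commit() and flush()), and Index, RecordingWriter, and the constructor-based forceSync wiring are likewise assumptions made for illustration. How the parameter would really be read from the LuceneSail configuration is left open.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ForceSyncSketch {

    /** Hypothetical stand-in for the two IndexWriter methods in question. */
    interface Writer {
        void commit() throws IOException; // flushes and fsyncs
        void flush() throws IOException;  // flushes only, no fsync
    }

    /** Hypothetical stand-in for the index class that owns the writer. */
    static class Index {
        private final Writer writer;
        private final boolean forceSync;

        Index(Writer writer, boolean forceSync) {
            this.writer = writer;
            this.forceSync = forceSync;
        }

        public synchronized void commit() throws IOException {
            if (forceSync) {
                writer.commit(); // current behaviour: durable, but costly
            } else {
                writer.flush();  // proposed default: skip the fsyncs
            }
            // the old IndexReaders/Searchers are now outdated
            invalidateReaders();
        }

        private void invalidateReaders() {
            // placeholder: the real method drops cached readers/searchers
        }
    }

    /** Test double that records which writer method was invoked. */
    static class RecordingWriter implements Writer {
        final List<String> calls = new ArrayList<>();

        @Override
        public void commit() {
            calls.add("commit");
        }

        @Override
        public void flush() {
            calls.add("flush");
        }
    }

    public static void main(String[] args) throws IOException {
        RecordingWriter relaxed = new RecordingWriter();
        new Index(relaxed, false).commit();
        System.out.println("forceSync=false -> " + relaxed.calls);

        RecordingWriter durable = new RecordingWriter();
        new Index(durable, true).commit();
        System.out.println("forceSync=true  -> " + durable.calls);
    }
}
```

The trade-off is durability: with forceSync disabled, index changes that were flushed but not fsynced can be lost if the machine crashes or loses power.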

Steps To Reproduce

Set up a LuceneSail and do some writes to it, with triples involving literals.

Version

latest main (5.3.0?)

Are you interested in contributing a solution yourself?

Yes

Anything else?

I will submit a PR to resolve it.
