Ostrzyciel
Contributor

GitHub issue resolved: #5291

Briefly describe the changes proposed in this PR:

As described in the issue, the current setup with an fsync after each transaction is very safe, but also a huge bottleneck when dealing with many small transactions. This PR introduces an option that allows for asynchronous fsyncs in the background, on a fixed interval. If there is nothing to sync, it does nothing.
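The batching idea can be sketched roughly as follows. This is for illustration only. It assumes that each commit records the names of the files it wrote instead of fsyncing them right away, and that a single scheduled task flushes the collected names once per interval and skips the run when nothing is pending. `IntervalSyncer` and `flush()` are hypothetical names. The `Consumer` stands in for the real fsync. Judging by the `sync(Collection<String>)` override quoted later in this thread, the PR does the recording inside a Lucene `Directory` wrapper.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch, not the PR's code: defers fsyncs and runs them on a fixed interval.
class IntervalSyncer implements AutoCloseable {
    private final Set<String> pendingSyncs = new HashSet<>();
    private final Consumer<Collection<String>> fsync; // stand-in for the real fsync call
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    IntervalSyncer(long intervalMillis, Consumer<Collection<String>> fsync) {
        this.fsync = fsync;
        scheduler.scheduleWithFixedDelay(this::flush, intervalMillis, intervalMillis,
                TimeUnit.MILLISECONDS);
    }

    // Called where a commit would otherwise fsync: only remembers the file names.
    void sync(Collection<String> names) {
        synchronized (pendingSyncs) {
            pendingSyncs.addAll(names);
        }
    }

    // Syncs everything recorded so far; returns immediately if nothing is pending.
    void flush() {
        List<String> batch;
        synchronized (pendingSyncs) {
            if (pendingSyncs.isEmpty()) {
                return;
            }
            batch = new ArrayList<>(pendingSyncs);
            pendingSyncs.clear();
        }
        fsync.accept(batch); // performed outside the lock so writers are not blocked
    }

    // Stops the periodic task and performs one final flush so nothing is left unsynced.
    @Override
    public void close() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(10, TimeUnit.SECONDS);
        flush();
    }
}

class IntervalSyncDemo {
    public static void main(String[] args) throws Exception {
        List<Collection<String>> batches = Collections.synchronizedList(new ArrayList<>());
        try (IntervalSyncer syncer = new IntervalSyncer(60_000, batches::add)) {
            syncer.sync(List.of("_0.cfs", "segments_1"));
            syncer.sync(List.of("_0.cfs", "_1.cfs"));
        } // close() flushes the three distinct names as a single batch
        System.out.println(batches.size() + " batch(es), " + batches.get(0).size() + " files");
        // prints "1 batch(es), 3 files"
    }
}
```

Because the names are collected in a `Set`, a file touched by several transactions between two flushes would be synced only once. This is plausibly where the gain for many small transactions comes from.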

I tested this with the original workload with which I found the issue. When I set fsyncInterval to 5000 ms, it went from ~10–12 TX/s to ~100–150 TX/s, over an HTTP connection. That's basically the same as when I tried removing the fsync entirely. Great :)


PR Author Checklist (see the contributor guidelines for more details):

  • my pull request is self-contained
  • I've added tests for the changes I made
  • I've applied code formatting (you can use mvn process-resources to format from the command line)
  • I've squashed my commits where necessary
  • every commit message starts with the issue number (GH-xxxx) followed by a meaningful description of the change

Ostrzyciel added a commit to Ostrzyciel/nanopub-query that referenced this pull request Sep 22, 2025
@hmottestad
Contributor

I don't think that the close() method on the directory in the LuceneIndex class is ever called.

@Ostrzyciel
Contributor Author

@hmottestad oops, you are right! Fixed that and added a test to make sure it happens.

```java
try {
    super.syncMetaData();
} catch (IOException e) {
    logger.error("IO error during a periodic sync of Lucene index metadata", e);
}
```
Contributor

I'm a bit worried that if for some reason there is a persistent issue, then we may end up logging continuously but never actually throwing an exception.

What would usually happen if an IO exception was thrown (with the original code)? Would it bring down the entire application or just a particular transaction?

Contributor Author

This would result in a transaction rollback:

```java
try {
    luceneIndex.commit();
} catch (IOException | SailException e) {
    logger.error("Rolling back", e);
    luceneIndex.rollback();
}
```

We cannot do the same thing 1:1 with asynchronous fsyncs, because we don't wait for the result of the fsync. The next best thing we can do is to throw an exception on the next transaction.

I've added a bit of code for that, along with a test.
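The deferred-error approach could look roughly like the sketch below. It assumes that the background task stores the `IOException` instead of only logging it, and that the next commit checks for a stored failure and rethrows it. `DeferredFailure`, `SyncTask`, `runPeriodicSync` and `checkBeforeCommit` are hypothetical names and are not taken from the PR. Catching inside the periodic task also matters: `ScheduledExecutorService.scheduleWithFixedDelay` suppresses all later runs once an execution throws.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a background sync cannot roll back the transaction that wrote
// the data, so its failure is remembered and surfaced by the next transaction instead.
class DeferredFailure {
    // Stand-in for the real fsync, which may throw IOException.
    interface SyncTask {
        void run() throws IOException;
    }

    private final AtomicReference<IOException> lastFailure = new AtomicReference<>();

    // Runs on the background thread. The IOException is caught (and would be logged)
    // so that it does not escape and silently cancel the scheduled periodic task.
    void runPeriodicSync(SyncTask task) {
        try {
            task.run();
        } catch (IOException e) {
            lastFailure.set(e);
        }
    }

    // Called at the start of the next commit: rethrows a pending failure exactly once.
    void checkBeforeCommit() throws IOException {
        IOException e = lastFailure.getAndSet(null);
        if (e != null) {
            throw new IOException("A previous background sync failed", e);
        }
    }
}

class DeferredFailureDemo {
    public static void main(String[] args) {
        DeferredFailure f = new DeferredFailure();
        f.runPeriodicSync(() -> { throw new IOException("disk full"); });
        try {
            f.checkBeforeCommit();
            System.out.println("commit ok");
        } catch (IOException e) {
            System.out.println("commit rejected: " + e.getCause().getMessage());
        }
        // prints "commit rejected: disk full"
    }
}
```

Clearing the stored failure after it has been reported (`getAndSet(null)`) is one possible design choice. A persistent problem would then fail again at the next periodic sync and be reported to the following transaction, so it is not just logged indefinitely, which was the concern raised in the review.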

@hmottestad
Contributor

Thanks for the good fix. I think this will be a good solution overall, just some small things I want to be sure are robust.

```java
@Override
public void sync(Collection<String> names) throws IOException {
    synchronized (pendingSyncs) {
        pendingSyncs.addAll(names);
    }
}
```
Contributor

How big is this likely to grow? Should we have a hard limit (possibly configurable) so that we don't run out of memory before we sync?

Contributor Author

From what I can tell, there is no limit on this; it depends on the size of the Lucene index. I added a configurable limit, set to 5000 files by default, which should be good enough. There is also a test for this.
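One way such a cap could work is sketched below. The thread only says that the limit is configurable and defaults to 5000 files. It does not say what happens when the limit is reached, so syncing synchronously on the calling thread is an assumption here. The names `BoundedPendingSyncs` and `Fsync` are also assumptions.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: caps the pending-sync set; behaviour at the cap is assumed.
class BoundedPendingSyncs {
    // Stand-in for the real fsync of a batch of files.
    interface Fsync {
        void sync(Collection<String> names) throws IOException;
    }

    private final Set<String> pending = new HashSet<>();
    private final int maxPending;
    private final Fsync fsync;

    BoundedPendingSyncs(int maxPending, Fsync fsync) {
        this.maxPending = maxPending;
        this.fsync = fsync;
    }

    void sync(Collection<String> names) throws IOException {
        List<String> toSyncNow = null;
        synchronized (pending) {
            pending.addAll(names);
            if (pending.size() >= maxPending) {
                // Cap reached: drain the set so it does not keep growing.
                toSyncNow = new ArrayList<>(pending);
                pending.clear();
            }
        }
        if (toSyncNow != null) {
            try {
                fsync.sync(toSyncNow); // synchronous fallback on the caller's thread
            } catch (IOException e) {
                synchronized (pending) {
                    pending.addAll(toSyncNow); // keep the names so a later sync retries them
                }
                throw e;
            }
        }
    }

    int pendingCount() {
        synchronized (pending) {
            return pending.size();
        }
    }
}

class BoundedPendingSyncsDemo {
    public static void main(String[] args) throws IOException {
        List<Integer> batchSizes = new ArrayList<>();
        BoundedPendingSyncs syncs = new BoundedPendingSyncs(3, names -> batchSizes.add(names.size()));
        syncs.sync(List.of("_0.cfs", "_0.si"));
        syncs.sync(List.of("_1.cfs", "_1.si")); // 4 pending >= 3, so synced immediately
        System.out.println(batchSizes + ", still pending: " + syncs.pendingCount());
        // prints "[4], still pending: 0"
    }
}
```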

@hmottestad hmottestad changed the base branch from main to develop October 3, 2025 18:25

Development

Successfully merging this pull request may close these issues.

LuceneSail fsyncs after each TX, leading to very poor performance
