HDDS-13906. Reduce Bootstrap Write lock time on OM during bootstrapping execution. #9585
base: master
Conversation
Thanks for the patch @sadanand48. I have a few doubts about the implementation.
@rnblough Do you want to take a look at this patch?
Pull request overview
This PR reduces the Bootstrap write lock time during OM bootstrapping by refactoring the checkpoint streaming process. The key change is separating the file collection phase (which creates hardlinks while holding the lock) from the actual tar streaming phase (which now happens outside the lock).
Changes:
- Introduced OMDBArchiver class to manage hardlink creation and deferred tar writing
- Refactored OMDBCheckpointServletInodeBasedXfer to collect files under lock and write outside lock
- Updated tests to work with the new two-phase approach
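Sketched below is how this two-phase flow could look. This is a minimal illustration rather than the PR's actual code: only the names OMDBArchiver, collectDbDataToTransfer and writeToArchive come from this change; the lock object, method signatures, and surrounding class are placeholders and simplified.

import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the two-phase checkpoint transfer.
class TwoPhaseCheckpointSketch {
  private final ReentrantLock bootstrapLock = new ReentrantLock();

  void streamCheckpoint(OMDBArchiver archiver, OutputStream out) throws IOException {
    bootstrapLock.lock();
    try {
      // Phase 1 (under the lock): record the files to transfer and create
      // hardlinks so RocksDB cannot delete them before they are streamed.
      collectDbDataToTransfer(archiver);
    } finally {
      bootstrapLock.unlock();
    }
    // Phase 2 (outside the lock): stream the collected files as a tarball,
    // so background services on the leader OM are no longer blocked.
    archiver.writeToArchive(out);
  }

  private void collectDbDataToTransfer(OMDBArchiver archiver) {
    // Placeholder for the real collection logic in
    // OMDBCheckpointServletInodeBasedXfer.
  }
}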
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBArchiver.java | New class that manages hardlink creation during lock phase and tar writing outside lock |
| hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBCheckpointServletInodeBasedXfer.java | Refactored to use two-phase approach: collectDbDataToTransfer (under lock) and writeToArchive (outside lock) |
| hadoop-ozone/ozone-manager/src/test/java/org/apache/hadoop/ozone/om/TestOMDBArchiver.java | New test class covering OMDBArchiver functionality |
| hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/om/TestOMDbCheckpointServletInodeBasedXfer.java | Updated integration tests to work with new method signatures and two-phase approach |
Comments suppressed due to low confidence (1)
hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMDBCheckpointServletInodeBasedXfer.java:192
- Potential resource leak: If an IOException occurs in the second try block (lines 173-182) after successfully creating the OMDBArchiver and collecting data, the hardlinks created in the tmpDir may not be cleaned up properly. The cleanup in the finally block only deletes the tmpDir, but if writeToArchive fails, the hardlink files stored in filesToWriteIntoTarball map won't be explicitly deleted since the cleanup happens in writeToArchive's finally block. Consider adding explicit cleanup of the archiver's resources in the outer finally block.
    try {
      Instant start = Instant.now();
      OutputStream outputStream = response.getOutputStream();
      omdbArchiver.writeToArchive(getConf(), outputStream);
      Instant end = Instant.now();
      long duration = Duration.between(start, end).toMillis();
      LOG.info("Time taken to write the checkpoint to response output " +
          "stream: {} milliseconds", duration);
    } catch (IOException e) {
      LOG.error("unable to write to archive stream", e);
    } finally {
      try {
        if (tmpdir != null) {
          FileUtils.deleteDirectory(tmpdir.toFile());
        }
      } catch (IOException e) {
        LOG.error("unable to delete: {}", tmpdir, e);
      }
    }
  }
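One way to address the cleanup concern above would be to make the outer finally block also release the archiver's resources. A minimal sketch of that idea, assuming a hypothetical cleanup() method on OMDBArchiver that deletes the hardlinks it created (such a method is not part of this PR):

    } finally {
      try {
        if (omdbArchiver != null) {
          omdbArchiver.cleanup();  // hypothetical: remove any hardlinks left behind
        }
        if (tmpdir != null) {
          FileUtils.deleteDirectory(tmpdir.toFile());
        }
      } catch (IOException e) {
        LOG.error("unable to clean up: {}", tmpdir, e);
      }
    }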
What changes were proposed in this pull request?
Because of the fixes in HDDS-13905, an entire BG service task could be blocked by a bootstrap operation running on a leader OM. One possible improvement is to only create a hardlink while holding the lock (which we already do, so that the file is not deleted by RocksDB operations) and defer writing the file into the tarball stream, which is currently done synchronously under the lock.
ozone/hadoop-hdds/framework/src/main/java/org/apache/hadoop/hdds/utils/Archiver.java, lines 135 to 155 in 96390ac
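For illustration, creating the hardlink itself is a single NIO call; the class, method, and paths below are placeholders, not the servlet's actual layout:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: link an SST file into a temp directory while the lock is
// held, so the inode stays alive even if RocksDB deletes the original name
// during compaction; the link can then be streamed into the tarball later,
// outside the lock.
final class HardlinkSketch {
  static Path linkForLaterTransfer(Path sstFile, Path tmpDir) throws IOException {
    Path link = tmpDir.resolve(sstFile.getFileName());
    Files.createLink(link, sstFile);
    return link;
  }
}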
What is the link to the Apache JIRA?
https://issues.apache.org/jira/browse/HDDS-13906
How was this patch tested?
Will add tests