HDDS-13906. Reduce Bootstrap Write lock time on OM during bootstrapping execution. #9585
base: master

    @@ -0,0 +1,130 @@
    /*
     * Licensed to the Apache Software Foundation (ASF) under one or more
     * contributor license agreements. See the NOTICE file distributed with
     * this work for additional information regarding copyright ownership.
     * The ASF licenses this file to You under the Apache License, Version 2.0
     * (the "License"); you may not use this file except in compliance with
     * the License. You may obtain a copy of the License at
     *
     * http://www.apache.org/licenses/LICENSE-2.0
     *
     * Unless required by applicable law or agreed to in writing, software
     * distributed under the License is distributed on an "AS IS" BASIS,
     * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
     * See the License for the specific language governing permissions and
     * limitations under the License.
     */

    package org.apache.hadoop.ozone.om;

    import static org.apache.hadoop.hdds.utils.Archiver.includeFile;
    import static org.apache.hadoop.hdds.utils.Archiver.tar;
    import static org.apache.hadoop.hdds.utils.HddsServerUtil.includeRatisSnapshotCompleteFlag;
    import static org.apache.hadoop.ozone.om.OMDBCheckpointServletInodeBasedXfer.writeHardlinkFile;

    import java.io.File;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.commons.compress.archivers.ArchiveOutputStream;
    import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
    import org.apache.hadoop.hdds.conf.OzoneConfiguration;
    import org.apache.hadoop.util.Time;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    /**
     * Class for handling operations relevant to archiving the OM DB tarball.
     * Mainly maintains a map for recording the files collected from reading
     * the checkpoint and snapshot DB's. It temporarily creates hardlinks and stores
     * the link data in the map to release the bootstrap lock quickly
     * and do the actual write at the end outside the lock.
     */
    public class OMDBArchiver {

      private Path tmpDir;
      private Map<String, File> filesToWriteIntoTarball;
      private Map<String, String> hardLinkFileMap;
      private static final Logger LOG = LoggerFactory.getLogger(OMDBArchiver.class);

      public OMDBArchiver() {
        this.tmpDir = null;
        this.filesToWriteIntoTarball = new HashMap<>();
        hardLinkFileMap = null;
      }

      public void setTmpDir(Path tmpDir) {
        this.tmpDir = tmpDir;
      }

      public Map<String, String> getHardLinkFileMap() {
        return hardLinkFileMap;
      }

      public void setHardLinkFileMap(Map<String, String> hardLinkFileMap) {
        this.hardLinkFileMap = hardLinkFileMap;
      }

      /**
       * @param file the file to create a hardlink and record into the map
       * @param entryName name of the entry corresponding to file
       * @return the file size
       * @throws IOException in case of hardlink failure
       *
       * Records the given file entry into the map after taking a hardlink.

Comment on lines +90 to +95

    -   * @param file the file to create a hardlink and record into the map
    -   * @param entryName name of the entry corresponding to file
    -   * @return the file size
    -   * @throws IOException in case of hardlink failure
    -   *
    -   * Records the given file entry into the map after taking a hardlink.
    +   * Records the given file entry into the map after taking a hardlink.
    +   *
    +   * @param file the file to create a hardlink and record into the map
    +   * @param entryName name of the entry corresponding to file
    +   * @return the file size
    +   * @throws IOException in case of hardlink failure

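
For readers skimming the diff, the two-phase idea in the `OMDBArchiver` class Javadoc (take hardlinks while the bootstrap lock is held, do the slow write after releasing it) can be sketched roughly as below. This is only an illustration under stated assumptions, not the PR's implementation: `TwoPhaseArchiveSketch`, `recordUnderLock`, and `writeOutsideLock` are hypothetical names, a plain `ReentrantLock` stands in for the OM bootstrap lock, and the write phase just copies raw bytes to a stream instead of building a tar archive.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

class TwoPhaseArchiveSketch {
  // Hypothetical stand-in for the OM bootstrap lock.
  private final ReentrantLock bootstrapLock = new ReentrantLock();
  // Entry name -> hardlink created in tmpDir.
  private final Map<String, Path> recorded = new LinkedHashMap<>();
  private final Path tmpDir;

  TwoPhaseArchiveSketch(Path tmpDir) {
    this.tmpDir = tmpDir;
  }

  /** Phase 1: under the lock, only create hardlinks and remember them. */
  void recordUnderLock(Map<String, Path> liveFiles) throws IOException {
    bootstrapLock.lock();
    try {
      for (Map.Entry<String, Path> e : liveFiles.entrySet()) {
        Path link = tmpDir.resolve(e.getKey());
        Files.createLink(link, e.getValue());
        recorded.put(e.getKey(), link);
      }
    } finally {
      bootstrapLock.unlock();
    }
  }

  /** Phase 2: outside the lock, stream the linked files to the output. */
  long writeOutsideLock(OutputStream out) throws IOException {
    long bytes = 0;
    for (Path link : recorded.values()) {
      bytes += Files.copy(link, out);
    }
    return bytes;
  }

  public static void main(String[] args) throws IOException {
    // Keep both directories under one parent so they share a filesystem.
    Path base = Files.createTempDirectory("two-phase-sketch");
    Path dbDir = Files.createDirectory(base.resolve("db"));
    Path tmpDir = Files.createDirectory(base.resolve("tmp"));
    Path sst = Files.write(dbDir.resolve("000001.sst"),
        "data".getBytes(StandardCharsets.UTF_8));

    TwoPhaseArchiveSketch sketch = new TwoPhaseArchiveSketch(tmpDir);
    Map<String, Path> live = new LinkedHashMap<>();
    live.put("000001.sst", sst);
    sketch.recordUnderLock(live);

    // Simulate the DB dropping the original file once the lock is released.
    Files.delete(sst);

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    System.out.println("bytes written: " + sketch.writeOutsideLock(out));
  }
}
```

The hardlink keeps the file contents reachable even if the original path disappears after the lock is released, which is what lets the expensive copy happen later.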
Copilot
AI
Jan 20, 2026
Missing null check for tmpDir: The recordFileEntry method uses tmpDir.resolve() on line 98 without verifying that tmpDir has been set. If recordFileEntry is called before setTmpDir, this will result in a NullPointerException. Consider adding validation to ensure tmpDir is not null before using it.

    -  public long recordFileEntry(File file, String entryName) throws IOException {
    +  public long recordFileEntry(File file, String entryName) throws IOException {
    +    if (tmpDir == null) {
    +      throw new IllegalStateException(
    +          "Temporary directory not set. Call setTmpDir() before recordFileEntry().");
    +    }

Copilot
AI
Jan 20, 2026
Potential file name collision in hardlink creation: The method creates a hardlink using only the entryName without considering potential collisions. If recordFileEntry is called multiple times with the same entryName but different files, the second call will fail because Files.createLink will throw FileAlreadyExistsException. The existing hardlink should either be checked and handled, or the code should ensure unique entryNames are passed.

    -    File link = tmpDir.resolve(entryName).toFile();
    -    long bytes = 0;
    -    try {
    -      Files.createLink(link.toPath(), file.toPath());
    +    Path linkPath = tmpDir.resolve(entryName);
    +    File link = linkPath.toFile();
    +    long bytes = 0;
    +    try {
    +      if (Files.exists(linkPath)) {
    +        // If the existing file is already a link to the same source, just reuse it.
    +        if (Files.isSameFile(linkPath, file.toPath())) {
    +          filesToWriteIntoTarball.put(entryName, link);
    +          return file.length();
    +        }
    +        // Otherwise, remove the stale link/entry so we can recreate it.
    +        Files.delete(linkPath);
    +      }
    +      Files.createLink(linkPath, file.toPath());

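
To make the collision concern concrete, here is a small standalone sketch using only `java.nio.file`. It shows a second `Files.createLink` on an existing name failing, and a reuse-or-replace helper along the lines of the suggestion. `HardLinkCollisionSketch` and `linkOrReuse` are hypothetical names for illustration, not code from this PR, and whether silently replacing a stale link is the right policy for `OMDBArchiver` is a separate question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

class HardLinkCollisionSketch {
  /**
   * Makes linkPath a hardlink to source. Returns false if linkPath already
   * pointed at the same file (nothing to do), true if a link was created.
   */
  static boolean linkOrReuse(Path linkPath, Path source) throws IOException {
    if (Files.exists(linkPath)) {
      if (Files.isSameFile(linkPath, source)) {
        return false; // already a link to the same underlying file
      }
      Files.delete(linkPath); // stale entry for a different file
    }
    Files.createLink(linkPath, source);
    return true;
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("hardlink-sketch");
    Path a = Files.write(dir.resolve("a.sst"), "aaa".getBytes(StandardCharsets.UTF_8));
    Path b = Files.write(dir.resolve("b.sst"), "bbbbb".getBytes(StandardCharsets.UTF_8));
    Path link = dir.resolve("entry");

    Files.createLink(link, a);
    try {
      Files.createLink(link, b); // same link name, different source
    } catch (FileAlreadyExistsException e) {
      System.out.println("second createLink failed for: " + e.getFile());
    }

    System.out.println("relinked to b: " + linkOrReuse(link, b));
    System.out.println("size via link: " + Files.size(link));
  }
}
```

An alternative design would be to treat a duplicate `entryName` as a caller bug and fail fast, which avoids hiding two different files behind one tarball entry.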
Copilot
AI
Jan 20, 2026
Inconsistent time method usage. Line 123 uses 'Time.now()' for initialization, but lines 131 and 134 use 'Time.monotonicNow()' for comparison. Since monotonic time should be used for measuring elapsed time intervals, line 123 should also use 'Time.monotonicNow()' to ensure consistent time measurement.

    -    long lastLoggedTime = Time.now();
    +    long lastLoggedTime = Time.monotonicNow();

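
As a rough illustration of why the two clocks should not be mixed, the sketch below uses plain JDK calls as stand-ins: `System.currentTimeMillis()` for a wall-clock reading and `System.nanoTime()` for a monotonic one. To my understanding Hadoop's `Time.now()` and `Time.monotonicNow()` wrap these respectively, but treat that mapping as an assumption. `MonotonicIntervalSketch`, `monotonicNowMs`, and `intervalElapsed` are hypothetical names.

```java
import java.util.concurrent.TimeUnit;

class MonotonicIntervalSketch {
  /** Monotonic milliseconds with an arbitrary origin; only differences are meaningful. */
  static long monotonicNowMs() {
    return TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
  }

  /** True once at least intervalMs has passed between two readings of the SAME clock. */
  static boolean intervalElapsed(long lastMs, long nowMs, long intervalMs) {
    return nowMs - lastMs >= intervalMs;
  }

  public static void main(String[] args) throws InterruptedException {
    long start = monotonicNowMs();
    Thread.sleep(20);
    long elapsed = monotonicNowMs() - start;
    System.out.println("elapsed (monotonic) ms: " + elapsed);

    // Mixing clocks: a wall-clock start compared against a monotonic "now".
    // The two origins are unrelated, so this difference carries no meaning.
    long mixed = monotonicNowMs() - System.currentTimeMillis();
    System.out.println("meaningless mixed difference: " + mixed);
  }
}
```

With a wall-clock `lastLoggedTime` and a monotonic `now`, a throttled progress log could fire on every iteration or never, depending on how the two origins happen to relate.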
Copilot
AI
Jan 20, 2026
The variable 'filesWritten' is referenced in the logging statement but never incremented. Line 133 uses 'filesWritten' in the log message, so it should be incremented after line 130 to track the actual number of files written. Currently, it remains at 0 throughout the loop.
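
A minimal sketch of the intended pattern, assuming the counter should advance right after each successful write. The loop here is not the PR's code: `ProgressCounterSketch` and `writeAll` are hypothetical, a `StringBuilder` stands in for the tarball stream, and progress is logged every `logEvery` files rather than on a time interval so the behavior is deterministic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ProgressCounterSketch {
  /**
   * Writes each entry to the sink and records a progress line every
   * logEvery files. logEvery must be positive. Returns the number written.
   */
  static int writeAll(List<String> entries, StringBuilder sink,
      List<String> progressLog, int logEvery) {
    int filesWritten = 0;
    for (String entry : entries) {
      sink.append(entry).append('\n'); // stand-in for adding the file to the archive
      filesWritten++;                  // count only after the write succeeded
      if (filesWritten % logEvery == 0) {
        progressLog.add("written " + filesWritten + " of " + entries.size());
      }
    }
    return filesWritten;
  }

  public static void main(String[] args) {
    List<String> progress = new ArrayList<>();
    int total = writeAll(Arrays.asList("a.sst", "b.sst", "c.sst", "d.sst"),
        new StringBuilder(), progress, 2);
    progress.forEach(System.out::println);
    System.out.println("total: " + total);
  }
}
```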
Copilot
AI
Jan 20, 2026
The error message on line 137 is misleading. It states "Couldn't create hardlink for file" but this error occurs during the write phase when including the file in the tarball, not when creating the hardlink. The hardlink was already created earlier in the recordFileEntry method. The error message should accurately describe that the issue is with writing the file to the archive.

    -        LOG.error("Couldn't create hardlink for file {} while including it in tarball.",
    +        LOG.error("Failed to write file {} to checkpoint tarball archive.",

Duplicate mock setup: Line 247 sets up a mock for 'writeDbDataToStream' with 5 parameters, but the same method is already mocked on line 236 with the same signature. This is redundant and one of these lines should be removed.