-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[HUDI-8590] fix: wrong file path for consistent-bucket-commit-marker-file #12344
[HUDI-8590] fix: wrong file path for consistent-bucket-commit-marker-file #12344
Conversation
@beyond1920 @danny0405 HI, I find a bug about incorrect path for the Consistent-Bucket-Commit-Marker-File. Please have a look. Thanks! |
efde0e3
to
fb1ea61
Compare
...udi-client-common/src/main/java/org/apache/hudi/index/bucket/ConsistentBucketIndexUtils.java
Outdated
Show resolved
Hide resolved
Hi, could u please describe what problems will be caused if this fix is not made? Will there be any issues regarding correctness? |
Commit-marker-file will always be created in wrong file path.You never know that a certain version of hash metadata has been committed because the scan path is different. Therefore, it is necessary to go to the timeline each time and then resubmit this file, which will put a lot of pressure on the timeline. In the current code logic, if there is only one hash metadata that is not committed and it is still in pending state, the current code logic will return empty instead of the recently committed hash metadata. So I fix these two problems. |
fb1ea61
to
13c80d9
Compare
...udi-client-common/src/main/java/org/apache/hudi/index/bucket/ConsistentBucketIndexUtils.java
Outdated
Show resolved
Hide resolved
metaFiles.stream().filter(hashingMetaCommitFilePredicate) | ||
.forEach(commitFile -> { | ||
String instantTime = HoodieConsistentHashingMetadata.getTimestampFromFile(commitFile.getPath().getName()); | ||
if (!versionedHashMetadataFiles.containsKey(instantTime)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just fix the metadata path and keep the other logic as it is so that it is more easier to review, if you wanna a refactoring to the code, let's fire another PR to address it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just fix the metadata path and keep the other logic as it is so that it is more easier to review, if you wanna a refactoring to the code, let's fire another PR to address it.
I originally intended to do this like u said, but found that fixing the metadata path code exposed the problem of the original code, assuming that there is now an inflight clustering, when loading metadata, according to the original code, it will return empty, which is not expected. Therefore, I fixed these two problems at the same time, and it was difficult to fix them separately, because the original unit-test would be failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to fix the issues one by one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to fix the issues one by one
Hi! I have updated this pr. This pr will only fix the problem with the wrong file path, and I will initiate a new pr after this pr is merged with the changes in refactoring metadata management.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's try to fix the issues one by one
As you said, after we merging the wrong path problem fixing pr, I raise a new pr to refactor consistent-hash-metadata management for better code readability and maintainability, lets keep reviewing at: #12512
13c80d9
to
5ad0dc2
Compare
@danny0405 Hi! I have added some tests to verify the rationality of these changes, please review it again. |
...nt/hudi-client-common/src/main/java/org/apache/hudi/client/BaseHoodieTableServiceClient.java
Outdated
Show resolved
Hide resolved
1. wrong file path for consistent-bucket-commit-marker-file 2. return the max committed metadata file when no hash metadata file has been successfully fixed Signed-off-by: TheR1sing3un <[email protected]>
5ad0dc2
to
97df8a4
Compare
}).findFirst()); | ||
ValidationUtils.checkState(maxCommittedMetadataFileOpt.isPresent(), | ||
() -> "Failed to find max committed metadata file but commit marker file exist with instant: " + maxCommittedMetadataFileOpt); | ||
fixed.add(maxCommittedMetadataFileOpt.get()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we return the maxCommittedMetadataFileOpt
directly in line 175 instead of adding it to the fixed list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we return the
maxCommittedMetadataFileOpt
directly in line 175 instead of adding it to the fixed list.
Nice suggestion! I have updated it.
…f adding it to the fixed list. 1. return the `maxCommittedMetadataFileOpt` directly instead of adding it to the fixed list. Signed-off-by: TheR1sing3un <[email protected]>
issue: #12338
Change Logs
Describe context and summary for this change. Highlight if any code was copied.
Impact
Describe any public API or user-facing feature change or any performance impact.
none
Risk level (write none, low medium or high below)
low
If medium or high, explain what verification was done to mitigate the risks.
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
ticket number here and follow the instruction to make
changes to the website.
Contributor's checklist