Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[HUDI-8590] fix: wrong file path for consistent-bucket-commit-marker-file #12344

Merged

Conversation

TheR1sing3un
Copy link
Member

issue: #12338

Change Logs

  1. wrong file path for consistent-bucket-commit-marker-file

Describe context and summary for this change. Highlight if any code was copied.

Impact

Describe any public API or user-facing feature change or any performance impact.
none

Risk level (write none, low medium or high below)

low
If medium or high, explain what verification was done to mitigate the risks.

Documentation Update

Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".

  • The config description must be updated if new configs are added or the default value of the configs are changed
  • Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
    ticket number here and follow the instruction to make
    changes to the website.

Contributor's checklist

  • Read through contributor's guide
  • Change Logs and Impact were stated clearly
  • Adequate tests were added if applicable
  • CI passed

@TheR1sing3un
Copy link
Member Author

@beyond1920 @danny0405 HI, I find a bug about incorrect path for the Consistent-Bucket-Commit-Marker-File. Please have a look. Thanks!

@github-actions github-actions bot added size:S PR with lines of changes in (10, 100] size:M PR with lines of changes in (100, 300] and removed size:S PR with lines of changes in (10, 100] labels Nov 27, 2024
@TheR1sing3un TheR1sing3un force-pushed the fix_wrong_consistent_commit_marker_file branch 2 times, most recently from efde0e3 to fb1ea61 Compare November 27, 2024 10:51
@yuzhaojing
Copy link
Contributor

Hi, could u please describe what problems will be caused if this fix is not made? Will there be any issues regarding correctness?

@TheR1sing3un
Copy link
Member Author

Hi, could u please describe what problems will be caused if this fix is not made? Will there be any issues regarding correctness?

Commit-marker-file will always be created in wrong file path.You never know that a certain version of hash metadata has been committed because the scan path is different. Therefore, it is necessary to go to the timeline each time and then resubmit this file, which will put a lot of pressure on the timeline. In the current code logic, if there is only one hash metadata that is not committed and it is still in pending state, the current code logic will return empty instead of the recently committed hash metadata. So I fix these two problems.

@TheR1sing3un TheR1sing3un force-pushed the fix_wrong_consistent_commit_marker_file branch from fb1ea61 to 13c80d9 Compare December 9, 2024 08:47
@github-actions github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Dec 9, 2024
metaFiles.stream().filter(hashingMetaCommitFilePredicate)
.forEach(commitFile -> {
String instantTime = HoodieConsistentHashingMetadata.getTimestampFromFile(commitFile.getPath().getName());
if (!versionedHashMetadataFiles.containsKey(instantTime)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just fix the metadata path and keep the other logic as it is so that it is more easier to review, if you wanna a refactoring to the code, let's fire another PR to address it.

Copy link
Member Author

@TheR1sing3un TheR1sing3un Dec 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we just fix the metadata path and keep the other logic as it is so that it is more easier to review, if you wanna a refactoring to the code, let's fire another PR to address it.

I originally intended to do this like u said, but found that fixing the metadata path code exposed the problem of the original code, assuming that there is now an inflight clustering, when loading metadata, according to the original code, it will return empty, which is not expected. Therefore, I fixed these two problems at the same time, and it was difficult to fix them separately, because the original unit-test would be failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to fix the issues one by one

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to fix the issues one by one

Hi! I have updated this pr. This pr will only fix the problem with the wrong file path, and I will initiate a new pr after this pr is merged with the changes in refactoring metadata management.

Copy link
Member Author

@TheR1sing3un TheR1sing3un Dec 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's try to fix the issues one by one

As you said, after we merging the wrong path problem fixing pr, I raise a new pr to refactor consistent-hash-metadata management for better code readability and maintainability, lets keep reviewing at: #12512

@TheR1sing3un TheR1sing3un force-pushed the fix_wrong_consistent_commit_marker_file branch from 13c80d9 to 5ad0dc2 Compare December 16, 2024 07:46
@TheR1sing3un
Copy link
Member Author

@danny0405 Hi! I have added some tests to verify the rationality of these changes, please review it again.

1. wrong file path for consistent-bucket-commit-marker-file
2. return the max committed metadata file when no hash metadata file has been successfully fixed

Signed-off-by: TheR1sing3un <[email protected]>
@TheR1sing3un TheR1sing3un force-pushed the fix_wrong_consistent_commit_marker_file branch from 5ad0dc2 to 97df8a4 Compare December 16, 2024 10:57
@github-actions github-actions bot added size:S PR with lines of changes in (10, 100] and removed size:L PR with lines of changes in (300, 1000] labels Dec 16, 2024
}).findFirst());
ValidationUtils.checkState(maxCommittedMetadataFileOpt.isPresent(),
() -> "Failed to find max committed metadata file but commit marker file exist with instant: " + maxCommittedMetadataFileOpt);
fixed.add(maxCommittedMetadataFileOpt.get());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return the maxCommittedMetadataFileOpt directly in line 175 instead of adding it to the fixed list.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we return the maxCommittedMetadataFileOpt directly in line 175 instead of adding it to the fixed list.

Nice suggestion! I have updated it.

…f adding it to the fixed list.

1. return the `maxCommittedMetadataFileOpt` directly instead of adding it to the fixed list.

Signed-off-by: TheR1sing3un <[email protected]>
@hudi-bot
Copy link

CI report:

Bot commands @hudi-bot supports the following commands:
  • @hudi-bot run azure re-run the last Azure build

@danny0405 danny0405 merged commit a1411a6 into apache:master Dec 18, 2024
42 of 43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:S PR with lines of changes in (10, 100]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants