[HUDI-8776] Add schema evolution configs to SparkBroadcastManager #12510

linliu-code · 2024-12-17T23:00:05Z

Change Logs

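Add schema evolution configs (such as the valid commits list) to the configs that SparkBroadcastManager builds when an internal schema is present.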

Impact

Improves schema evolution support in code paths that rely on SparkBroadcastManager.

Risk level (write none, low, medium or high below)

Low.

Contributor's checklist

Read through contributor's guide
Change Logs and Impact were stated clearly
Adequate tests were added if applicable
CI passed

hudi-bot · 2024-12-18T23:18:09Z

CI report:

1be400a Azure: SUCCESS

danny0405 · 2024-12-19T02:35:07Z

hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/SparkBroadcastManager.java

+    Map<String, String> configs = new HashMap<>();
+    if (internalSchemaOpt.isPresent()) {
+      List<String> instantFiles = timeline.getInstants().stream().map(fileNameGenerator::getFileName).collect(Collectors.toList());
+      configs.put(SparkInternalSchemaConverter.HOODIE_VALID_COMMITS_LIST, String.join(",", instantFiles));


Not sure why the instant file name is needed. Should we just use the instant timestamp string instead?

linliu-code · 2024-12-19

@jonvex should be able to answer, since Jon added this for the file group (fg) reader.

jonvex · 2024-12-19

hudi/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala

Line 678 in 9da3221

protected def embedInternalSchema(conf: Configuration, internalSchemaOpt: Option[InternalSchema]): Configuration = {

I'm sure I just copied this method.
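
For readers following the thread, the pattern in the diff above can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in SparkBroadcastManager: the class name, the method name, the placeholder config key, and the sample file names are all hypothetical, and the internal schema is modeled as an Optional<String> so the sketch does not depend on Hudi.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Standalone sketch; it does not depend on Hudi. All names here are hypothetical.
class ValidCommitsConfigSketch {

  // Placeholder key standing in for SparkInternalSchemaConverter.HOODIE_VALID_COMMITS_LIST.
  static final String VALID_COMMITS_KEY = "example.valid.commits.list";

  // Returns an empty map when no internal schema is present; otherwise
  // stores the instant file names as one comma-separated value.
  static Map<String, String> buildConfigs(Optional<String> internalSchemaOpt,
                                          List<String> instantFileNames) {
    Map<String, String> configs = new HashMap<>();
    if (internalSchemaOpt.isPresent()) {
      configs.put(VALID_COMMITS_KEY, String.join(",", instantFileNames));
    }
    return configs;
  }

  public static void main(String[] args) {
    // Illustrative file names only; real names would come from the timeline.
    List<String> files = Arrays.asList("001.commit", "002.deltacommit");
    System.out.println(buildConfigs(Optional.of("schema"), files));
    System.out.println(buildConfigs(Optional.empty(), files));
  }
}
```

Whether the list should carry instant file names or bare timestamps is the open question raised above. The sketch only shows that the value is a single comma-joined string and that it is set only when an internal schema is present.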

github-actions bot added the size:M PR with lines of changes in (100, 300] label Dec 17, 2024

linliu-code force-pushed the ENG-18710-1 branch from afbf531 to 944a155 Compare December 18, 2024 00:38

github-actions bot added size:L PR with lines of changes in (300, 1000] and removed size:M PR with lines of changes in (100, 300] labels Dec 18, 2024

linliu-code added 2 commits December 18, 2024 11:26

Add schema evolution configs

f52625d

Rebase and refactor a bit

e14e51a

linliu-code force-pushed the ENG-18710-1 branch from 944a155 to e14e51a Compare December 18, 2024 20:01

github-actions bot added size:M PR with lines of changes in (100, 300] and removed size:L PR with lines of changes in (300, 1000] labels Dec 18, 2024

linliu-code added 2 commits December 18, 2024 12:36

Fix imports

63c5c2d

Fix formatting

1be400a

danny0405 reviewed Dec 19, 2024

jonvex approved these changes Dec 19, 2024

jonvex merged commit 143dc52 into apache:master Dec 19, 2024
43 checks passed

yihua mentioned this pull request Jan 7, 2025

[HUDI-8634] Support schema on read in file group reader-based compaction and clustering in Spark #12586

Merged

4 tasks
