Add fluent API for TableSink and Java/Spark optimizer mappings #752
Merged
zkaoudi merged 1 commit into apache:main on May 4, 2026
Conversation
- Add writeTable() method in DataQuanta.scala for the underlying fluent API
- Add writeTable() overloads in DataQuantaBuilder.scala for the Java-facing fluent API (with and without optional jobName)
- Add TableSinkMapping in wayang-java to register JavaTableSink with the optimizer; previously the JavaTableSink operator existed but had no mapping, making it unreachable through the optimizer
- Add TableSinkMapping in wayang-spark to register SparkTableSink with the optimizer for the same reason
- Register both mappings in their respective Mappings.java files

This enables fluent pipelines like planBuilder.readTable(source).writeTable(...) to be routed by the optimizer to the appropriate platform-specific sink.
zkaoudi approved these changes on May 4, 2026
Hey again Wayang community!
This PR adds a fluent API method (`writeTable`) for the `TableSink` operator in both `DataQuanta.scala` and `DataQuantaBuilder.scala`, so users can express table-write pipelines with the same syntax that's already available for other sinks (`writeTextFile`, `writeParquet`, `writeKafkaTopic`, etc.). It also fills a gap: the `JavaTableSink` and `SparkTableSink` operators currently have no optimizer mappings, which means the planner never selects them and the operators are effectively unreachable through normal pipelines. Two small mapping classes are added to fix that.

What this PR adds
Fluent API methods (`wayang-api-scala-java`)

- `DataQuanta.writeTable(tableName, mode, columnNames, props)` — creates a logical `TableSink`, wires it into the plan, and triggers execution. Follows the same pattern as the existing `writeKafkaTopic` and `writeParquet` methods.
- `DataQuantaBuilder.writeTable(...)` with two overloads (with and without optional `jobName`) — the Java-facing fluent layer that delegates to the Scala `DataQuanta` method. Same overload pattern used by the existing `writeParquet` methods.
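To make the two Java-facing overloads concrete, here is a minimal, hypothetical call-site sketch. The parameter names (`tableName`, `mode`, `columnNames`, `props`) come from this PR's description; the concrete parameter types, the position of the optional `jobName`, the `"create"` mode value, and the property keys are assumptions made only for illustration.

```java
import java.util.Properties;

import org.apache.wayang.api.DataQuantaBuilder;
import org.apache.wayang.basic.data.Record;

public class WriteTableOverloadsSketch {

    // Hypothetical call shapes for the two new builder overloads. The builder
    // argument stands for any Record-producing step of a Java-facing pipeline;
    // the parameter types and the jobName position are assumptions, not the
    // signatures from the PR diff.
    static void writeWithAndWithoutJobName(DataQuantaBuilder<?, Record> builder,
                                           Properties props) {
        // Overload without an explicit job name:
        builder.writeTable("target_table", "create",
                new String[]{"id", "name"}, props);

        // Overload with an explicit job name (alternative to the call above):
        builder.writeTable("target_table", "create",
                new String[]{"id", "name"}, props, "copy-customers-job");
    }
}
```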
Optimizer mappings

- `TableSinkMapping` in `wayang-java` (registered in `Mappings.java`) — transforms the logical `TableSink` into `JavaTableSink` for the Java platform.
- `TableSinkMapping` in `wayang-spark` (registered in `Mappings.java`) — transforms the logical `TableSink` into `SparkTableSink` for the Spark platform.

Without these mappings the optimizer cannot route a logical `TableSink` to either execution operator. Pipelines that rely on the optimizer (including any fluent pipeline) fail with "no execution plan found" because no platform claims the sink. Both mapping classes follow the exact pattern of the existing `ParquetSinkMapping` in each platform.
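For readers unfamiliar with Wayang's mapping layer, here is a rough sketch of what the Java-side `TableSinkMapping` looks like when it follows the `ParquetSinkMapping` pattern described above. The `Mapping`/`PlanTransformation`/`SubplanPattern` machinery is Wayang's standard mapping API; the package locations and constructor arguments of `TableSink` and `JavaTableSink` below are assumptions, not copied from the PR diff.

```java
package org.apache.wayang.java.mapping;

import java.util.Collection;
import java.util.Collections;

import org.apache.wayang.basic.operators.TableSink;   // assumed location of the logical operator
import org.apache.wayang.core.mapping.Mapping;
import org.apache.wayang.core.mapping.OperatorPattern;
import org.apache.wayang.core.mapping.PlanTransformation;
import org.apache.wayang.core.mapping.ReplacementSubplanFactory;
import org.apache.wayang.core.mapping.SubplanPattern;
import org.apache.wayang.java.operators.JavaTableSink; // assumed location of the execution operator
import org.apache.wayang.java.platform.JavaPlatform;

/**
 * Sketch of a mapping that lets the optimizer replace a logical TableSink
 * with the Java execution operator, mirroring the existing sink mappings.
 */
public class TableSinkMapping implements Mapping {

    @Override
    public Collection<PlanTransformation> getTransformations() {
        return Collections.singleton(new PlanTransformation(
                this.createSubplanPattern(),
                this.createReplacementSubplanFactory(),
                JavaPlatform.getInstance()
        ));
    }

    private SubplanPattern createSubplanPattern() {
        // Match a single logical TableSink; the prototype's constructor
        // arguments below are placeholders, not the real signature.
        final OperatorPattern operatorPattern = new OperatorPattern(
                "sink",
                new TableSink((String) null, (String[]) null),
                false
        );
        return SubplanPattern.createSingleton(operatorPattern);
    }

    private ReplacementSubplanFactory createReplacementSubplanFactory() {
        // Replace the matched logical sink with JavaTableSink, preserving its epoch.
        return new ReplacementSubplanFactory.OfSingleOperators<TableSink>(
                (matchedSink, epoch) -> new JavaTableSink(matchedSink).at(epoch)
        );
    }
}
```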
Testing

I verified end-to-end with two pipelines against a real PostgreSQL instance:
- `planBuilder.readTable(source).writeTable(...)` — copied a six-row source table into a new target table.
- `planBuilder.readTable(source).filter(...).writeTable(...)` — applied a Java filter and wrote the single matching row to a new target table.

Both pipelines produce execution plans where the optimizer correctly selects `JavaTableSink` after the new mapping is registered, confirming that the fluent method, the underlying `TableSink` plan wiring, and the new mappings all work together. Without the mappings, the same plans fail at optimization with "Could not find a single execution plan."
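As an illustrative companion to the second test pipeline (read, filter, write), here is how it might look when driven through the Java API. Only the `readTable(...).filter(...).writeTable(...)` chaining reflects this PR; the availability of `readTable` on `JavaPlanBuilder`, the `PostgresTableSource` constructor, the plugin setup, the JDBC property keys, the `"create"` mode value, and all table and column names are assumptions made for the example.

```java
import java.util.Properties;

import org.apache.wayang.api.JavaPlanBuilder;
import org.apache.wayang.core.api.WayangContext;
import org.apache.wayang.java.Java;
import org.apache.wayang.postgres.Postgres;
import org.apache.wayang.postgres.operators.PostgresTableSource;

public class FilterAndWriteTableSketch {

    public static void main(String[] args) {
        // Register the Java platform for execution and Postgres for the table source.
        WayangContext context = new WayangContext()
                .withPlugin(Java.basicPlugin())
                .withPlugin(Postgres.plugin());

        JavaPlanBuilder planBuilder = new JavaPlanBuilder(context)
                .withJobName("filter-and-write-table");

        // Connection properties for the sink; the key names are assumptions.
        Properties props = new Properties();
        props.setProperty("jdbc.url", "jdbc:postgresql://localhost:5432/exampledb");
        props.setProperty("jdbc.user", "wayang");
        props.setProperty("jdbc.password", "secret");

        planBuilder
                // Read the source table (constructor shape of the source is assumed).
                .readTable(new PostgresTableSource("customers", "id", "name"))
                // Keep the single matching row, as in the PR's second test.
                .filter(record -> "Alice".equals(record.getField(1)))
                // The new fluent sink added by this PR; parameter types are assumed.
                .writeTable("customers_filtered", "create",
                        new String[]{"id", "name"}, props);
    }
}
```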
Notes

- This PR targets `main`. Once the in-database `JdbcTableSinkOperator` PR is merged, the same fluent `writeTable(...)` call will automatically benefit from the in-database execution path when the source and sink share a database platform — no changes to the fluent API will be required, since the optimizer simply gains additional `TableSink` mappings to choose from.
- The new mappings follow the existing `ParquetSinkMapping` pattern. They were absent from the original `JavaTableSink`/`SparkTableSink` contributions.

Files
- `wayang-api/wayang-api-scala-java/src/main/scala/org/apache/wayang/api/DataQuanta.scala` — added `writeTable`
- `wayang-api/wayang-api-scala-java/src/main/scala/org/apache/wayang/api/DataQuantaBuilder.scala` — added `writeTable` (two overloads)
- `wayang-platforms/wayang-java/src/main/java/org/apache/wayang/java/mapping/TableSinkMapping.java` — new
- `wayang-platforms/wayang-java/src/main/java/org/apache/wayang/java/mapping/Mappings.java` — registration
- `wayang-platforms/wayang-spark/src/main/java/org/apache/wayang/spark/mapping/TableSinkMapping.java` — new
- `wayang-platforms/wayang-spark/src/main/java/org/apache/wayang/spark/mapping/Mappings.java` — registration