SC-211135: Implement analyzer rule for V2 streaming reads #5475
base: master
Conversation
This implements Option A from the design doc: using an analyzer rule to replace V1 (DeltaTableV2) with V2 (SparkTable) for streaming reads only.

Key changes:
- Add UseKernelForStreamingRule analyzer rule in the spark-unified module
- The rule pattern matches on StreamingRelationV2 to isolate streaming reads
- Add DELTA_KERNEL_STREAMING_ENABLED config flag (default: false)
- Register the rule in DeltaSparkSessionExtension

Behavior:
- Streaming reads (readStream) → V2 (Kernel-based, MicroBatchStream)
- Streaming writes (writeStream) → V1 (DeltaLog-based)
- Batch reads/writes → V1 (DeltaLog-based)
- MERGE/UPDATE/DELETE → V1 (DeltaLog-based)

This approach:
- Requires zero user code changes
- Works with existing V1 and V2 implementations unchanged
- Enables gradual rollout via a configuration flag
- Provides graceful fallback on errors
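For readers skimming the diff, here is a minimal sketch of what such a resolution rule can look like. The class and table names (UseKernelForStreamingRule, DeltaTableV2, SparkTable) come from this PR; the pattern-match details, the full config key, and the `toV2SparkTable` helper are illustrative assumptions, not the PR's actual code:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.catalyst.streaming.StreamingRelationV2
import org.apache.spark.sql.connector.catalog.Table
import org.apache.spark.sql.delta.catalog.DeltaTableV2

class UseKernelForStreamingRule(session: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = {
    // Leave the plan untouched unless the rollout flag is on (assumed key).
    val enabled = session.conf
      .getOption("spark.databricks.delta.kernel.streaming.enabled")
      .exists(_.toBoolean)
    if (!enabled) return plan

    plan.transformDown {
      // Matching StreamingRelationV2 isolates streaming reads; batch plans and
      // DML never contain this node, so they stay on the V1 (DeltaLog) path.
      case rel @ StreamingRelationV2(_, _, table: DeltaTableV2, _, _, _, _, _)
          if table.catalogTable.isDefined =>
        rel.copy(table = toV2SparkTable(table))
    }
  }

  // Placeholder: the real PR constructs a kernel-spark SparkTable from the
  // catalog identifier; that conversion is omitted here.
  private def toV2SparkTable(v1: DeltaTableV2): Table = ???
}
```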
```scala
// Register the analyzer rule for kernel-based streaming.
// This rule replaces V1 (DeltaTableV2) with V2 (SparkTable) for streaming queries.
extensions.injectResolutionRule { session =>
  new UseKernelForStreamingRule(session)
}
```
The kernel-spark module will be named sparkV2, so let's rename the rule to UseV2ForStreaming.
done
```scala
///////////////////

val DELTA_KERNEL_STREAMING_ENABLED =
  buildConf("kernel.streaming.enabled")
```
How about sparkV2.streaming.enabled?
Renamed to v2.streaming.enabled.
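For context, enabling the renamed flag from a user session would look roughly like this. The `spark.databricks.delta.` prefix is an assumption based on how Delta's buildConf typically namespaces keys, and the table name is made up:

```scala
// Assumed full key after the rename (prefix is an assumption, not confirmed
// by this PR).
spark.conf.set("spark.databricks.delta.v2.streaming.enabled", "true")

// readStream resolves through the new rule to the V2 (Kernel) path...
val stream = spark.readStream.table("main.default.events") // hypothetical table

// ...while batch reads of the same table keep the V1 (DeltaLog) path.
val batch = spark.read.table("main.default.events")
```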
```scala
test("catalog table logical plan uses V2 when enabled") {
  withTable("test_table") {
    sql("CREATE TABLE test_table (id INT, value STRING) USING delta")
```
nit: let's have a beforeAll and afterAll action for the test table creation and drop
I think withTable is cleaner?
I mean we don't need to create the table and insert into it in every test case. Anyway, this is a nit.
Can we add this table property? `delta.feature.catalogOwned-preview=supported`
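Something like the following could address both points while keeping withTable; this is a sketch, not the PR's actual test:

```scala
test("catalog table logical plan uses V2 when enabled") {
  withTable("test_table") {
    // Adds the requested catalogOwned feature property at table creation.
    sql(
      """CREATE TABLE test_table (id INT, value STRING) USING delta
        |TBLPROPERTIES ('delta.feature.catalogOwned-preview' = 'supported')
        |""".stripMargin)
    // ... assertions that the streaming plan resolves to the V2 SparkTable ...
  }
}
```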
```scala
  }
}

test("path-based table uses V1 even when config enabled") {
```
We need to test the V2 streaming code path with path-based tables as well, so we should probably create a new config for testing only.
I am not sure how this would work: SparkTable requires an Identifier as a constructor parameter. Unless we want to use a randomly generated identifier?
delta.`tablePath`
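i.e., the path itself can serve as the table name under the delta namespace. A rough sketch of building such an Identifier (the exact construction the PR would use is an assumption, and the path is a placeholder):

```scala
import org.apache.spark.sql.connector.catalog.Identifier

val tablePath = "/tmp/delta/events" // hypothetical path
// Mirrors the SQL form delta.`/tmp/delta/events`: namespace "delta",
// name = the raw path (backticks are only SQL quoting, not part of the name).
val ident = Identifier.of(Array("delta"), tablePath)
```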
```scala
 * Check if the DataSource is a catalog-managed Delta table.
 * We only convert catalog-managed tables to V2, not path-based tables.
 */
private def isCatalogManagedDeltaTable(dataSource: DataSource): Boolean = {
```
#5477 will introduce utils to check whether a table is a ccv2 table or a UC ccv2 table. Maybe try to use/patch it for testing?
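Until those utils land, one way to write the check using only Spark's DataSource fields might look like the following; treating a defined catalogTable with provider "delta" as "catalog-managed" is an assumption, not the PR's final logic:

```scala
import org.apache.spark.sql.execution.datasources.DataSource

// Sketch only: a table is considered catalog-managed here if it was resolved
// through the catalog (catalogTable is defined) and its provider is Delta.
private def isCatalogManagedDeltaTable(dataSource: DataSource): Boolean = {
  dataSource.catalogTable.exists { t =>
    t.provider.exists(_.equalsIgnoreCase("delta"))
  }
}
```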