[Server-Side Planning] Add metadata abstraction and factory pattern #5671

murali-db · 2025-12-10T12:24:24Z

🥞 Stacked PR

Use this link to review all changes.

Stack:

- Integrate DeltaCatalog with ServerSidePlannedTable and add tests [Files changed] ✅ Merged
- Add metadata abstraction and factory pattern [Files changed] ⬅️ This PR
- Add filter pushdown infrastructure [Files changed]

Summary

Introduces metadata abstraction pattern for ServerSidePlannedTable:

- ServerSidePlanningMetadata trait for catalog-specific configuration
- Factory pattern changed from buildForCatalog() to buildClient(metadata)
- Default implementation for non-UC catalogs

Key Changes

New file:

- ServerSidePlanningMetadata.scala - Trait + DefaultMetadata + factory method

Modified files:

- ServerSidePlanningClient.scala - Update factory interface to accept metadata
- ServerSidePlannedTable.scala - Use metadata factory, pass metadata to client builder
- ServerSidePlannedTableSuite.scala - Test fromTable() method

Architecture

Table Properties (from loadTable)
        ↓
ServerSidePlanningMetadata.fromTable()
        ↓
   DefaultMetadata (for now)
   [Future: catalog-specific implementations]
        ↓
ServerSidePlanningClientFactory.buildClient(metadata)
        ↓
ServerSidePlanningClient implementation
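
The flow above can be sketched roughly as follows. This is a minimal, Spark-free illustration rather than the code in this PR: the field names come from the description and `authToken: Option[String]` from the reviewed diff, but the other field types, the `fromTable` parameters, `EchoFactory`, and `describe` are assumptions made to keep the snippet self-contained. The real `buildClient` also takes a SparkSession, per the reviewed doc comment.

```scala
object PlanningFlowSketch {

  // Field names follow the PR description. Types other than
  // authToken: Option[String] are assumptions.
  trait ServerSidePlanningMetadata {
    def planningEndpointUri: String
    def authToken: Option[String]
    def catalogName: String
    def tableProperties: Map[String, String]
  }

  // Default used for every catalog in this PR: empty endpoint, no token.
  final case class DefaultMetadata(
      catalogName: String,
      tableProperties: Map[String, String] = Map.empty)
    extends ServerSidePlanningMetadata {
    override val planningEndpointUri: String = ""
    override val authToken: Option[String] = None
  }

  object ServerSidePlanningMetadata {
    // Hypothetical signature: the real fromTable() works on the loaded table.
    def fromTable(
        catalogName: String,
        props: Map[String, String]): ServerSidePlanningMetadata =
      DefaultMetadata(catalogName, props)
  }

  // Hypothetical client surface, only here so the factory has something to return.
  trait ServerSidePlanningClient {
    def describe: String
  }

  trait ServerSidePlanningClientFactory {
    // Takes the whole metadata object instead of only a catalog name.
    def buildClient(metadata: ServerSidePlanningMetadata): ServerSidePlanningClient
  }

  // Hypothetical factory that just reports what it was configured with.
  object EchoFactory extends ServerSidePlanningClientFactory {
    override def buildClient(
        metadata: ServerSidePlanningMetadata): ServerSidePlanningClient =
      new ServerSidePlanningClient {
        override val describe: String =
          s"catalog=${metadata.catalogName} " +
            s"endpoint='${metadata.planningEndpointUri}' " +
            s"authenticated=${metadata.authToken.isDefined}"
      }
  }
}
```

Passing one metadata value to the factory means a later catalog-specific implementation can add an endpoint or token without another change to the factory signature.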

Design Principles

1. Metadata as Interface

ServerSidePlanningMetadata trait captures what's needed to create a planning client:

- planningEndpointUri - REST endpoint (empty for default)
- authToken - Authentication token (None for default)
- catalogName - For configuration lookups
- tableProperties - Additional table properties

2. Factory Method for Metadata

ServerSidePlanningMetadata.fromTable() extracts metadata from loaded table:

- Currently returns DefaultMetadata for all catalogs
- Future PRs will add catalog-specific implementations

3. Package-Private Implementation

Trait and classes are private[serverSidePlanning] - implementation details hidden from outside.

Testing

6 tests covering:

- Full query execution through DeltaCatalog
- Decision logic for server-side planning
- fromTable() returns metadata with defaults (updated)
- ServerSidePlannedTable is read-only
- Normal path unchanged when feature disabled

Behavior

No functional changes:

- Same tables use server-side planning
- Same query execution flow
- Zero impact on existing Delta behavior

Commit: 6d76fce

murali-db · 2025-12-10T12:28:45Z

Updated: Rebased on fixed Row 2. Fixed typo in databaseName parameter.

murali-db · 2025-12-10T12:36:24Z

Fixed regressions from Row 1 review:

- Restored private[serverSidePlanning] modifiers on trait and object
- Restored private[serverSidePlanning] on setFactory() and clearFactory()
- Fixed comment: @param database → @param databaseName

murali-db · 2025-12-10T12:41:50Z

Fixed additional regressions from Row 1:

- Updated copyright year 2021 → 2025 in 3 new files (ServerSidePlanningMetadata.scala, UnityCatalogMetadata.scala, TestMetadata.scala)
- Fixed parameter name: schema → tableSchema in ServerSidePlannedFilePartitionReaderFactory

Note: buildClient(metadata) is NOT a regression - it's the core feature of Row 3 (metadata abstraction pattern). Row 1 has buildForCatalog(catalogName), Row 3 changes it to buildClient(metadata).

…able and add tests (#5622)

## 🥞 Stacked PR

Use this [link](https://github.com/delta-io/delta/pull/5622/files) to review all changes.

**Stack:**
- **[Integrate DeltaCatalog with ServerSidePlannedTable and add tests](#5622)** [[Files changed](https://github.com/delta-io/delta/pull/5622/files)] ⬅️ _This PR_
- [Add metadata abstraction and factory pattern](#5671) [[Files changed](https://github.com/delta-io/delta/pull/5671/files/c94cf91814aa5bf3080f5191f75535f0cc5d2d51..f06cc6576c63ab1075bace94fb88d1a4a64ef308)]

---

## Summary

Integrates server-side planning into DeltaCatalog.loadTable():
- Detect Unity Catalog tables without credentials
- Return ServerSidePlannedTable instead of normal DeltaTableV2
- Add ENABLE_SERVER_SIDE_PLANNING config flag for testing
- Move ServerSidePlannedTable to serverSidePlanning package

## Decision Logic

ServerSidePlanning is used when:
1. `ENABLE_SERVER_SIDE_PLANNING` config is true (force flag for testing)
2. OR: Unity Catalog table without credentials (actual use case)

## Key Changes

- `DeltaCatalog.loadTable()`: Add ServerSidePlannedTable.tryCreate() call
- `ServerSidePlannedTable.shouldUseServerSidePlanning()`: Decision logic
- Package refactor: Move from catalog/ to serverSidePlanning/
- Add `DeltaSQLConf.ENABLE_SERVER_SIDE_PLANNING` config

## Testing

4 tests covering:
- Full query execution through DeltaCatalog
- Normal path unchanged when disabled
- Decision logic for all scenarios
- Read-only enforcement (INSERT fails)

```
spark/testOnly org.apache.spark.sql.delta.serverSidePlanning.ServerSidePlannedTableSuite
```

Note: Tests verified in fork. Upstream master has kernel-api compilation issues (pre-existing, not related to these changes).

Co-authored-by: Claude <[email protected]>
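
The decision logic described in that commit message reduces to a two-branch boolean. A minimal sketch, assuming the three inputs are already available as plain booleans (the method name appears in the commit message, but these parameters are hypothetical and the real method may take different arguments):

```scala
object DecisionLogicSketch {
  // forceEnabled stands in for the ENABLE_SERVER_SIDE_PLANNING config value.
  // The parameter list is an assumption made for illustration.
  def shouldUseServerSidePlanning(
      isUnityCatalogTable: Boolean,
      hasCredentials: Boolean,
      forceEnabled: Boolean): Boolean =
    forceEnabled || (isUnityCatalogTable && !hasCredentials)
}
```

With the flag off, only a Unity Catalog table that arrives without credentials takes the server-side path; every other table keeps the normal DeltaTableV2 path.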

murali-db · 2025-12-10T17:17:47Z

spark/src/test/scala/org/apache/spark/sql/delta/serverSidePlanning/TestMetadata.scala

+ * Test implementation of ServerSidePlanningMetadata with injectable values.
+ * Used in unit tests to mock UC metadata without a real UC instance.
+ */
+case class TestMetadata(


Removed UCMetadata from this PR as that contained references to iceberg-rest endpoint. We'll introduce that with the later catalog-specific PRs.

why is this not a simple case class? and why does it need tests of its own when it's such a simple class?

and what is TestMetadata even used for in this PR? if it's not used, but is going to be used later, please introduce it later.

tdas · 2025-12-10T19:55:23Z

.../src/main/scala/org/apache/spark/sql/delta/serverSidePlanning/ServerSidePlanningClient.scala

   * @param spark The SparkSession
-   * @param catalogName The name of the catalog (e.g., "spark_catalog", "unity")
-   * @return A ServerSidePlanningClient configured for the specified catalog
+   * @param metadata Metadata extracted from loadTable response


metadata necessary for serverside planning
(does not matter who extracted it)

tdas · 2025-12-10T20:17:42Z

...rc/main/scala/org/apache/spark/sql/delta/serverSidePlanning/ServerSidePlanningMetadata.scala

+  /**
+   * Authentication token for the planning endpoint.
+   */
+  def authToken: Option[String]


in future we may want to generalize this to allow different auth mechanisms other than token.
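
One possible shape for that generalization, purely as a sketch: a small sealed type that the current `Option[String]` token maps into. Nothing here is part of the PR. `PlanningAuth`, `NoAuth`, `BearerToken`, and the use of an `Authorization: Bearer` header are all assumptions for illustration.

```scala
object AuthSketch {
  // Hypothetical auth model; new mechanisms would be added as further cases.
  sealed trait PlanningAuth
  case object NoAuth extends PlanningAuth
  final case class BearerToken(token: String) extends PlanningAuth

  // Bridges the existing Option[String] field to the wider type.
  def fromToken(authToken: Option[String]): PlanningAuth = authToken match {
    case Some(token) => BearerToken(token)
    case None        => NoAuth
  }

  // Assumed mapping to HTTP headers for a REST planning endpoint.
  def headers(auth: PlanningAuth): Map[String, String] = auth match {
    case NoAuth             => Map.empty
    case BearerToken(token) => Map("Authorization" -> s"Bearer $token")
  }
}
```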

tdas · 2025-12-10T20:20:00Z

...c/test/scala/org/apache/spark/sql/delta/serverSidePlanning/ServerSidePlannedTableSuite.scala

    }
  }
+
+  test("TestMetadata returns injected values") {


this is a test of a test class... i don't think it makes sense, especially when the class is so simple.

tdas · 2025-12-10T20:24:37Z

...rc/main/scala/org/apache/spark/sql/delta/serverSidePlanning/ServerSidePlanningMetadata.scala

+ * This interface captures all information from the catalog's loadTable response
+ * that is needed to create and configure a ServerSidePlanningClient.
+ */
+trait ServerSidePlanningMetadata {


does this need to be public?

tdas · 2025-12-10T20:25:45Z

...c/test/scala/org/apache/spark/sql/delta/serverSidePlanning/ServerSidePlannedTableSuite.scala

+    assert(metadata.tableProperties == Map("key1" -> "value1", "key2" -> "value2"))
+  }
+
+  test("DefaultMetadata provides empty defaults for non-UC catalogs") {


DefaultMetadata has no logic inside it that is tied to UC. ServerSidePlanningMetadata.fromTable() is what you need to test, isn't it? DefaultMetadata is just an implementation detail.

Introduces metadata abstraction pattern for ServerSidePlannedTable to support different catalog types without hardcoding catalog-specific logic.

**Metadata Abstraction:**
- ServerSidePlanningMetadata trait encapsulates catalog-specific information
- DefaultMetadata implementation for all catalogs (returns empty optional values)
- TestMetadata implementation for unit testing with injectable values
- Factory pattern: ServerSidePlanningMetadata.fromTable() creates appropriate metadata

**Factory Pattern Changes:**
- ServerSidePlanningClient factory changed from buildForCatalog(catalogName) to buildClient(metadata)
- Enables catalog-specific configuration without changing client interface

**Unity Catalog Integration:**
Unity Catalog-specific metadata implementation will be added in Row 8 with Iceberg REST catalog integration. This PR focuses on the abstraction pattern only.

**Tests (7 total):**
- Full query execution through DeltaCatalog
- Decision logic for server-side planning
- DefaultMetadata returns None for all optional fields
- TestMetadata returns injected values
- Read-only enforcement
- Normal path unchanged when disabled
- All metadata factory method logic

**Note:** Spark module only - excludes iceberg module changes for Row 7/8.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

murali-db · 2025-12-10T21:52:07Z

Thanks for the review @tdas! Addressed your feedback:

- Removed TestMetadata - It wasn't actually used (TestServerSidePlanningClientFactory ignores the metadata parameter)
- Removed test "TestMetadata returns injected values" - Unnecessary test of unused class
- Changed test to test fromTable() - Now tests the public API instead of implementation details
- Made trait/classes private - ServerSidePlanningMetadata and DefaultMetadata are now private[serverSidePlanning]
- Updated comment wording - Changed "Metadata extracted from loadTable response" → "Metadata necessary for server-side planning"

Note on authToken comment: Captured for future PR - we may want to generalize auth mechanisms.

Commit: 6d76fce

tdas · 2025-12-11T19:58:34Z

...c/test/scala/org/apache/spark/sql/delta/serverSidePlanning/ServerSidePlannedTableSuite.scala

    }
  }
+
+  test("fromTable returns metadata with empty defaults for non-UC catalogs") {


what fromTable? you have to say ServerSidePlanningMetadata.fromTable?
please address in the follow up PR

## Stacked PR

Use this [link](https://github.com/delta-io/delta/pull/5672/files) to review all changes.

**Stack:**
- [Integrate DeltaCatalog with ServerSidePlannedTable and add tests](#5622) [[Files changed](https://github.com/delta-io/delta/pull/5622/files)]
- [Add metadata abstraction and factory pattern](#5671) [[Files changed](https://github.com/delta-io/delta/pull/5671/files)]
- **[Add filter pushdown infrastructure](#5672)** [[Files changed](https://github.com/delta-io/delta/pull/5672/files)] <-- _This PR_
- [Add projection pushdown infrastructure](#5685) [[Files changed](https://github.com/delta-io/delta/pull/5685/files/f5bc498a47be8fc7457269939d510a2ccaba52ba..5b3a008a94a7d35f91868e93205b75fc65569ebb)]

---

## Summary

Adds generic filter pushdown infrastructure to ServerSidePlanningClient:
- Add filter parameter to **ServerSidePlanningClient.planScan()** interface
- Use Spark's **Filter** type as catalog-agnostic representation
- Implement **SupportsPushDownFilters** in ServerSidePlannedScanBuilder
- Capture filter in **TestServerSidePlanningClient** companion object for test verification
- Tests verifying filters are passed through to planning client correctly

## Key Changes

**Modified files (4):**
- `ServerSidePlanningClient.scala` - Added filter parameter
- `ServerSidePlannedTable.scala` - Implements SupportsPushDownFilters, passes filters through to planning client
- `TestServerSidePlanningClient.scala` - Added companion object to capture filter for test verification
- `ServerSidePlannedTableSuite.scala` - Added 3 filter passthrough tests

## Design

### Conservative Filter Handling

We return all filters as "residuals" (meaning Spark will re-apply them after the catalog returns results). This is conservative but correct:
- We don't know yet what the catalog can handle
- Better to redundantly filter (slow but correct) than claim we handle filters we don't (fast but wrong)
- Future PRs will add catalog capabilities to avoid redundant filtering

### Filter Combining

When multiple filters are pushed (e.g., `WHERE id > 1 AND value < 30`), we combine them into a single And filter before sending to the planning client. This keeps the interface simple (single Option[Filter] instead of Array[Filter]).

## Testing

8 tests total (spark module only):
- Full query execution through DeltaCatalog
- Decision logic for server-side planning
- ServerSidePlanningMetadata.fromTable() returns metadata with defaults
- ServerSidePlannedTable is read-only
- Normal path unchanged when feature disabled
- **Simple EqualTo filter (WHERE id = 2)**
- **Compound And filter (WHERE id > 1 AND value < 30)**
- **No filter when no WHERE clause**

## Behavior

**Filter passthrough working:**
- Filters from SQL WHERE clauses are captured and passed to planning client
- Test infrastructure validates filter objects are passed correctly
- Planning client receives filters but doesn't apply logic yet (next PR)
- Spark re-applies filters as residuals (conservative approach ensures correctness)

**No behavior changes to existing functionality:**
- Zero impact on existing Delta behavior
- Feature only activates when config flag enabled
- This PR enables future catalog implementations to receive and process filter pushdown

---

**Commit:** f5bc498

---------

Co-authored-by: Claude <[email protected]>
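
The two design points in that commit message (return every filter as a residual, and combine pushed filters into one `And`) can be sketched as below. This is an illustration under stated assumptions, not the merged code: the `Filter` hierarchy is a local stand-in that mirrors the shape of Spark's `org.apache.spark.sql.sources.Filter` so the snippet runs without Spark on the classpath, and `ScanBuilderSketch`, `combine`, and `filterForPlanningClient` are hypothetical names.

```scala
object FilterPushdownSketch {
  // Local stand-ins mirroring Spark's sources.Filter case classes.
  sealed trait Filter
  final case class EqualTo(attribute: String, value: Any) extends Filter
  final case class GreaterThan(attribute: String, value: Any) extends Filter
  final case class LessThan(attribute: String, value: Any) extends Filter
  final case class And(left: Filter, right: Filter) extends Filter

  // Collapse the pushed filters into a single optional conjunction.
  // An empty array (no WHERE clause) yields None.
  def combine(filters: Array[Filter]): Option[Filter] =
    filters.reduceOption[Filter]((left, right) => And(left, right))

  final class ScanBuilderSketch {
    private var pushed: Array[Filter] = Array.empty

    // Same contract as SupportsPushDownFilters.pushFilters: the return value
    // is the set of filters Spark must still evaluate after the scan.
    // Returning all of them is the conservative choice described above.
    def pushFilters(filters: Array[Filter]): Array[Filter] = {
      pushed = filters
      filters
    }

    // What would be handed to the planning client as Option[Filter].
    def filterForPlanningClient: Option[Filter] = combine(pushed)
  }
}
```

For `WHERE id > 1 AND value < 30`, Spark would push two filters, the builder would return both as residuals, and the planning client would see a single `And(GreaterThan("id", 1), LessThan("value", 30))`.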

## 🥞 Stacked PR

Use this [link](https://github.com/delta-io/delta/pull/5685/files) to review all changes.

**Stack:**
- [#5621 - ServerSidePlanningClient interface and DSv2 table](#5621) [[Files changed](https://github.com/delta-io/delta/pull/5621/files)] - ✅ Merged
- [#5622 - Integrate DeltaCatalog with ServerSidePlanning](#5622) [[Files changed](https://github.com/delta-io/delta/pull/5622/files)] - ✅ Merged
- [#5671 - Add metadata abstraction and factory pattern](#5671) [[Files changed](https://github.com/delta-io/delta/pull/5671/files)] - ✅ Merged
- [#5672 - Add filter pushdown infrastructure](#5672) [[Files changed](https://github.com/delta-io/delta/pull/5672/files)] - ✅ Merged
- **[#5685 - Add projection pushdown infrastructure](#5685)** [[Files changed](https://github.com/delta-io/delta/pull/5685/files)] ⬅️ _This PR_

---

# [Server-Side Planning] Add projection pushdown infrastructure

This PR adds infrastructure for pushing down column projections (SELECT column list) to the server-side planning client.

## What This PR Does

1. **Add projection parameter to `planScan()` interface**
   - New optional `projection: Option[StructType]` parameter
   - Allows catalogs to receive required column information
2. **Implement `SupportsPushDownRequiredColumns` in `ServerSidePlannedScanBuilder`**
   - Spark calls `pruneColumns(requiredSchema)` when columns are pruned
   - Stores required schema to pass to server
3. **Pass projection through to `planScan()` call**
   - Only send projection if columns are actually pruned (not SELECT *)
   - Catalog can use this to optimize file reading (skip columns, etc.)
4. **Extend test infrastructure**
   - `TestServerSidePlanningClient` now captures both filter AND projection
   - Companion object provides `getCapturedFilter()` and `getCapturedProjection()`
   - Added 3 new tests for projection pushdown

## Files Changed

### Modified Files (4 files)

- `spark/src/main/scala/.../ServerSidePlanningClient.scala` - Add projection parameter
- `spark/src/main/scala/.../ServerSidePlannedTable.scala` - Implement SupportsPushDownRequiredColumns
- `spark/src/test/scala/.../TestServerSidePlanningClient.scala` - Add projection capturing
- `spark/src/test/scala/.../ServerSidePlannedTableSuite.scala` - Add projection tests

## Tests

Total: **11 tests** (8 existing + 3 new projection tests)

**New projection tests:**
- `"projection pushed when selecting specific columns"` - Verifies projection for `SELECT id, name`
- `"no projection pushed when selecting all columns"` - Verifies `None` for `SELECT *`
- `"projection and filter pushed together"` - Verifies both work simultaneously

**Existing tests** (from previous PRs):
- Simple EqualTo filter
- Compound And filter
- No filter when no WHERE clause
- Full query through DeltaCatalog
- Normal path unchanged when disabled
- Decision logic tests
- Read-only verification
- Metadata factory test

## Design Notes

**Conservative approach:** Like filter pushdown, we pass all required columns to the catalog but don't claim we've eliminated any. Spark will still validate the schema post-read. This ensures correctness while allowing catalogs to optimize (skip columns, read less data, etc.)

**Interface design:** Uses `StructType` (Spark's standard schema representation) to remain catalog-agnostic. Each catalog can interpret the projection in their own way.

---------

Co-authored-by: Claude <[email protected]>
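
The projection handling described above (store the schema Spark passes to `pruneColumns`, and send it only when columns were actually pruned) might look like the sketch below. It is an assumption-laden illustration: `Schema` is a local stand-in for Spark's `StructType`, `ScanBuilderSketch` and `projectionForPlanningClient` are hypothetical names, and comparing the required schema against the full table schema is one guess at how "actually pruned" could be detected.

```scala
object ProjectionPushdownSketch {
  // Stand-in for StructType: just an ordered list of column names.
  final case class Schema(fieldNames: Seq[String])

  final class ScanBuilderSketch(tableSchema: Schema) {
    // Until Spark prunes anything, the full table schema is required.
    private var required: Schema = tableSchema

    // Same role as SupportsPushDownRequiredColumns.pruneColumns(requiredSchema).
    def pruneColumns(requiredSchema: Schema): Unit = {
      required = requiredSchema
    }

    // None when nothing was pruned (SELECT *), otherwise the narrowed schema.
    def projectionForPlanningClient: Option[Schema] =
      if (required == tableSchema) None else Some(required)
  }
}
```

Sending `None` for `SELECT *` lets a planning client distinguish "read everything" from an explicit column list without comparing schemas itself.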

murali-db mentioned this pull request Dec 10, 2025

[Feature Request][Spark] Support ServerSide Table Scan Planning for Fine-Grained Access Control #5623

Open

8 tasks

murali-db force-pushed the upstream/row3-metadata-abstraction-clean branch from be90679 to 6ce4f27 Compare December 10, 2025 12:28

murali-db force-pushed the upstream/row3-metadata-abstraction-clean branch from 6ce4f27 to 088a11a Compare December 10, 2025 12:36

murali-db force-pushed the upstream/row3-metadata-abstraction-clean branch from 088a11a to 8169989 Compare December 10, 2025 12:41

murali-db mentioned this pull request Dec 10, 2025

[Server-Side Planning] Integrate DeltaCatalog with ServerSidePlannedTable and add tests #5622

Merged

murali-db force-pushed the upstream/row3-metadata-abstraction-clean branch 5 times, most recently from 15009c6 to 6fd96c6 Compare December 10, 2025 13:48

murali-db mentioned this pull request Dec 10, 2025

[Server-Side Planning] Add filter pushdown infrastructure #5672

Merged

murali-db force-pushed the upstream/row3-metadata-abstraction-clean branch from 6fd96c6 to 802aa73 Compare December 10, 2025 17:14

murali-db commented Dec 10, 2025

tdas reviewed Dec 10, 2025

murali-db force-pushed the upstream/row3-metadata-abstraction-clean branch from 802aa73 to 6d76fce Compare December 10, 2025 21:48

tdas reviewed Dec 11, 2025

tdas approved these changes Dec 11, 2025

tdas merged commit 3728839 into delta-io:master Dec 11, 2025
14 checks passed

murali-db mentioned this pull request Dec 12, 2025

[Server-Side Planning] Add projection pushdown infrastructure #5685

Merged

2 participants
