
Conversation


@murali-db murali-db commented Dec 12, 2025

🥞 Stacked PR

Use this link to review all changes.

Stack:

- [Integrate DeltaCatalog with ServerSidePlannedTable and add tests](#5622)
- [Add metadata abstraction and factory pattern](#5671)
- [Add filter pushdown infrastructure](#5672)
- [Add projection pushdown infrastructure](#5685) <-- This PR

[Server-Side Planning] Add projection pushdown infrastructure

This PR adds infrastructure for pushing down column projections (SELECT column list) to the server-side planning client.

What This PR Does

  1. Add projection parameter to planScan() interface

    • New optional projection: Option[StructType] parameter
    • Allows catalogs to receive required column information
  2. Implement SupportsPushDownRequiredColumns in ServerSidePlannedScanBuilder (see the sketch after this list)

    • Spark calls pruneColumns(requiredSchema) when columns are pruned
    • Stores required schema to pass to server
  3. Pass projection through to planScan() call

    • Only send projection if columns are actually pruned (not SELECT *)
    • Catalog can use this to optimize file reading (skip columns, etc.)
  4. Extend test infrastructure

    • TestServerSidePlanningClient now captures both filter AND projection
    • Companion object provides getCapturedFilter() and getCapturedProjection()
    • Added 3 new tests for projection pushdown
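
A minimal sketch of the pieces above, assuming Spark's DataSource V2 `ScanBuilder` API. `ScanPlan`, the `tableName` parameter, and the method bodies are placeholders; `planScan()`, the `projection: Option[StructType]` parameter, `pruneColumns()`, and the class names come from this PR:

```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownRequiredColumns}
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType

trait ScanPlan // placeholder for the planning client's return type

trait ServerSidePlanningClient {
  // `filter` was added by the previous PR in this stack; `projection` by this one.
  def planScan(
      tableName: String,
      filter: Option[Filter],
      projection: Option[StructType]): ScanPlan
}

class ServerSidePlannedScanBuilder(tableSchema: StructType) extends ScanBuilder
    with SupportsPushDownRequiredColumns {

  private var requiredSchema: StructType = tableSchema

  // Spark calls this with the columns the query actually needs.
  override def pruneColumns(required: StructType): Unit = {
    requiredSchema = required
  }

  // Only send a projection when columns were actually pruned (not SELECT *).
  private def projectionToSend: Option[StructType] =
    if (requiredSchema.fieldNames.sameElements(tableSchema.fieldNames)) None
    else Some(requiredSchema)

  override def build(): Scan = ??? // real builder passes projectionToSend to planScan()
}
```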

Files Changed

Modified Files (4 files)

  • spark/src/main/scala/.../ServerSidePlanningClient.scala - Add projection parameter
  • spark/src/main/scala/.../ServerSidePlannedTable.scala - Implement SupportsPushDownRequiredColumns
  • spark/src/test/scala/.../TestServerSidePlanningClient.scala - Add projection capturing
  • spark/src/test/scala/.../ServerSidePlannedTableSuite.scala - Add projection tests

Tests

Total: 11 tests (8 existing + 3 new projection tests)

New projection tests (one is sketched after this list):

  • "projection pushed when selecting specific columns" - Verifies projection for SELECT id, name
  • "no projection pushed when selecting all columns" - Verifies None for SELECT *
  • "projection and filter pushed together" - Verifies both work simultaneously

Existing tests (from previous PRs):

  • Simple EqualTo filter
  • Compound And filter
  • No filter when no WHERE clause
  • Full query through DeltaCatalog
  • Normal path unchanged when disabled
  • Decision logic tests
  • Read-only verification
  • Metadata factory test

Design Notes

Conservative approach: Like filter pushdown, we pass all required columns to the catalog but don't claim we've eliminated any; Spark will still validate the schema post-read. This ensures correctness while allowing catalogs to optimize (skip columns, read less data, etc.).

Interface design: Uses StructType (Spark's standard schema representation) to remain catalog-agnostic. Each catalog can interpret the projection in its own way.
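
For example, for a hypothetical query `SELECT id, name FROM tbl`, the projection handed to the catalog would be an ordinary Spark schema:

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ProjectionExample {
  // What planScan() would receive for SELECT id, name on tbl(id INT, name STRING, value INT).
  val projection: Option[StructType] = Some(StructType(Seq(
    StructField("id", IntegerType),
    StructField("name", StringType))))
}
```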

@murali-db murali-db force-pushed the upstream/row5-projection-infrastructure branch 8 times, most recently from c0a6a25 to 5b3a008 on December 12, 2025 at 14:25
tdas pushed a commit that referenced this pull request Dec 13, 2025
## Stacked PR
Use this [link](https://github.com/delta-io/delta/pull/5672/files) to
review all changes.

**Stack:**
- [Integrate DeltaCatalog with ServerSidePlannedTable and add
tests](#5622) [[Files
changed](https://github.com/delta-io/delta/pull/5622/files)]
- [Add metadata abstraction and factory
pattern](#5671) [[Files
changed](https://github.com/delta-io/delta/pull/5671/files)]
- **[Add filter pushdown
infrastructure](#5672)** [[Files
changed](https://github.com/delta-io/delta/pull/5672/files)] <-- _This
PR_
- [Add projection pushdown
infrastructure](#5685) [[Files
changed](https://github.com/delta-io/delta/pull/5685/files/f5bc498a47be8fc7457269939d510a2ccaba52ba..5b3a008a94a7d35f91868e93205b75fc65569ebb)]

---

## Summary

Adds generic filter pushdown infrastructure to ServerSidePlanningClient:
- Add filter parameter to **ServerSidePlanningClient.planScan()**
interface
- Use Spark's **Filter** type as catalog-agnostic representation
- Implement **SupportsPushDownFilters** in ServerSidePlannedScanBuilder
- Capture filter in **TestServerSidePlanningClient** companion object
for test verification
- Tests verifying filters are passed through to planning client
correctly

## Key Changes

**Modified files (4):**
- `ServerSidePlanningClient.scala` - Added filter parameter
- `ServerSidePlannedTable.scala` - Implements SupportsPushDownFilters,
passes filters through to planning client
- `TestServerSidePlanningClient.scala` - Added companion object to
capture filter for test verification
- `ServerSidePlannedTableSuite.scala` - Added 3 filter passthrough tests

## Design

### Conservative Filter Handling

We return all filters as "residuals" (meaning Spark will re-apply them
after the catalog returns results); a sketch follows the list below.
This is conservative but correct:
- We don't know yet what the catalog can handle
- Better to redundantly filter (slow but correct) than claim we handle
filters we don't (fast but wrong)
- Future PRs will add catalog capabilities to avoid redundant filtering
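
A minimal sketch of that conservative builder, assuming Spark's
`SupportsPushDownFilters` API; the `build()` body is a placeholder:

```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownFilters}
import org.apache.spark.sql.sources.Filter

class ServerSidePlannedScanBuilder extends ScanBuilder with SupportsPushDownFilters {
  private var captured: Array[Filter] = Array.empty

  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    captured = filters
    // Return every filter as a residual: Spark re-applies all of them after
    // the scan, so results stay correct even if the catalog ignores them.
    filters
  }

  // Reported for EXPLAIN; Spark allows this to overlap with the residuals.
  override def pushedFilters(): Array[Filter] = captured

  override def build(): Scan = ??? // real builder forwards `captured` to planScan()
}
```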

### Filter Combining

When multiple filters are pushed (e.g., `WHERE id > 1 AND value < 30`),
we combine them into a single And filter before sending to the planning
client. This keeps the interface simple (single Option[Filter] instead
of Array[Filter]).
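
A sketch of the combining step, using Spark's `org.apache.spark.sql.sources`
filter classes; only the single-`Option[Filter]` interface is stated in the PR:

```scala
import org.apache.spark.sql.sources.{And, Filter, GreaterThan, LessThan}

object FilterCombining {
  // Fold the pushed filters into the single Option[Filter] that planScan()
  // accepts; None when the query has no WHERE clause.
  def combine(filters: Array[Filter]): Option[Filter] =
    filters.reduceOption(And(_, _))

  // WHERE id > 1 AND value < 30 becomes one conjunction:
  val example: Option[Filter] =
    combine(Array(GreaterThan("id", 1), LessThan("value", 30)))
  // => Some(And(GreaterThan("id", 1), LessThan("value", 30)))
}
```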

## Testing

8 tests total (spark module only):
- Full query execution through DeltaCatalog
- Decision logic for server-side planning
- ServerSidePlanningMetadata.fromTable() returns metadata with defaults
- ServerSidePlannedTable is read-only
- Normal path unchanged when feature disabled
- **Simple EqualTo filter (WHERE id = 2)**
- **Compound And filter (WHERE id > 1 AND value < 30)**
- **No filter when no WHERE clause**

## Behavior

**Filter passthrough working:**
- Filters from SQL WHERE clauses are captured and passed to planning
client
- Test infrastructure validates filter objects are passed correctly
- Planning client receives filters but doesn't apply logic yet (next PR)
- Spark re-applies filters as residuals (conservative approach ensures
correctness)

**No behavior changes to existing functionality:**
- Zero impact on existing Delta behavior
- Feature only activates when config flag enabled
- This PR enables future catalog implementations to receive and process
filter pushdown

---

**Commit:** f5bc498

---------

Co-authored-by: Claude <[email protected]>
murali-db and others added 2 commits December 13, 2025 00:13
Updated ServerSidePlanningClient documentation to clarify the Filter
Conversion Pattern: Spark Filter is the universal representation, and
each catalog implementation converts to its own native format.
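
A sketch of the pattern, with a deliberately hypothetical native predicate
ADT standing in for whatever a real catalog uses internally:

```scala
import org.apache.spark.sql.sources.{And, EqualTo, Filter, GreaterThan}

object FilterConversionSketch {
  // Hypothetical native predicate ADT; the real converters (Iceberg,
  // Unity Catalog, ...) are private implementation details.
  sealed trait NativePredicate
  case class NativeEq(column: String, value: Any) extends NativePredicate
  case class NativeGt(column: String, value: Any) extends NativePredicate
  case class NativeAnd(l: NativePredicate, r: NativePredicate) extends NativePredicate

  def toNative(filter: Filter): Option[NativePredicate] = filter match {
    case EqualTo(col, v)     => Some(NativeEq(col, v))
    case GreaterThan(col, v) => Some(NativeGt(col, v))
    case And(l, r)           => for (nl <- toNative(l); nr <- toNative(r)) yield NativeAnd(nl, nr)
    // Unsupported filters: skip conversion; Spark re-applies them as residuals.
    case _                   => None
  }
}
```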

Changes:
- Added Filter Conversion Pattern section explaining catalog responsibilities
- Enhanced filter parameter documentation with conversion examples
- Clarified that Iceberg, Unity Catalog, and other catalogs each provide
  their own converters as private implementation details

This is a documentation-only change with zero behavior changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Add projection parameter to ServerSidePlanningClient.planScan()
- Implement SupportsPushDownRequiredColumns in ServerSidePlannedScanBuilder
- Add projection capturing to PushdownCapturingTestClient
- Add 3 projection passthrough tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@murali-db murali-db force-pushed the upstream/row5-projection-infrastructure branch from 5b3a008 to c93b9c5 on December 13, 2025 at 00:15

@tdas tdas left a comment


LGTM

@tdas tdas merged commit 9a2fc89 into delta-io:master Dec 13, 2025
14 checks passed