[Server-Side Planning] Add projection pushdown infrastructure #5685
Merged: tdas merged 6 commits into delta-io:master from murali-db:upstream/row5-projection-infrastructure on Dec 13, 2025
+88 −11
Conversation
Force-pushed from c0a6a25 to 5b3a008 (compare)
tdas pushed a commit that referenced this pull request on Dec 13, 2025:
## Stacked PR

Use this [link](https://github.com/delta-io/delta/pull/5672/files) to review all changes.

**Stack:**
- [Integrate DeltaCatalog with ServerSidePlannedTable and add tests](#5622) [[Files changed](https://github.com/delta-io/delta/pull/5622/files)]
- [Add metadata abstraction and factory pattern](#5671) [[Files changed](https://github.com/delta-io/delta/pull/5671/files)]
- **[Add filter pushdown infrastructure](#5672)** [[Files changed](https://github.com/delta-io/delta/pull/5672/files)] <-- _This PR_
- [Add projection pushdown infrastructure](#5685) [[Files changed](https://github.com/delta-io/delta/pull/5685/files/f5bc498a47be8fc7457269939d510a2ccaba52ba..5b3a008a94a7d35f91868e93205b75fc65569ebb)]

---

## Summary

Adds generic filter pushdown infrastructure to ServerSidePlanningClient:

- Add filter parameter to **ServerSidePlanningClient.planScan()** interface
- Use Spark's **Filter** type as catalog-agnostic representation
- Implement **SupportsPushDownFilters** in ServerSidePlannedScanBuilder
- Capture filter in **TestServerSidePlanningClient** companion object for test verification
- Tests verifying filters are passed through to planning client correctly

## Key Changes

**Modified files (4):**

- `ServerSidePlanningClient.scala` - Added filter parameter
- `ServerSidePlannedTable.scala` - Implements SupportsPushDownFilters, passes filters through to planning client
- `TestServerSidePlanningClient.scala` - Added companion object to capture filter for test verification
- `ServerSidePlannedTableSuite.scala` - Added 3 filter passthrough tests

## Design

### Conservative Filter Handling

We return all filters as "residuals" (meaning Spark will re-apply them after the catalog returns results). This is conservative but correct:

- We don't know yet what the catalog can handle
- Better to redundantly filter (slow but correct) than claim we handle filters we don't (fast but wrong)
- Future PRs will add catalog capabilities to avoid redundant filtering

### Filter Combining

When multiple filters are pushed (e.g., `WHERE id > 1 AND value < 30`), we combine them into a single And filter before sending to the planning client. This keeps the interface simple (single Option[Filter] instead of Array[Filter]).

## Testing

8 tests total (spark module only):

- Full query execution through DeltaCatalog
- Decision logic for server-side planning
- ServerSidePlanningMetadata.fromTable() returns metadata with defaults
- ServerSidePlannedTable is read-only
- Normal path unchanged when feature disabled
- **Simple EqualTo filter (WHERE id = 2)**
- **Compound And filter (WHERE id > 1 AND value < 30)**
- **No filter when no WHERE clause**

## Behavior

**Filter passthrough working:**

- Filters from SQL WHERE clauses are captured and passed to planning client
- Test infrastructure validates filter objects are passed correctly
- Planning client receives filters but doesn't apply logic yet (next PR)
- Spark re-applies filters as residuals (conservative approach ensures correctness)

**No behavior changes to existing functionality:**

- Zero impact on existing Delta behavior
- Feature only activates when config flag enabled
- This PR enables future catalog implementations to receive and process filter pushdown

---

**Commit:** f5bc498

---------

Co-authored-by: Claude <[email protected]>
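The conservative residual handling and filter combining described in that commit message might look roughly like the sketch below. The `PlanningClient` trait, the builder's name and constructor, and the `planScan()` signature are illustrative stand-ins rather than the PR's actual code; only the Spark types (`ScanBuilder`, `SupportsPushDownFilters`, `Filter`, `And`) are real API.

```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownFilters}
import org.apache.spark.sql.sources.{And, Filter}

// Simplified stand-in for the PR's planning client interface (hypothetical).
trait PlanningClient {
  def planScan(tableName: String, filter: Option[Filter]): Unit
}

// Minimal sketch of the conservative filter handling: capture what Spark offers,
// but report everything back as a residual so Spark re-applies it after the scan.
class FilterCapturingScanBuilder(client: PlanningClient, tableName: String)
    extends ScanBuilder with SupportsPushDownFilters {

  private var captured: Array[Filter] = Array.empty

  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    captured = filters
    filters // all filters remain residuals: slow but always correct
  }

  // Informational: what was forwarded to the client (shows up in EXPLAIN).
  override def pushedFilters(): Array[Filter] = captured

  override def build(): Scan = {
    // Combine multiple pushed filters into one And so the client sees a single Option[Filter].
    val combined: Option[Filter] = captured.reduceOption(And(_, _))
    client.planScan(tableName, combined)
    ??? // actual Scan construction omitted in this sketch
  }
}
```

Returning the full array from `pushFilters` is what makes the approach conservative: Spark keeps the predicates in its own plan, so a catalog that ignores the pushed filter still produces correct results.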
Updated ServerSidePlanningClient documentation to clarify the Filter Conversion Pattern: Spark Filter is the universal representation, and each catalog implementation converts it to its own native format.

Changes:
- Added Filter Conversion Pattern section explaining catalog responsibilities
- Enhanced filter parameter documentation with conversion examples
- Clarified that Iceberg, Unity Catalog, and other catalogs each provide their own converters as private implementation details

This is a documentation-only change with zero behavior changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
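As an illustration of that Filter Conversion Pattern, a catalog implementation might keep a private converter along these lines. The object name and the string predicate format are invented for this example; real catalogs (Iceberg, Unity Catalog, etc.) would target their own expression types, and only Spark's `Filter` case classes below are actual API.

```scala
import org.apache.spark.sql.sources.{And, EqualTo, Filter, GreaterThan}

// Hypothetical private converter: Spark's Filter is the universal representation,
// and each catalog translates it into its own native predicate format.
object ExampleCatalogFilterConverter {
  def toCatalogPredicate(filter: Filter): Option[String] = filter match {
    case EqualTo(attr, value)     => Some(s"$attr = $value")
    case GreaterThan(attr, value) => Some(s"$attr > $value")
    case And(left, right) =>
      for {
        l <- toCatalogPredicate(left)
        r <- toCatalogPredicate(right)
      } yield s"($l) AND ($r)"
    case _ => None // anything unsupported stays a residual for Spark to evaluate
  }
}
```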
- Add projection parameter to ServerSidePlanningClient.planScan()
- Implement SupportsPushDownRequiredColumns in ServerSidePlannedScanBuilder
- Add projection capturing to PushdownCapturingTestClient
- Add 3 projection passthrough tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
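The capture-based test client mentioned here can be thought of as a small shared recorder that the test suite inspects after running a query. The sketch below is a hypothetical simplification (the object name, accessors, and `record()` hook are invented), not the PR's actual `TestServerSidePlanningClient`.

```scala
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType

// Hypothetical companion-object-style recorder: the planning client stores the last
// filter and projection it received so tests can assert on them after a query runs.
object CapturingTestClient {
  @volatile private var lastFilter: Option[Filter] = None
  @volatile private var lastProjection: Option[StructType] = None

  def record(filter: Option[Filter], projection: Option[StructType]): Unit = {
    lastFilter = filter
    lastProjection = projection
  }

  def getCapturedFilter: Option[Filter] = lastFilter
  def getCapturedProjection: Option[StructType] = lastProjection

  def reset(): Unit = { lastFilter = None; lastProjection = None }
}
```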
Force-pushed from 5b3a008 to c93b9c5 (compare)
tdas approved these changes on Dec 13, 2025.
tdas (Contributor) left a comment:
LGTM
🥞 Stacked PR
Use this link to review all changes.
Stack:
[Server-Side Planning] Add projection pushdown infrastructure
This PR adds infrastructure for pushing down column projections (SELECT column list) to the server-side planning client.
What This PR Does
- Add projection parameter to `planScan()` interface
  - `projection: Option[StructType]` parameter
- Implement `SupportsPushDownRequiredColumns` in `ServerSidePlannedScanBuilder`
  - `pruneColumns(requiredSchema)` when columns are pruned
  - Pass projection through to `planScan()` call
- Extend test infrastructure
  - `TestServerSidePlanningClient` now captures both filter AND projection
  - `getCapturedFilter()` and `getCapturedProjection()`

Files Changed
Modified Files (4 files)
- `spark/src/main/scala/.../ServerSidePlanningClient.scala` - Add projection parameter
- `spark/src/main/scala/.../ServerSidePlannedTable.scala` - Implement SupportsPushDownRequiredColumns
- `spark/src/test/scala/.../TestServerSidePlanningClient.scala` - Add projection capturing
- `spark/src/test/scala/.../ServerSidePlannedTableSuite.scala` - Add projection tests

Tests
Total: 11 tests (8 existing + 3 new projection tests)
New projection tests:
"projection pushed when selecting specific columns"- Verifies projection forSELECT id, name"no projection pushed when selecting all columns"- VerifiesNoneforSELECT *"projection and filter pushed together"- Verifies both work simultaneouslyExisting tests (from previous PRs):
Design Notes
Conservative approach: Like filter pushdown, we pass all required columns to the catalog but don't claim we've eliminated any. Spark will still validate the schema post-read. This ensures correctness while allowing catalogs to optimize (skip columns, read less data, etc.).
Interface design: Uses `StructType` (Spark's standard schema representation) to remain catalog-agnostic. Each catalog can interpret the projection in its own way.
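Putting these design notes together, a scan builder that records the pruned schema and forwards it might look like the sketch below. The class name, the equality check that maps `SELECT *` to `None`, and the `planScan(..., projection)` call are assumptions for illustration; only `SupportsPushDownRequiredColumns`, `ScanBuilder`, and `StructType` are Spark's actual API.

```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownRequiredColumns}
import org.apache.spark.sql.types.StructType

// Minimal sketch of projection capture: record the columns Spark actually needs,
// treating a full-schema request (SELECT *) as "no projection pushed".
class ProjectionCapturingScanBuilder(tableSchema: StructType)
    extends ScanBuilder with SupportsPushDownRequiredColumns {

  private var projection: Option[StructType] = None

  override def pruneColumns(requiredSchema: StructType): Unit = {
    // Assumed behavior: only report a projection when it is narrower than the table schema.
    projection = if (requiredSchema == tableSchema) None else Some(requiredSchema)
  }

  override def build(): Scan = {
    // The captured projection would be forwarded to the planning client here,
    // e.g. planScan(..., projection = projection); Spark still validates the
    // schema after reading, so a catalog that ignores it remains correct.
    ??? // Scan construction omitted in this sketch
  }
}
```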