[Server-Side Planning] Add filter pushdown infrastructure #5672
Merged: tdas merged 4 commits into delta-io:master from murali-db:upstream/row4-filter-infrastructure-clean on Dec 13, 2025
+152 −9
Conversation
Force-pushed from 21dd3ca to c709f88
Force-pushed from 8fe0e82 to fb69ce1
tdas pushed a commit that referenced this pull request on Dec 11, 2025:
…5671)

## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta/pull/5671/files) to review all changes.

**Stack:**
- [Integrate DeltaCatalog with ServerSidePlannedTable and add tests](#5622) [[Files changed](https://github.com/delta-io/delta/pull/5622/files)] ✅ Merged
- **[Add metadata abstraction and factory pattern](#5671)** [[Files changed](https://github.com/delta-io/delta/pull/5671/files)] ⬅️ _This PR_
- [Add filter pushdown infrastructure](#5672) [[Files changed](https://github.com/delta-io/delta/pull/5672/files/6d76fced25e7e69053d96c5101f511184e1d0d49..fb69ce1801da7aee5b633ad9103d528ee14bac1e)]

---

## Summary
Introduces metadata abstraction pattern for ServerSidePlannedTable:
- `ServerSidePlanningMetadata` trait for catalog-specific configuration
- Factory pattern changed from `buildForCatalog()` to `buildClient(metadata)`
- Default implementation for non-UC catalogs

## Key Changes
**New file:**
- `ServerSidePlanningMetadata.scala` - Trait + DefaultMetadata + factory method

**Modified files:**
- `ServerSidePlanningClient.scala` - Update factory interface to accept metadata
- `ServerSidePlannedTable.scala` - Use metadata factory, pass metadata to client builder
- `ServerSidePlannedTableSuite.scala` - Test fromTable() method

## Architecture
```
Table Properties (from loadTable)
        ↓
ServerSidePlanningMetadata.fromTable()
        ↓
DefaultMetadata (for now)
[Future: catalog-specific implementations]
        ↓
ServerSidePlanningClientFactory.buildClient(metadata)
        ↓
ServerSidePlanningClient implementation
```

## Design Principles
### 1. Metadata as Interface
`ServerSidePlanningMetadata` trait captures what's needed to create a planning client:
- `planningEndpointUri` - REST endpoint (empty for default)
- `authToken` - Authentication token (None for default)
- `catalogName` - For configuration lookups
- `tableProperties` - Additional table properties

### 2. Factory Method for Metadata
`ServerSidePlanningMetadata.fromTable()` extracts metadata from the loaded table:
- Currently returns DefaultMetadata for all catalogs
- Future PRs will add catalog-specific implementations

### 3. Package-Private Implementation
Trait and classes are `private[serverSidePlanning]`; implementation details are hidden from the outside.

## Testing
6 tests covering:
- Full query execution through DeltaCatalog
- Decision logic for server-side planning
- **fromTable() returns metadata with defaults** (updated)
- ServerSidePlannedTable is read-only
- Normal path unchanged when feature disabled

## Behavior
**No functional changes:**
- Same tables use server-side planning
- Same query execution flow
- Zero impact on existing Delta behavior

---

**Commit:** 6d76fce

Co-authored-by: Claude <[email protected]>
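As a rough illustration of the abstraction this commit describes, here is a minimal Scala sketch; the member names follow the commit text, but the types and the `DefaultMetadata` constructor are assumptions, not the exact source.

```scala
package org.apache.spark.sql.delta.serverSidePlanning

// Minimal sketch of the trait and default implementation described above;
// member types are assumptions inferred from the PR text.
private[serverSidePlanning] trait ServerSidePlanningMetadata {
  def planningEndpointUri: String          // REST endpoint (empty for default)
  def authToken: Option[String]            // authentication token (None for default)
  def catalogName: String                  // for configuration lookups
  def tableProperties: Map[String, String] // additional table properties
}

// Default implementation for non-UC catalogs, as described above.
private[serverSidePlanning] case class DefaultMetadata(
    catalogName: String,
    tableProperties: Map[String, String]) extends ServerSidePlanningMetadata {
  override val planningEndpointUri: String = ""
  override val authToken: Option[String] = None
}
```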
This PR adds the filter parameter to the ServerSidePlanningClient interface but doesn't wire it up yet. All call sites pass None, maintaining current behavior. This is infrastructure-only with zero behavior change.

Changes:
- Add filter parameter to ServerSidePlanningClient.planScan() signature
  - Uses the Spark Filter type to keep the spark module Iceberg-agnostic
- Update IcebergRESTCatalogPlanningClient to accept the filter parameter
  - Commented-out code shows where the converter will be wired in PR19
- Update TestServerSidePlanningClient to accept the filter parameter
- Update all call sites (ServerSidePlannedTable, test suite) to pass None

Testing:
- All 15 existing tests pass with no behavior change
- spark/testOnly: 8/8 tests passed
- iceberg/testOnly: 7/7 tests passed

Next steps (PR19):
- Create SparkToIcebergExpressionConverter utility
- Add comprehensive unit tests for filter conversion
- Keep the spark module independent of Iceberg types

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
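To make the signature change concrete, here is a minimal sketch; `PlannedScan` and the `tableName` parameter are placeholders, since the real return type and full parameter list aren't shown on this page.

```scala
package org.apache.spark.sql.delta.serverSidePlanning

import org.apache.spark.sql.sources.Filter

// Placeholder result type so this sketch is self-contained; the real
// return type lives in the Delta sources and is not shown on this page.
case class PlannedScan(files: Seq[String])

// Sketch of the interface change: planScan() gains an optional Spark
// Filter. Using Spark's Filter type (rather than Iceberg expressions)
// keeps the spark module Iceberg-agnostic; every current call site
// passes None, so behavior is unchanged.
trait ServerSidePlanningClient {
  def planScan(tableName: String, filter: Option[Filter]): PlannedScan
}
```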
Updated ServerSidePlanningClient documentation to clarify the Filter Conversion Pattern: Spark Filter is the universal representation, and each catalog implementation converts it to its own native format.

Changes:
- Added a Filter Conversion Pattern section explaining catalog responsibilities
- Enhanced the filter parameter documentation with conversion examples
- Clarified that Iceberg, Unity Catalog, and other catalogs each provide their own converters as private implementation details

This is a documentation-only change with zero behavior changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
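As a rough illustration of the conversion pattern this documentation describes (and of the converter planned for PR19), here is a hedged sketch; the `IcebergExpr` hierarchy is a stand-in for Iceberg's real expression types, not its actual API.

```scala
import org.apache.spark.sql.sources.{EqualTo, Filter, GreaterThan}

// Hypothetical illustration of the pattern: each catalog keeps a private
// converter from Spark's Filter to its native expression type. IcebergExpr
// and its cases stand in for Iceberg's real expression classes.
sealed trait IcebergExpr
case class IcebergEq(column: String, value: Any) extends IcebergExpr
case class IcebergGt(column: String, value: Any) extends IcebergExpr

object SparkToIcebergExpressionConverter {
  // Returns None for filters this catalog cannot translate; because Spark
  // re-applies every filter as a residual, dropping one is always safe.
  def convert(filter: Filter): Option[IcebergExpr] = filter match {
    case EqualTo(attr, value)     => Some(IcebergEq(attr, value))
    case GreaterThan(attr, value) => Some(IcebergGt(attr, value))
    case _                        => None
  }
}
```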
Force-pushed from d0c1a79 to b367368
Force-pushed from b367368 to b65cff9
This commit adds the testing infrastructure to verify that filters are properly passed from SQL WHERE clauses through to the ServerSidePlanningClient.

Changes:
- Add FilterCapturingTestClient for capturing filters in tests
- Implement SupportsPushDownFilters in ServerSidePlannedScanBuilder
- Pass filters from ScanBuilder through to ServerSidePlannedScan
- Combine multiple filters with And before sending to the planning client
- Add 3 tests verifying filter passthrough:
  - Simple EqualTo filter (WHERE id = 2)
  - Compound And filter (WHERE id > 1 AND value < 30)
  - No filter when no WHERE clause

Testing:
- 10 tests total (7 existing + 3 new)
- Verifies filters flow correctly from SQL to the planning client
- Infrastructure ready for catalog-specific filter conversion (future PRs)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
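A minimal sketch of how such a capturing client could look, reusing the placeholder `ServerSidePlanningClient` and `PlannedScan` types from the earlier sketch; the real `FilterCapturingTestClient` lives in the Delta test sources.

```scala
import org.apache.spark.sql.sources.Filter

// Hypothetical sketch of the capturing client described above: it records
// the filter it receives so tests can assert on it after running a query.
class FilterCapturingTestClient extends ServerSidePlanningClient {
  @volatile private var captured: Option[Filter] = None

  override def planScan(tableName: String, filter: Option[Filter]): PlannedScan = {
    captured = filter              // record what Spark pushed down
    PlannedScan(files = Seq.empty) // the plan itself is not under test here
  }

  def capturedFilter: Option[Filter] = captured
}
```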
Force-pushed from e6849f2 to c53e01a
- Add collectLeafFilters() as a common helper method to flatten filter trees
- Import filter types (And, EqualTo, GreaterThan, LessThan) instead of using fully qualified names
- Remove the duplicated inline collectFilters function from the compound filter test

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
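A sketch of what `collectLeafFilters()` plausibly looks like, assuming this signature; the recursive flattening over `And` nodes is the essential idea.

```scala
import org.apache.spark.sql.sources.{And, Filter}

object FilterTestHelpers {
  // Flattens a tree of And nodes into its leaf filters so tests can
  // assert on individual predicates regardless of nesting.
  def collectLeafFilters(filter: Filter): Seq[Filter] = filter match {
    case And(left, right) => collectLeafFilters(left) ++ collectLeafFilters(right)
    case leaf             => Seq(leaf)
  }
}
```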
Force-pushed from c53e01a to f5bc498
tdas reviewed on Dec 12, 2025: spark/src/main/scala/org/apache/spark/sql/delta/serverSidePlanning/ServerSidePlannedTable.scala (comment resolved)

tdas reviewed on Dec 12, 2025: spark/src/main/scala/org/apache/spark/sql/delta/serverSidePlanning/ServerSidePlannedTable.scala (comment resolved)
tdas (Contributor) approved these changes on Dec 13, 2025 and left a comment:
LGTM!
tdas pushed a commit that referenced this pull request on Dec 13, 2025:
## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta/pull/5685/files) to review all changes.

**Stack:**
- [#5621 - ServerSidePlanningClient interface and DSv2 table](#5621) [[Files changed](https://github.com/delta-io/delta/pull/5621/files)] - ✅ Merged
- [#5622 - Integrate DeltaCatalog with ServerSidePlanning](#5622) [[Files changed](https://github.com/delta-io/delta/pull/5622/files)] - ✅ Merged
- [#5671 - Add metadata abstraction and factory pattern](#5671) [[Files changed](https://github.com/delta-io/delta/pull/5671/files)] - ✅ Merged
- [#5672 - Add filter pushdown infrastructure](#5672) [[Files changed](https://github.com/delta-io/delta/pull/5672/files)] - ✅ Merged
- **[#5685 - Add projection pushdown infrastructure](#5685)** [[Files changed](https://github.com/delta-io/delta/pull/5685/files)] ⬅️ _This PR_

---

# [Server-Side Planning] Add projection pushdown infrastructure

This PR adds infrastructure for pushing down column projections (SELECT column list) to the server-side planning client.

## What This PR Does
1. **Add projection parameter to `planScan()` interface**
   - New optional `projection: Option[StructType]` parameter
   - Allows catalogs to receive required column information
2. **Implement `SupportsPushDownRequiredColumns` in `ServerSidePlannedScanBuilder`**
   - Spark calls `pruneColumns(requiredSchema)` when columns are pruned
   - Stores required schema to pass to the server
3. **Pass projection through to `planScan()` call**
   - Only send projection if columns are actually pruned (not SELECT *)
   - Catalog can use this to optimize file reading (skip columns, etc.)
4. **Extend test infrastructure**
   - `TestServerSidePlanningClient` now captures both filter AND projection
   - Companion object provides `getCapturedFilter()` and `getCapturedProjection()`
   - Added 3 new tests for projection pushdown

## Files Changed
### Modified Files (4 files)
- `spark/src/main/scala/.../ServerSidePlanningClient.scala` - Add projection parameter
- `spark/src/main/scala/.../ServerSidePlannedTable.scala` - Implement SupportsPushDownRequiredColumns
- `spark/src/test/scala/.../TestServerSidePlanningClient.scala` - Add projection capturing
- `spark/src/test/scala/.../ServerSidePlannedTableSuite.scala` - Add projection tests

## Tests
Total: **11 tests** (8 existing + 3 new projection tests)

**New projection tests:**
- `"projection pushed when selecting specific columns"` - Verifies projection for `SELECT id, name`
- `"no projection pushed when selecting all columns"` - Verifies `None` for `SELECT *`
- `"projection and filter pushed together"` - Verifies both work simultaneously

**Existing tests** (from previous PRs):
- Simple EqualTo filter
- Compound And filter
- No filter when no WHERE clause
- Full query through DeltaCatalog
- Normal path unchanged when disabled
- Decision logic tests
- Read-only verification
- Metadata factory test

## Design Notes
**Conservative approach:** Like filter pushdown, we pass all required columns to the catalog but don't claim we've eliminated any. Spark will still validate the schema post-read. This ensures correctness while allowing catalogs to optimize (skip columns, read less data, etc.)

**Interface design:** Uses `StructType` (Spark's standard schema representation) to remain catalog-agnostic. Each catalog can interpret the projection in their own way.

---

Co-authored-by: Claude <[email protected]>
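As a rough sketch of the `pruneColumns` flow this commit describes: the class name below is illustrative (the real builder is `ServerSidePlannedScanBuilder`), and the handoff of the projection to `planScan()` is elided.

```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownRequiredColumns}
import org.apache.spark.sql.types.StructType

// Hypothetical sketch of the pruneColumns flow described above; the real
// builder also forwards the projection to planScan(), elided here.
class ProjectionAwareScanBuilder(tableSchema: StructType) extends ScanBuilder
    with SupportsPushDownRequiredColumns {

  private var requiredSchema: StructType = tableSchema

  // Spark calls this when a query reads only a subset of columns.
  override def pruneColumns(requiredSchema: StructType): Unit = {
    this.requiredSchema = requiredSchema
  }

  override def build(): Scan = {
    // Only send a projection when columns were actually pruned (not SELECT *).
    val projection: Option[StructType] =
      if (requiredSchema != tableSchema) Some(requiredSchema) else None
    // A real implementation would pass `projection` to planScan(...) here.
    new Scan {
      override def readSchema(): StructType = requiredSchema
    }
  }
}
```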
## Stacked PR
Use this link to review all changes.

Stack:
## Summary
Adds generic filter pushdown infrastructure to ServerSidePlanningClient.
## Key Changes
Modified files (4):
- `ServerSidePlanningClient.scala` - Added filter parameter
- `ServerSidePlannedTable.scala` - Implements SupportsPushDownFilters, passes filters through to planning client
- `TestServerSidePlanningClient.scala` - Added companion object to capture filter for test verification
- `ServerSidePlannedTableSuite.scala` - Added 3 filter passthrough tests

## Design
### Conservative Filter Handling
We return all filters as "residuals" (meaning Spark will re-apply them after the catalog returns results). This is conservative but correct: the catalog may use the filters to prune what it plans, while Spark still guarantees correct results by re-evaluating every filter itself. A sketch of this contract follows.
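A minimal sketch of the contract, assuming the DSv2 `SupportsPushDownFilters` shape used by the scan builder; the trait name and field are illustrative.

```scala
import org.apache.spark.sql.connector.read.SupportsPushDownFilters
import org.apache.spark.sql.sources.Filter

// Minimal sketch of the conservative contract: keep every filter for
// planning, but report all of them back as residuals so Spark re-applies
// them after the scan.
trait ConservativeFilterPushdown extends SupportsPushDownFilters {
  protected var plannedFilters: Array[Filter] = Array.empty

  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    plannedFilters = filters  // forwarded to the planning client elsewhere
    filters                   // returned in full: Spark re-evaluates them all
  }

  // Nothing is claimed as fully handled by the source.
  override def pushedFilters(): Array[Filter] = Array.empty
}
```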
### Filter Combining
When multiple filters are pushed (e.g., `WHERE id > 1 AND value < 30`), we combine them into a single `And` filter before sending it to the planning client. This keeps the interface simple (a single `Option[Filter]` instead of an `Array[Filter]`); a sketch of the combining step follows.
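A sketch of the combining step under the same assumptions; `FilterCombiner` is an illustrative name.

```scala
import org.apache.spark.sql.sources.{And, Filter}

object FilterCombiner {
  // Folds the filters Spark pushed into at most one combined filter.
  // For WHERE id > 1 AND value < 30, Spark pushes two leaf filters and
  // the client receives Some(And(GreaterThan("id", 1), LessThan("value", 30))).
  def combine(filters: Array[Filter]): Option[Filter] =
    filters.reduceOption(And(_, _))
}
```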
## Testing
8 tests total (spark module only).
## Behavior
Filter passthrough is working, with no behavior changes to existing functionality.
Commit: f5bc498