
@murali-db murali-db commented Dec 10, 2025

Stacked PR

Use this link to review all changes.

Stack:


Summary

Adds generic filter pushdown infrastructure to ServerSidePlanningClient:

  • Add filter parameter to ServerSidePlanningClient.planScan() interface
  • Use Spark's Filter type as catalog-agnostic representation
  • Implement SupportsPushDownFilters in ServerSidePlannedScanBuilder
  • Capture filter in TestServerSidePlanningClient companion object for test verification
  • Add tests verifying filters are passed through to the planning client correctly

Key Changes

Modified files (4):

  • ServerSidePlanningClient.scala - Added filter parameter
  • ServerSidePlannedTable.scala - Implements SupportsPushDownFilters, passes filters through to planning client
  • TestServerSidePlanningClient.scala - Added companion object to capture filter for test verification
  • ServerSidePlannedTableSuite.scala - Added 3 filter passthrough tests

Design

Conservative Filter Handling

We return all filters as "residuals" (meaning Spark will re-apply them after the catalog returns results). This is conservative but correct:

  • We don't know yet what the catalog can handle
  • Better to redundantly filter (slow but correct) than claim we handle filters we don't (fast but wrong)
  • Future PRs will add catalog capabilities to avoid redundant filtering
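
This contract can be sketched as follows. Note this uses stand-in types, not Spark's actual `SupportsPushDownFilters` and `org.apache.spark.sql.sources.Filter` classes, so the names here are illustrative only:

```scala
// Stand-ins for Spark's org.apache.spark.sql.sources.Filter hierarchy.
sealed trait Filter
case class EqualTo(attr: String, value: Any) extends Filter
case class GreaterThan(attr: String, value: Any) extends Filter

// Conservative pushdown: remember every filter for the planning client,
// but report all of them back as residuals so Spark re-applies them.
class ConservativeScanBuilder {
  private var captured: Array[Filter] = Array.empty

  // Mirrors SupportsPushDownFilters.pushFilters: the returned array is the
  // set of filters Spark must still evaluate itself (here: all of them).
  def pushFilters(filters: Array[Filter]): Array[Filter] = {
    captured = filters
    filters
  }

  def capturedFilters: Array[Filter] = captured
}
```

Because the returned array equals the input, Spark keeps its own Filter operators in the plan, so results stay correct even if the catalog ignores the filters entirely.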

Filter Combining

When multiple filters are pushed (e.g., WHERE id > 1 AND value < 30), we combine them into a single And filter before sending to the planning client. This keeps the interface simple (single Option[Filter] instead of Array[Filter]).
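
A minimal sketch of this combining step (again with stand-in case classes in place of Spark's filter types):

```scala
// Stand-ins for Spark's filter case classes.
sealed trait Filter
case class And(left: Filter, right: Filter) extends Filter
case class GreaterThan(attr: String, value: Any) extends Filter
case class LessThan(attr: String, value: Any) extends Filter

// Fold the pushed filters into a single Option[Filter], left-associating And:
// Nil => None, one filter => itself, many => nested And.
def combineFilters(filters: Seq[Filter]): Option[Filter] =
  filters.reduceLeftOption(And(_, _))
```

So `WHERE id > 1 AND value < 30` arrives from Spark as two filters and is sent onward as `And(GreaterThan("id", 1), LessThan("value", 30))`.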

Testing

8 tests total (spark module only):

  • Full query execution through DeltaCatalog
  • Decision logic for server-side planning
  • ServerSidePlanningMetadata.fromTable() returns metadata with defaults
  • ServerSidePlannedTable is read-only
  • Normal path unchanged when feature disabled
  • Simple EqualTo filter (WHERE id = 2)
  • Compound And filter (WHERE id > 1 AND value < 30)
  • No filter when no WHERE clause

Behavior

Filter passthrough working:

  • Filters from SQL WHERE clauses are captured and passed to planning client
  • Test infrastructure validates filter objects are passed correctly
  • Planning client receives filters but doesn't apply logic yet (next PR)
  • Spark re-applies filters as residuals (conservative approach ensures correctness)

No behavior changes to existing functionality:

  • Zero impact on existing Delta behavior
  • Feature only activates when config flag enabled
  • This PR enables future catalog implementations to receive and process filter pushdown

Commit: f5bc498

@murali-db murali-db force-pushed the upstream/row4-filter-infrastructure-clean branch from 21dd3ca to c709f88 Compare December 10, 2025 17:27
@murali-db murali-db force-pushed the upstream/row4-filter-infrastructure-clean branch 2 times, most recently from 8fe0e82 to fb69ce1 Compare December 10, 2025 21:49
tdas pushed a commit that referenced this pull request Dec 11, 2025
…5671)

## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta/pull/5671/files) to
review all changes.

**Stack:**
- [Integrate DeltaCatalog with ServerSidePlannedTable and add
tests](#5622) [[Files
changed](https://github.com/delta-io/delta/pull/5622/files)] ✅ Merged
- **[Add metadata abstraction and factory
pattern](#5671) [[Files
changed](https://github.com/delta-io/delta/pull/5671/files)] ⬅️ _This
PR_
- [Add filter pushdown
infrastructure](#5672) [[Files
changed](https://github.com/delta-io/delta/pull/5672/files/6d76fced25e7e69053d96c5101f511184e1d0d49..fb69ce1801da7aee5b633ad9103d528ee14bac1e)]

---

## Summary

Introduces metadata abstraction pattern for ServerSidePlannedTable:
- `ServerSidePlanningMetadata` trait for catalog-specific configuration
- Factory pattern changed from `buildForCatalog()` to
`buildClient(metadata)`
- Default implementation for non-UC catalogs

## Key Changes

**New file:**
- `ServerSidePlanningMetadata.scala` - Trait + DefaultMetadata + factory
method

**Modified files:**
- `ServerSidePlanningClient.scala` - Update factory interface to accept
metadata
- `ServerSidePlannedTable.scala` - Use metadata factory, pass metadata
to client builder
- `ServerSidePlannedTableSuite.scala` - Test fromTable() method

## Architecture

```
Table Properties (from loadTable)
        ↓
ServerSidePlanningMetadata.fromTable()
        ↓
   DefaultMetadata (for now)
   [Future: catalog-specific implementations]
        ↓
ServerSidePlanningClientFactory.buildClient(metadata)
        ↓
ServerSidePlanningClient implementation
```

## Design Principles

### 1. Metadata as Interface

`ServerSidePlanningMetadata` trait captures what's needed to create a
planning client:
- `planningEndpointUri` - REST endpoint (empty for default)
- `authToken` - Authentication token (None for default)
- `catalogName` - For configuration lookups
- `tableProperties` - Additional table properties

### 2. Factory Method for Metadata

`ServerSidePlanningMetadata.fromTable()` extracts metadata from loaded
table:
- Currently returns DefaultMetadata for all catalogs
- Future PRs will add catalog-specific implementations
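
A sketch of the trait and factory described in points 1 and 2 (member names follow the PR text, but the bodies are illustrative, not the actual Delta source):

```scala
// Metadata needed to create a planning client.
trait ServerSidePlanningMetadata {
  def planningEndpointUri: String            // REST endpoint (empty for default)
  def authToken: Option[String]              // auth token (None for default)
  def catalogName: String                    // for configuration lookups
  def tableProperties: Map[String, String]   // additional table properties
}

// Default implementation for non-catalog-specific cases.
case class DefaultMetadata(
    catalogName: String,
    tableProperties: Map[String, String]) extends ServerSidePlanningMetadata {
  val planningEndpointUri: String = ""
  val authToken: Option[String] = None
}

object ServerSidePlanningMetadata {
  // Currently returns DefaultMetadata for every catalog; future
  // catalog-specific implementations would branch here.
  def fromTable(
      catalogName: String,
      tableProperties: Map[String, String]): ServerSidePlanningMetadata =
    DefaultMetadata(catalogName, tableProperties)
}
```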

### 3. Package-Private Implementation

Trait and classes are `private[serverSidePlanning]` - implementation
details hidden from outside.

## Testing

6 tests covering:
- Full query execution through DeltaCatalog
- Decision logic for server-side planning
- **fromTable() returns metadata with defaults** (updated)
- ServerSidePlannedTable is read-only
- Normal path unchanged when feature disabled

## Behavior

**No functional changes:**
- Same tables use server-side planning
- Same query execution flow
- Zero impact on existing Delta behavior

---

**Commit:** 6d76fce

Co-authored-by: Claude <[email protected]>
murali-db and others added 2 commits December 11, 2025 21:25
This PR adds the filter parameter to the ServerSidePlanningClient interface
but doesn't wire it up yet. All call sites pass None, maintaining current
behavior. This is infrastructure-only with zero behavior change.

Changes:
- Add filter parameter to ServerSidePlanningClient.planScan() signature
  - Uses Spark Filter type to keep spark module Iceberg-agnostic
- Update IcebergRESTCatalogPlanningClient to accept filter parameter
  - Commented-out code shows where converter will be wired in PR19
- Update TestServerSidePlanningClient to accept filter parameter
- Update all call sites (ServerSidePlannedTable, test suite) to pass None

Testing:
- All 15 existing tests pass with no behavior change
- spark/testOnly: 8/8 tests passed
- iceberg/testOnly: 7/7 tests passed

Next steps (PR19):
- Create SparkToIcebergExpressionConverter utility
- Add comprehensive unit tests for filter conversion
- Keep spark module independent of Iceberg types

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
Updated ServerSidePlanningClient documentation to clarify the Filter
Conversion Pattern: Spark Filter is the universal representation, and
each catalog implementation converts to their own native format.

Changes:
- Added Filter Conversion Pattern section explaining catalog responsibilities
- Enhanced filter parameter documentation with conversion examples
- Clarified that Iceberg, Unity Catalog, and other catalogs each provide
  their own converters as private implementation details

This is a documentation-only change with zero behavior changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
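The Filter Conversion Pattern this commit documents can be sketched as follows. These are stand-in types, and the "native format" is a plain string purely for illustration; a real catalog converter (e.g. the Iceberg one planned for PR19) would target its own expression classes:

```scala
// Stand-ins for the universal Spark-side filter representation.
sealed trait Filter
case class And(left: Filter, right: Filter) extends Filter
case class EqualTo(attr: String, value: Any) extends Filter
case class GreaterThan(attr: String, value: Any) extends Filter

// Hypothetical private converter a catalog implementation might keep:
// recursively translate the universal Filter tree into the catalog's
// native expression form (a string here, purely for illustration).
def toNativeExpr(filter: Filter): String = filter match {
  case And(l, r)         => s"(${toNativeExpr(l)} AND ${toNativeExpr(r)})"
  case EqualTo(a, v)     => s"$a = $v"
  case GreaterThan(a, v) => s"$a > $v"
}
```

Keeping each converter private to its catalog module is what lets the spark module stay Iceberg-agnostic: only the universal Filter type crosses the interface.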
@murali-db murali-db force-pushed the upstream/row4-filter-infrastructure-clean branch 3 times, most recently from d0c1a79 to b367368 Compare December 12, 2025 11:31
@murali-db murali-db force-pushed the upstream/row4-filter-infrastructure-clean branch from b367368 to b65cff9 Compare December 12, 2025 12:06
This commit adds the testing infrastructure to verify that filters are properly
passed from SQL WHERE clauses through to the ServerSidePlanningClient.

Changes:
- Add FilterCapturingTestClient for capturing filters in tests
- Implement SupportsPushDownFilters in ServerSidePlannedScanBuilder
- Pass filters from ScanBuilder through to ServerSidePlannedScan
- Combine multiple filters with And before sending to planning client
- Add 3 tests verifying filter passthrough:
  - Simple EqualTo filter (WHERE id = 2)
  - Compound And filter (WHERE id > 1 AND value < 30)
  - No filter when no WHERE clause

Testing:
- 10 tests total (7 existing + 3 new)
- Verifies filters flow correctly from SQL to planning client
- Infrastructure ready for catalog-specific filter conversion (future PRs)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@murali-db murali-db force-pushed the upstream/row4-filter-infrastructure-clean branch 2 times, most recently from e6849f2 to c53e01a Compare December 12, 2025 13:23
- Add collectLeafFilters() as common helper method to flatten filter trees
- Import filter types (And, EqualTo, GreaterThan, LessThan) instead of using
  fully qualified names
- Remove duplicated inline collectFilters function from compound filter test

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
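The helper this commit describes might look like the following sketch (stand-in filter types, not the suite's actual code):

```scala
sealed trait Filter
case class And(left: Filter, right: Filter) extends Filter
case class GreaterThan(attr: String, value: Any) extends Filter
case class LessThan(attr: String, value: Any) extends Filter

// Flatten a (possibly nested) And tree into its leaf filters,
// preserving left-to-right order.
def collectLeafFilters(filter: Filter): Seq[Filter] = filter match {
  case And(left, right) => collectLeafFilters(left) ++ collectLeafFilters(right)
  case leaf             => Seq(leaf)
}
```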
@murali-db murali-db force-pushed the upstream/row4-filter-infrastructure-clean branch from c53e01a to f5bc498 Compare December 12, 2025 14:25
@murali-db murali-db requested a review from tdas December 13, 2025 00:08
@tdas tdas left a comment


LGTM!

@tdas tdas merged commit 5511116 into delta-io:master Dec 13, 2025
14 checks passed
tdas pushed a commit that referenced this pull request Dec 13, 2025
## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta/pull/5685/files) to
review all changes.

**Stack:**
- [#5621 - ServerSidePlanningClient interface and DSv2
table](#5621) [[Files
changed](https://github.com/delta-io/delta/pull/5621/files)] - ✅ Merged
- [#5622 - Integrate DeltaCatalog with
ServerSidePlanning](#5622) [[Files
changed](https://github.com/delta-io/delta/pull/5622/files)] - ✅ Merged
- [#5671 - Add metadata abstraction and factory
pattern](#5671) [[Files
changed](https://github.com/delta-io/delta/pull/5671/files)] - ✅ Merged
- [#5672 - Add filter pushdown
infrastructure](#5672) [[Files
changed](https://github.com/delta-io/delta/pull/5672/files)] - ✅ Merged
- **[#5685 - Add projection pushdown
infrastructure](#5685) [[Files
changed](https://github.com/delta-io/delta/pull/5685/files)] ⬅️ _This
PR_

---

# [Server-Side Planning] Add projection pushdown infrastructure

This PR adds infrastructure for pushing down column projections (SELECT
column list) to the server-side planning client.

## What This PR Does

1. **Add projection parameter to `planScan()` interface**
   - New optional `projection: Option[StructType]` parameter
   - Allows catalogs to receive required column information

2. **Implement `SupportsPushDownRequiredColumns` in
`ServerSidePlannedScanBuilder`**
   - Spark calls `pruneColumns(requiredSchema)` when columns are pruned
   - Stores required schema to pass to server

3. **Pass projection through to `planScan()` call**
   - Only send projection if columns are actually pruned (not SELECT *)
   - Catalog can use this to optimize file reading (skip columns, etc.)

4. **Extend test infrastructure**
- `TestServerSidePlanningClient` now captures both filter AND projection
- Companion object provides `getCapturedFilter()` and
`getCapturedProjection()`
   - Added 3 new tests for projection pushdown
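
Points 2 and 3 can be sketched as follows, with a simplified stand-in for Spark's `StructType` and illustrative class names:

```scala
// Simplified stand-in for Spark's StructType: just ordered field names.
case class StructType(fieldNames: Seq[String])

class ProjectingScanBuilder(fullSchema: StructType) {
  private var required: StructType = fullSchema

  // Mirrors SupportsPushDownRequiredColumns.pruneColumns: Spark hands us
  // the narrowed schema when the query does not need every column.
  def pruneColumns(requiredSchema: StructType): Unit =
    required = requiredSchema

  // Projection to pass to planScan(): None means "all columns" (SELECT *),
  // so the catalog only hears about genuinely pruned reads.
  def projectionForPlanScan: Option[StructType] =
    if (required == fullSchema) None else Some(required)
}
```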

## Files Changed

### Modified Files (4 files)
- `spark/src/main/scala/.../ServerSidePlanningClient.scala` - Add
projection parameter
- `spark/src/main/scala/.../ServerSidePlannedTable.scala` - Implement
SupportsPushDownRequiredColumns
- `spark/src/test/scala/.../TestServerSidePlanningClient.scala` - Add
projection capturing
- `spark/src/test/scala/.../ServerSidePlannedTableSuite.scala` - Add
projection tests

## Tests

Total: **11 tests** (8 existing + 3 new projection tests)

**New projection tests:**
- `"projection pushed when selecting specific columns"` - Verifies
projection for `SELECT id, name`
- `"no projection pushed when selecting all columns"` - Verifies `None`
for `SELECT *`
- `"projection and filter pushed together"` - Verifies both work
simultaneously

**Existing tests** (from previous PRs):
- Simple EqualTo filter
- Compound And filter
- No filter when no WHERE clause
- Full query through DeltaCatalog
- Normal path unchanged when disabled
- Decision logic tests
- Read-only verification
- Metadata factory test

## Design Notes

**Conservative approach:** Like filter pushdown, we pass all required
columns to the catalog but don't claim we've eliminated any. Spark will
still validate the schema post-read. This ensures correctness while
allowing catalogs to optimize (skip columns, read less data, etc.).

**Interface design:** Uses `StructType` (Spark's standard schema
representation) to remain catalog-agnostic. Each catalog can interpret
the projection in their own way.

---------

Co-authored-by: Claude <[email protected]>