
Conversation


@murali-db murali-db commented Dec 12, 2025

🥞 Stacked PR

Use this link to review all changes.

Stack:

- [Integrate DeltaCatalog with ServerSidePlannedTable and add tests](#5622)
- [Add metadata abstraction and factory pattern](#5671)
- [Add filter pushdown infrastructure](#5672)
- [Add projection pushdown infrastructure](#5685) <-- This PR

[Server-Side Planning] Add projection pushdown infrastructure

This PR adds infrastructure for pushing down column projections (SELECT column list) to the server-side planning client.

What This PR Does

  1. Add projection parameter to planScan() interface

    • New optional projection: Option[StructType] parameter
    • Allows catalogs to receive required column information
  2. Implement SupportsPushDownRequiredColumns in ServerSidePlannedScanBuilder (see the sketch after this list)

    • Spark calls pruneColumns(requiredSchema) when columns are pruned
    • Stores required schema to pass to server
  3. Pass projection through to planScan() call

    • Only send projection if columns are actually pruned (not SELECT *)
    • Catalog can use this to optimize file reading (skip columns, etc.)
  4. Extend test infrastructure

    • TestServerSidePlanningClient now captures both filter AND projection
    • Companion object provides getCapturedFilter() and getCapturedProjection()
    • Added 3 new tests for projection pushdown
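
A minimal sketch of the pieces above, assuming Spark's DataSource V2 `ScanBuilder` API. `ScanPlan`, the `tableName` parameter, and the method bodies are placeholders; `planScan()`, the `projection: Option[StructType]` parameter, `pruneColumns()`, and the class names come from this PR:

```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownRequiredColumns}
import org.apache.spark.sql.sources.Filter
import org.apache.spark.sql.types.StructType

trait ScanPlan // placeholder for the planning client's return type

trait ServerSidePlanningClient {
  // `filter` was added by the previous PR in this stack; `projection` by this one.
  def planScan(
      tableName: String,
      filter: Option[Filter],
      projection: Option[StructType]): ScanPlan
}

class ServerSidePlannedScanBuilder(tableSchema: StructType) extends ScanBuilder
    with SupportsPushDownRequiredColumns {

  private var requiredSchema: StructType = tableSchema

  // Spark calls this with the columns the query actually needs.
  override def pruneColumns(required: StructType): Unit = {
    requiredSchema = required
  }

  // Only send a projection when columns were actually pruned (not SELECT *).
  private def projectionToSend: Option[StructType] =
    if (requiredSchema.fieldNames.sameElements(tableSchema.fieldNames)) None
    else Some(requiredSchema)

  override def build(): Scan = ??? // real builder passes projectionToSend to planScan()
}
```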

Files Changed

Modified Files (4 files)

  • spark/src/main/scala/.../ServerSidePlanningClient.scala - Add projection parameter
  • spark/src/main/scala/.../ServerSidePlannedTable.scala - Implement SupportsPushDownRequiredColumns
  • spark/src/test/scala/.../TestServerSidePlanningClient.scala - Add projection capturing
  • spark/src/test/scala/.../ServerSidePlannedTableSuite.scala - Add projection tests

Tests

Total: 11 tests (8 existing + 3 new projection tests)

New projection tests (one is sketched after this list):

  • "projection pushed when selecting specific columns" - Verifies projection for SELECT id, name
  • "no projection pushed when selecting all columns" - Verifies None for SELECT *
  • "projection and filter pushed together" - Verifies both work simultaneously

Existing tests (from previous PRs):

  • Simple EqualTo filter
  • Compound And filter
  • No filter when no WHERE clause
  • Full query through DeltaCatalog
  • Normal path unchanged when disabled
  • Decision logic tests
  • Read-only verification
  • Metadata factory test

Design Notes

Conservative approach: Like filter pushdown, we pass all required columns to the catalog but don't claim we've eliminated any; Spark will still validate the schema post-read. This ensures correctness while allowing catalogs to optimize (skip columns, read less data, etc.).

Interface design: Uses StructType (Spark's standard schema representation) to remain catalog-agnostic. Each catalog can interpret the projection in its own way.
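
For example, for a hypothetical query `SELECT id, name FROM tbl`, the projection handed to the catalog would be an ordinary Spark schema:

```scala
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object ProjectionExample {
  // What planScan() would receive for SELECT id, name on tbl(id INT, name STRING, value INT).
  val projection: Option[StructType] = Some(StructType(Seq(
    StructField("id", IntegerType),
    StructField("name", StringType))))
}
```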

@murali-db murali-db force-pushed the upstream/row5-projection-infrastructure branch 8 times, most recently from c0a6a25 to 5b3a008 on December 12, 2025 at 14:25
tdas pushed a commit that referenced this pull request Dec 13, 2025
## Stacked PR
Use this [link](https://github.com/delta-io/delta/pull/5672/files) to
review all changes.

**Stack:**
- [Integrate DeltaCatalog with ServerSidePlannedTable and add
tests](#5622) [[Files
changed](https://github.com/delta-io/delta/pull/5622/files)]
- [Add metadata abstraction and factory
pattern](#5671) [[Files
changed](https://github.com/delta-io/delta/pull/5671/files)]
- **[Add filter pushdown
infrastructure](#5672)** [[Files
changed](https://github.com/delta-io/delta/pull/5672/files)] <-- _This
PR_
- [Add projection pushdown
infrastructure](#5685) [[Files
changed](https://github.com/delta-io/delta/pull/5685/files/f5bc498a47be8fc7457269939d510a2ccaba52ba..5b3a008a94a7d35f91868e93205b75fc65569ebb)]

---

## Summary

Adds generic filter pushdown infrastructure to ServerSidePlanningClient:
- Add filter parameter to **ServerSidePlanningClient.planScan()**
interface
- Use Spark's **Filter** type as catalog-agnostic representation
- Implement **SupportsPushDownFilters** in ServerSidePlannedScanBuilder
- Capture filter in **TestServerSidePlanningClient** companion object
for test verification
- Tests verifying filters are passed through to planning client
correctly

## Key Changes

**Modified files (4):**
- `ServerSidePlanningClient.scala` - Added filter parameter
- `ServerSidePlannedTable.scala` - Implements SupportsPushDownFilters,
passes filters through to planning client
- `TestServerSidePlanningClient.scala` - Added companion object to
capture filter for test verification
- `ServerSidePlannedTableSuite.scala` - Added 3 filter passthrough tests

## Design

### Conservative Filter Handling

We return all filters as "residuals" (meaning Spark will re-apply them
after the catalog returns results); a sketch follows the list below.
This is conservative but correct:
- We don't know yet what the catalog can handle
- Better to redundantly filter (slow but correct) than claim we handle
filters we don't (fast but wrong)
- Future PRs will add catalog capabilities to avoid redundant filtering
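
A minimal sketch of that conservative builder, assuming Spark's
`SupportsPushDownFilters` API; the `build()` body is a placeholder:

```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownFilters}
import org.apache.spark.sql.sources.Filter

class ServerSidePlannedScanBuilder extends ScanBuilder with SupportsPushDownFilters {
  private var captured: Array[Filter] = Array.empty

  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    captured = filters
    // Return every filter as a residual: Spark re-applies all of them after
    // the scan, so results stay correct even if the catalog ignores them.
    filters
  }

  // Reported for EXPLAIN; Spark allows this to overlap with the residuals.
  override def pushedFilters(): Array[Filter] = captured

  override def build(): Scan = ??? // real builder forwards `captured` to planScan()
}
```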

### Filter Combining

When multiple filters are pushed (e.g., `WHERE id > 1 AND value < 30`),
we combine them into a single And filter before sending to the planning
client. This keeps the interface simple (single Option[Filter] instead
of Array[Filter]).
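
A sketch of the combining step, using Spark's `org.apache.spark.sql.sources`
filter classes; only the single-`Option[Filter]` interface is stated in the PR:

```scala
import org.apache.spark.sql.sources.{And, Filter, GreaterThan, LessThan}

object FilterCombining {
  // Fold the pushed filters into the single Option[Filter] that planScan()
  // accepts; None when the query has no WHERE clause.
  def combine(filters: Array[Filter]): Option[Filter] =
    filters.reduceOption(And(_, _))

  // WHERE id > 1 AND value < 30 becomes one conjunction:
  val example: Option[Filter] =
    combine(Array(GreaterThan("id", 1), LessThan("value", 30)))
  // => Some(And(GreaterThan("id", 1), LessThan("value", 30)))
}
```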

## Testing

8 tests total (spark module only):
- Full query execution through DeltaCatalog
- Decision logic for server-side planning
- ServerSidePlanningMetadata.fromTable() returns metadata with defaults
- ServerSidePlannedTable is read-only
- Normal path unchanged when feature disabled
- **Simple EqualTo filter (WHERE id = 2)**
- **Compound And filter (WHERE id > 1 AND value < 30)**
- **No filter when no WHERE clause**

## Behavior

**Filter passthrough working:**
- Filters from SQL WHERE clauses are captured and passed to planning
client
- Test infrastructure validates filter objects are passed correctly
- Planning client receives filters but doesn't apply logic yet (next PR)
- Spark re-applies filters as residuals (conservative approach ensures
correctness)

**No behavior changes to existing functionality:**
- Zero impact on existing Delta behavior
- Feature only activates when config flag enabled
- This PR enables future catalog implementations to receive and process
filter pushdown

---

**Commit:** f5bc498

---------

Co-authored-by: Claude <[email protected]>
murali-db and others added 2 commits December 13, 2025 00:13
Updated ServerSidePlanningClient documentation to clarify the Filter
Conversion Pattern: Spark Filter is the universal representation, and
each catalog implementation converts to its own native format.
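
A sketch of the pattern, with a deliberately hypothetical native predicate
ADT standing in for whatever a real catalog uses internally:

```scala
import org.apache.spark.sql.sources.{And, EqualTo, Filter, GreaterThan}

object FilterConversionSketch {
  // Hypothetical native predicate ADT; the real converters (Iceberg,
  // Unity Catalog, ...) are private implementation details.
  sealed trait NativePredicate
  case class NativeEq(column: String, value: Any) extends NativePredicate
  case class NativeGt(column: String, value: Any) extends NativePredicate
  case class NativeAnd(l: NativePredicate, r: NativePredicate) extends NativePredicate

  def toNative(filter: Filter): Option[NativePredicate] = filter match {
    case EqualTo(col, v)     => Some(NativeEq(col, v))
    case GreaterThan(col, v) => Some(NativeGt(col, v))
    case And(l, r)           => for (nl <- toNative(l); nr <- toNative(r)) yield NativeAnd(nl, nr)
    // Unsupported filters: skip conversion; Spark re-applies them as residuals.
    case _                   => None
  }
}
```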

Changes:
- Added Filter Conversion Pattern section explaining catalog responsibilities
- Enhanced filter parameter documentation with conversion examples
- Clarified that Iceberg, Unity Catalog, and other catalogs each provide
  their own converters as private implementation details

This is a documentation-only change with zero behavior changes.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
- Add projection parameter to ServerSidePlanningClient.planScan()
- Implement SupportsPushDownRequiredColumns in ServerSidePlannedScanBuilder
- Add projection capturing to PushdownCapturingTestClient
- Add 3 projection passthrough tests

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@murali-db murali-db force-pushed the upstream/row5-projection-infrastructure branch from 5b3a008 to c93b9c5 on December 13, 2025 at 00:15

@tdas tdas left a comment


LGTM

@tdas tdas merged commit 9a2fc89 into delta-io:master Dec 13, 2025
14 checks passed