Conversation
@murali-db commented on Dec 3, 2025

🥞 Stacked PR

Use this [link](https://github.com/delta-io/delta/pull/5622/files) to review all changes.

Stack:

- [Integrate DeltaCatalog with ServerSidePlannedTable and add tests](https://github.com/delta-io/delta/pull/5622) [[Files changed](https://github.com/delta-io/delta/pull/5622/files)] ⬅️ This PR
- [Add metadata abstraction and factory pattern](https://github.com/delta-io/delta/pull/5671) [[Files changed](https://github.com/delta-io/delta/pull/5671/files)]
Summary

Integrates server-side planning into DeltaCatalog.loadTable():

  • Detect Unity Catalog tables without credentials
  • Return ServerSidePlannedTable instead of normal DeltaTableV2
  • Add ENABLE_SERVER_SIDE_PLANNING config flag for testing
  • Move ServerSidePlannedTable to serverSidePlanning package

Decision Logic

ServerSidePlanning is used when:

  1. ENABLE_SERVER_SIDE_PLANNING config is true (force flag for testing)
  2. OR: Unity Catalog table without credentials (actual use case)
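
A minimal sketch of this predicate, assuming the boolean inputs described in the refactor commit further down (parameter names are illustrative, not the actual signature):

```scala
// Sketch only: parameter names are illustrative.
def shouldUseServerSidePlanning(
    forceEnabled: Boolean,          // DeltaSQLConf.ENABLE_SERVER_SIDE_PLANNING
    isUnityCatalogTable: Boolean,
    hasCredentials: Boolean): Boolean = {
  forceEnabled || (isUnityCatalogTable && !hasCredentials)
}
```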

Key Changes

  • DeltaCatalog.loadTable(): Add ServerSidePlannedTable.tryCreate() call
  • ServerSidePlannedTable.shouldUseServerSidePlanning(): Decision logic
  • Package refactor: Move from catalog/ to serverSidePlanning/
  • Add DeltaSQLConf.ENABLE_SERVER_SIDE_PLANNING config
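
The loadTable() integration can be pictured as a minimal early-return wrapper (a hedged sketch: only the tryCreate() call is documented in this PR, so its exact signature is an assumption):

```scala
import org.apache.spark.sql.connector.catalog.{Identifier, Table}

// Sketch of the integration point; tryCreate's signature is assumed here.
override def loadTable(ident: Identifier): Table = {
  val table = super.loadTable(ident)
  // Assumed shape: Some(ServerSidePlannedTable) when SSP should serve the
  // table, otherwise None, leaving the normal DeltaTableV2 path unchanged.
  ServerSidePlannedTable.tryCreate(ident, table).getOrElse(table)
}
```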

Testing

4 tests covering:

  • Full query execution through DeltaCatalog
  • Normal path unchanged when disabled
  • Decision logic for all scenarios
  • Read-only enforcement (INSERT fails)
```
spark/testOnly org.apache.spark.sql.delta.serverSidePlanning.ServerSidePlannedTableSuite
```

Note: Tests verified in fork. Upstream master has kernel-api compilation issues (pre-existing, not related to these changes).

"DeltaCatalog", "loadTable") {
try {
super.loadTable(ident) match {
val table = super.loadTable(ident)
@murali-db (author) commented:

DeltaCatalog.scala was refactored to AbstractDeltaCatalog.scala (commit 156e41f) so the loadTable changes are in this file.

A contributor replied:
gotcha

@murali-db (author) commented:

Rebased on latest master (now includes PR #5621 merge). The branch now has only 1 commit on top of the latest master.

@murali-db force-pushed the upstream/row2-deltacatalog-integration branch from f099779 to 62c2baa on December 10, 2025 at 12:27
@murali-db changed the title from "[Server-Side Planning] Integrate DeltaCatalog with ServerSidePlannedTable and add E2E tests" to "[Server-Side Planning] Integrate DeltaCatalog with ServerSidePlannedTable and add tests" on Dec 10, 2025
[Server-Side Planning] Integrate DeltaCatalog with ServerSidePlannedTable and add E2E tests (#9)

* [Server-Side Planning] Integrate DeltaCatalog with ServerSidePlannedTable and add E2E tests

Implements DeltaCatalog integration to use ServerSidePlannedTable when Unity Catalog
tables lack credentials, with comprehensive end-to-end testing and improvements.

Key changes:
- Add loadTable() logic in DeltaCatalog to detect UC tables without credentials
- Implement hasCredentials() to check for credential properties in table metadata (see the sketch after this commit)
- Add ENABLE_SERVER_SIDE_PLANNING config flag for testing
- Add comprehensive integration tests with reflection-based credential testing
- Add Spark source code references for Identifier namespace structure
- Improve test suite by removing redundant aggregation test
- Revert verbose documentation comments in ServerSidePlannedTable

Test coverage:
- E2E: Full stack integration with DeltaCatalog
- E2E: Verify normal path unchanged when feature disabled
- loadTable() decision logic with ENABLE_SERVER_SIDE_PLANNING config
  (tests 3 scenarios including UC without credentials via reflection)

See Spark's LookupCatalog, CatalogAndIdentifier and ResolveSessionCatalog
for Identifier namespace structure references.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
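
The hasCredentials() check mentioned in this commit might look roughly like the following; this is a sketch only, and the credential property keys are hypothetical placeholders since the PR text does not name them:

```scala
// Sketch: does the loaded table carry injected credential properties?
// The key hints below are hypothetical, not the actual UC property names.
private def hasCredentials(properties: Map[String, String]): Boolean = {
  val credentialKeyHints = Seq("credential", "token")
  properties.keys.exists { key =>
    credentialKeyHints.exists(hint => key.toLowerCase.contains(hint))
  }
}
```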

* Refactor PR9: Extract decision logic and improve test quality

Changes:
- Extracted shouldUseServerSidePlanning() method with boolean inputs
- Replaced reflection-based test with clean unit test
- Tests all input combinations without brittle reflection code
- Improved testability and maintainability

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Clean up ServerSidePlannedTable: Remove unnecessary helper function

- Remove misleading "Fallback" comment that didn't apply to all cases
- Inline create() function into tryCreate() to reduce indirection
- Simplify logic: directly handle client creation in try-catch

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Remove unnecessary logging and unused imports from ServerSidePlannedTable

- Remove conditional logging that differentiated between forced and fallback paths
- Remove unused imports: MDC and DeltaLogKeys

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Remove useless AutoCloseable implementation from ServerSidePlannedTable

The close() method was never called because:
- Spark's Table interface has no lifecycle hooks
- No code explicitly called close() on ServerSidePlannedTable instances
- No try-with-resources or Using() blocks wrapped it

HTTP connection cleanup happens via connection timeouts (30s) and JVM
finalization, making AutoCloseable purely ceremonial dead code.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Simplify verbose test suite comments

Reduced 20+ line formatted comments to simple 2-line descriptions.
The bullet-pointed lists were over-documenting obvious test structure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Remove useless forTesting() wrapper method

The forTesting() method was just a wrapper around 'new' that added no value.
Tests now directly instantiate ServerSidePlannedTable with the constructor.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Fix brittle test assertion for table capabilities

Removed assertion that table has exactly 1 capability, which would break
if we add streaming support (MICRO_BATCH_READ, CONTINUOUS_READ) or other
non-write capabilities later.

Now tests what actually matters: supports BATCH_READ, does NOT support BATCH_WRITE (see the sketch after this commit).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
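
The resulting assertions, sketched against Spark's public TableCapability API (note that a later commit below replaces this white-box check with a black-box insert test):

```scala
import org.apache.spark.sql.connector.catalog.TableCapability

// No assertion on the exact capability count, so adding streaming
// capabilities later will not break this test.
assert(table.capabilities().contains(TableCapability.BATCH_READ))
assert(!table.capabilities().contains(TableCapability.BATCH_WRITE))
```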

* Remove redundant SupportsWrite interface check in test

Testing !isInstanceOf[SupportsWrite] is redundant with checking
!capabilities.contains(BATCH_WRITE) because:
- BATCH_WRITE capability requires SupportsWrite interface
- Having SupportsWrite without BATCH_WRITE would violate Spark contract

The capability check tests the public API contract, which is sufficient.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Remove e2e/integration terminology from ServerSidePlanningSuite

Changed:
- Test names: removed "E2E:" prefix
- Database name: integration_db → test_db
- Table name: e2e_test → planning_test

These tests use mock clients, not external systems, so e2e/integration
terminology was misleading.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Merge test suites: keep existing ServerSidePlannedTableSuite, delete new ServerSidePlanningSuite

ServerSidePlanningSuite was added in this PR, while ServerSidePlannedTableSuite
existed before. Merged them by keeping the existing file and deleting the new one,
so the PR shows modification rather than deletion+addition.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Refactor tests and improve documentation

- Refactor ServerSidePlannedTableSuite: create database/table once in beforeAll()
- Use minimal early-return pattern in DeltaCatalog with oss-only markers
- Move current snapshot limitation docs to ServerSidePlanningClient interface
- Add UC credential injection link to hasCredentials() documentation
- Lowercase test names and remove redundant client verification

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Simplify current snapshot limitation documentation

Shorten documentation from 4 lines to 1 concise line.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Address PR9 review feedback

Changes:
1. Remove unnecessary afterEach() cleanup - no resource leaks to prevent
2. Make test 2 explicit by setting config=false instead of relying on cleanup
3. Remove redundant test "loadTable() decision logic" - already covered by other tests
4. Add explanation for deltahadoopconfiguration scalastyle suppression

Tests: 4/4 passing, scalastyle clean

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Address PR9 review comments: improve test quality and cleanup

- Add withServerSidePlanningEnabled helper method to prevent test pollution (see the sketch after this commit)
  - Encapsulates factory and config setup/teardown in one place
  - Guarantees cleanup in finally block
  - Prevents tests from interfering with each other

- Replace white-box capability check with black-box insert test
  - Test actual insert behavior instead of inspecting capabilities
  - Verifies inserts succeed without SSP, fail with SSP enabled
  - More realistic end-to-end test of read-only behavior

- Remove OSS-only marker comments from DeltaCatalog
  - Clean up // oss-only-start and // oss-only-end comments

- Remove unused import (DeltaCatalog)

All tests passing (4/4):
- full query through DeltaCatalog with server-side planning
- verify normal path unchanged when feature disabled
- shouldUseServerSidePlanning() decision logic
- ServerSidePlannedTable is read-only

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
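
A sketch of the helper and the black-box read-only check this commit describes (config handling only; the factory wiring the commit mentions is omitted, and the exact exception type for the failed INSERT is not stated, so a generic intercept is shown):

```scala
// Sketch: enable SSP for the block, always restore afterwards.
def withServerSidePlanningEnabled(f: => Unit): Unit = {
  spark.conf.set(DeltaSQLConf.ENABLE_SERVER_SIDE_PLANNING.key, "true")
  try f finally spark.conf.unset(DeltaSQLConf.ENABLE_SERVER_SIDE_PLANNING.key)
}

withServerSidePlanningEnabled {
  // Black-box read-only check: INSERT must fail when SSP serves the table.
  intercept[Exception] {
    sql("INSERT INTO test_db.planning_test VALUES (1)")  // schema illustrative
  }
}
```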

---------

Co-authored-by: Claude <[email protected]>
Comment on lines +118 to +122
```scala
val catalogName = if (ident.namespace().length > 1) {
  ident.namespace().head
} else {
  "spark_catalog"
}
```
@tdas (Contributor) commented on Dec 10, 2025:

Note for a future PR: we should test table/catalog/schema names that have special chars like hyphens. We found out the hard way in the uc-spark connector when a user complained.
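
A sketch of the kind of follow-up test suggested here, using backtick quoting for identifiers (the test name and identifiers are hypothetical):

```scala
// Hypothetical test: identifiers with hyphens must be backtick-quoted.
test("server-side planning handles hyphenated names") {
  sql("CREATE DATABASE IF NOT EXISTS `hyphen-db`")
  sql("CREATE TABLE `hyphen-db`.`hyphen-table` (id INT) USING delta")
  checkAnswer(sql("SELECT * FROM `hyphen-db`.`hyphen-table`"), Seq.empty)
}
```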

@tdas merged commit 65ff57f into delta-io:master on Dec 10, 2025
14 checks passed
TimothyW553 pushed a commit to TimothyW553/delta that referenced this pull request Dec 10, 2025
[Server-Side Planning] Integrate DeltaCatalog with ServerSidePlannedTable and add tests (delta-io#5622)
tdas pushed a commit that referenced this pull request Dec 11, 2025
[Server-Side Planning] Add metadata abstraction and factory pattern (#5671)

## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta/pull/5671/files) to
review all changes.

**Stack:**
- [Integrate DeltaCatalog with ServerSidePlannedTable and add tests](#5622) [[Files changed](https://github.com/delta-io/delta/pull/5622/files)] ✅ Merged
- **[Add metadata abstraction and factory pattern](#5671)** [[Files changed](https://github.com/delta-io/delta/pull/5671/files)] ⬅️ _This PR_
- [Add filter pushdown infrastructure](#5672) [[Files changed](https://github.com/delta-io/delta/pull/5672/files/6d76fced25e7e69053d96c5101f511184e1d0d49..fb69ce1801da7aee5b633ad9103d528ee14bac1e)]

---

## Summary

Introduces metadata abstraction pattern for ServerSidePlannedTable:
- `ServerSidePlanningMetadata` trait for catalog-specific configuration
- Factory pattern changed from `buildForCatalog()` to
`buildClient(metadata)`
- Default implementation for non-UC catalogs

## Key Changes

**New file:**
- `ServerSidePlanningMetadata.scala` - Trait + DefaultMetadata + factory
method

**Modified files:**
- `ServerSidePlanningClient.scala` - Update factory interface to accept
metadata
- `ServerSidePlannedTable.scala` - Use metadata factory, pass metadata
to client builder
- `ServerSidePlannedTableSuite.scala` - Test fromTable() method

## Architecture

```
Table Properties (from loadTable)
        ↓
ServerSidePlanningMetadata.fromTable()
        ↓
   DefaultMetadata (for now)
   [Future: catalog-specific implementations]
        ↓
ServerSidePlanningClientFactory.buildClient(metadata)
        ↓
ServerSidePlanningClient implementation
```

## Design Principles

### 1. Metadata as Interface

`ServerSidePlanningMetadata` trait captures what's needed to create a
planning client:
- `planningEndpointUri` - REST endpoint (empty for default)
- `authToken` - Authentication token (None for default)
- `catalogName` - For configuration lookups
- `tableProperties` - Additional table properties

### 2. Factory Method for Metadata

`ServerSidePlanningMetadata.fromTable()` extracts metadata from loaded
table:
- Currently returns DefaultMetadata for all catalogs
- Future PRs will add catalog-specific implementations

### 3. Package-Private Implementation

Trait and classes are `private[serverSidePlanning]` - implementation
details hidden from outside.
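
Putting the three principles together, the trait might look like this; a hedged reconstruction from the field list above, so exact signatures and the DefaultMetadata constructor are assumptions:

```scala
private[serverSidePlanning] trait ServerSidePlanningMetadata {
  def planningEndpointUri: String            // empty for default
  def authToken: Option[String]              // None for default
  def catalogName: String                    // for configuration lookups
  def tableProperties: Map[String, String]   // additional table properties
}

// Default for non-UC catalogs: no endpoint, no token.
private[serverSidePlanning] case class DefaultMetadata(
    catalogName: String,
    tableProperties: Map[String, String]) extends ServerSidePlanningMetadata {
  override def planningEndpointUri: String = ""
  override def authToken: Option[String] = None
}

private[serverSidePlanning] object ServerSidePlanningMetadata {
  // Currently returns DefaultMetadata for all catalogs; catalog-specific
  // implementations arrive in later PRs.
  def fromTable(
      catalogName: String,
      tableProperties: Map[String, String]): ServerSidePlanningMetadata =
    DefaultMetadata(catalogName, tableProperties)
}
```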

## Testing

6 tests covering:
- Full query execution through DeltaCatalog
- Decision logic for server-side planning
- **fromTable() returns metadata with defaults** (updated)
- ServerSidePlannedTable is read-only
- Normal path unchanged when feature disabled

## Behavior

**No functional changes:**
- Same tables use server-side planning
- Same query execution flow
- Zero impact on existing Delta behavior

---

**Commit:** 6d76fce

Co-authored-by: Claude <[email protected]>
tdas pushed a commit that referenced this pull request Dec 13, 2025
## Stacked PR
Use this [link](https://github.com/delta-io/delta/pull/5672/files) to
review all changes.

**Stack:**
- [Integrate DeltaCatalog with ServerSidePlannedTable and add tests](#5622) [[Files changed](https://github.com/delta-io/delta/pull/5622/files)]
- [Add metadata abstraction and factory pattern](#5671) [[Files changed](https://github.com/delta-io/delta/pull/5671/files)]
- **[Add filter pushdown infrastructure](#5672)** [[Files changed](https://github.com/delta-io/delta/pull/5672/files)] ⬅️ _This PR_
- [Add projection pushdown infrastructure](#5685) [[Files changed](https://github.com/delta-io/delta/pull/5685/files/f5bc498a47be8fc7457269939d510a2ccaba52ba..5b3a008a94a7d35f91868e93205b75fc65569ebb)]

---

## Summary

Adds generic filter pushdown infrastructure to ServerSidePlanningClient:
- Add filter parameter to **ServerSidePlanningClient.planScan()**
interface
- Use Spark's **Filter** type as catalog-agnostic representation
- Implement **SupportsPushDownFilters** in ServerSidePlannedScanBuilder
- Capture filter in **TestServerSidePlanningClient** companion object
for test verification
- Tests verifying filters are passed through to planning client
correctly

## Key Changes

**Modified files (4):**
- `ServerSidePlanningClient.scala` - Added filter parameter
- `ServerSidePlannedTable.scala` - Implements SupportsPushDownFilters,
passes filters through to planning client
- `TestServerSidePlanningClient.scala` - Added companion object to
capture filter for test verification
- `ServerSidePlannedTableSuite.scala` - Added 3 filter passthrough tests

## Design

### Conservative Filter Handling

We return all filters as "residuals" (meaning Spark will re-apply them
after the catalog returns results). This is conservative but correct:
- We don't know yet what the catalog can handle
- Better to redundantly filter (slow but correct) than claim we handle
filters we don't (fast but wrong)
- Future PRs will add catalog capabilities to avoid redundant filtering
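
In DSv2 terms, returning everything as a residual looks like the following sketch (ExampleScanBuilder and its constructor are illustrative names, not the PR's actual code):

```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownFilters}
import org.apache.spark.sql.sources.Filter

// Sketch of the conservative strategy inside the scan builder.
class ExampleScanBuilder(buildScan: () => Scan)
    extends ScanBuilder with SupportsPushDownFilters {
  private var pushed: Array[Filter] = Array.empty

  override def pushFilters(filters: Array[Filter]): Array[Filter] = {
    pushed = filters   // remember them for planScan
    filters            // return all as residuals: Spark re-applies them
  }

  override def pushedFilters(): Array[Filter] = pushed

  override def build(): Scan = buildScan()
}
```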

### Filter Combining

When multiple filters are pushed (e.g., `WHERE id > 1 AND value < 30`),
we combine them into a single And filter before sending to the planning
client. This keeps the interface simple (single Option[Filter] instead
of Array[Filter]).
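
The combining step, sketched with Spark's sources.And:

```scala
import org.apache.spark.sql.sources.{And, Filter, GreaterThan, LessThan}

// Collapse zero, one, or many pushed filters into a single Option[Filter].
def combine(pushed: Array[Filter]): Option[Filter] =
  pushed.reduceOption(And(_, _))

// WHERE id > 1 AND value < 30 arrives as two filters and is combined:
combine(Array(GreaterThan("id", 1), LessThan("value", 30)))
// => Some(And(GreaterThan("id", 1), LessThan("value", 30)))
```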

## Testing

8 tests total (spark module only):
- Full query execution through DeltaCatalog
- Decision logic for server-side planning
- ServerSidePlanningMetadata.fromTable() returns metadata with defaults
- ServerSidePlannedTable is read-only
- Normal path unchanged when feature disabled
- **Simple EqualTo filter (WHERE id = 2)**
- **Compound And filter (WHERE id > 1 AND value < 30)**
- **No filter when no WHERE clause**

## Behavior

**Filter passthrough working:**
- Filters from SQL WHERE clauses are captured and passed to planning
client
- Test infrastructure validates filter objects are passed correctly
- Planning client receives filters but doesn't apply logic yet (next PR)
- Spark re-applies filters as residuals (conservative approach ensures
correctness)

**No behavior changes to existing functionality:**
- Zero impact on existing Delta behavior
- Feature only activates when config flag enabled
- This PR enables future catalog implementations to receive and process
filter pushdown

---

**Commit:** f5bc498

---------

Co-authored-by: Claude <[email protected]>
tdas pushed a commit that referenced this pull request Dec 13, 2025
## 🥞 Stacked PR
Use this [link](https://github.com/delta-io/delta/pull/5685/files) to
review all changes.

**Stack:**
- [#5621 - ServerSidePlanningClient interface and DSv2 table](#5621) [[Files changed](https://github.com/delta-io/delta/pull/5621/files)] - ✅ Merged
- [#5622 - Integrate DeltaCatalog with ServerSidePlanning](#5622) [[Files changed](https://github.com/delta-io/delta/pull/5622/files)] - ✅ Merged
- [#5671 - Add metadata abstraction and factory pattern](#5671) [[Files changed](https://github.com/delta-io/delta/pull/5671/files)] - ✅ Merged
- [#5672 - Add filter pushdown infrastructure](#5672) [[Files changed](https://github.com/delta-io/delta/pull/5672/files)] - ✅ Merged
- **[#5685 - Add projection pushdown infrastructure](#5685)** [[Files changed](https://github.com/delta-io/delta/pull/5685/files)] ⬅️ _This PR_

---

# [Server-Side Planning] Add projection pushdown infrastructure

This PR adds infrastructure for pushing down column projections (SELECT
column list) to the server-side planning client.

## What This PR Does

1. **Add projection parameter to `planScan()` interface**
   - New optional `projection: Option[StructType]` parameter
   - Allows catalogs to receive required column information

2. **Implement `SupportsPushDownRequiredColumns` in
`ServerSidePlannedScanBuilder`**
   - Spark calls `pruneColumns(requiredSchema)` when columns are pruned
   - Stores required schema to pass to server

3. **Pass projection through to `planScan()` call**
   - Only send projection if columns are actually pruned (not SELECT *)
   - Catalog can use this to optimize file reading (skip columns, etc.); see the sketch after this list

4. **Extend test infrastructure**
- `TestServerSidePlanningClient` now captures both filter AND projection
- Companion object provides `getCapturedFilter()` and
`getCapturedProjection()`
   - Added 3 new tests for projection pushdown
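
Steps 2 and 3, sketched against Spark's SupportsPushDownRequiredColumns (ExampleScanBuilder and the tableSchema parameter are illustrative names, not the PR's actual code):

```scala
import org.apache.spark.sql.connector.read.{Scan, ScanBuilder, SupportsPushDownRequiredColumns}
import org.apache.spark.sql.types.StructType

// Sketch: remember the pruned schema; only forward a projection when it
// differs from the full table schema (i.e. the query is not SELECT *).
class ExampleScanBuilder(
    tableSchema: StructType,
    buildScan: Option[StructType] => Scan)
  extends ScanBuilder with SupportsPushDownRequiredColumns {

  private var requiredSchema: StructType = tableSchema

  override def pruneColumns(requiredSchema: StructType): Unit = {
    this.requiredSchema = requiredSchema
  }

  private def projection: Option[StructType] =
    if (requiredSchema == tableSchema) None else Some(requiredSchema)

  override def build(): Scan = buildScan(projection)  // reaches planScan
}
```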

## Files Changed

### Modified Files (4 files)
- `spark/src/main/scala/.../ServerSidePlanningClient.scala` - Add
projection parameter
- `spark/src/main/scala/.../ServerSidePlannedTable.scala` - Implement
SupportsPushDownRequiredColumns
- `spark/src/test/scala/.../TestServerSidePlanningClient.scala` - Add
projection capturing
- `spark/src/test/scala/.../ServerSidePlannedTableSuite.scala` - Add
projection tests

## Tests

Total: **11 tests** (8 existing + 3 new projection tests)

**New projection tests:**
- `"projection pushed when selecting specific columns"` - Verifies
projection for `SELECT id, name`
- `"no projection pushed when selecting all columns"` - Verifies `None`
for `SELECT *`
- `"projection and filter pushed together"` - Verifies both work
simultaneously

**Existing tests** (from previous PRs):
- Simple EqualTo filter
- Compound And filter
- No filter when no WHERE clause
- Full query through DeltaCatalog
- Normal path unchanged when disabled
- Decision logic tests
- Read-only verification
- Metadata factory test

## Design Notes

**Conservative approach:** Like filter pushdown, we pass all required
columns to the catalog but don't claim we've eliminated any. Spark will
still validate the schema post-read. This ensures correctness while
allowing catalogs to optimize (skip columns, read less data, etc.)

**Interface design:** Uses `StructType` (Spark's standard schema
representation) to remain catalog-agnostic. Each catalog can interpret
the projection in their own way.

---------

Co-authored-by: Claude <[email protected]>