
Conversation

@murali-db (Contributor)

Summary

Adds core infrastructure for server-side table scan planning:

  • ServerSidePlanningClient interface for remote scan planning
  • ServerSidePlannedTable DSv2 table implementation
  • TestServerSidePlanningClient mock client using Spark for testing
  • Factory pattern for building planning clients

This enables Delta Lake to delegate file discovery to external catalog services (e.g., Unity Catalog) for tables with fine-grained access control.

Key Changes

  • Interface uses simple data classes (ScanFile, ScanPlan) with no external dependencies
  • ServerSidePlannedTable implements DSv2 Table/Scan/Batch interfaces
  • Tests use TestServerSidePlanningClient (Spark-based mock) - no external services needed
  • Read-only table (BATCH_READ capability only)
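The interface and its data classes could look something like this minimal Scala sketch. The field names (`path`, `sizeInBytes`) and the in-memory client are illustrative assumptions, not the PR's actual signatures:

```scala
// Hypothetical sketch of the planning interface: simple data classes
// with no external dependencies, so it can live in delta-spark.
case class ScanFile(path: String, sizeInBytes: Long)
case class ScanPlan(files: Seq[ScanFile])

trait ServerSidePlanningClient {
  // Ask the remote catalog service which files back the given table.
  def planScan(databaseName: String, tableName: String): ScanPlan
}

// In-memory stand-in, analogous in spirit to the PR's Spark-based
// TestServerSidePlanningClient.
class InMemoryPlanningClient(plans: Map[(String, String), ScanPlan])
    extends ServerSidePlanningClient {
  override def planScan(databaseName: String, tableName: String): ScanPlan =
    plans.getOrElse((databaseName, tableName), ScanPlan(Seq.empty))
}
```

A caller would then do `client.planScan("db", "events").files` and hand the resulting paths to the DSv2 read path.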

Testing

Spark module compiles and tests pass:
```
spark/testOnly org.apache.spark.sql.delta.serverSidePlanning.ServerSidePlannedTableSuite
```

Note: Upstream master currently has kernel-api compilation issues (pre-existing). Our spark-only changes compile successfully.

Stack Position

This is the first PR in a stack. Please review in order.


* Add ServerSidePlanningClient interface and test implementations

Core interface for server-side scan planning:
- ServerSidePlanningClient trait with planScan method
- ServerSidePlanningClientFactory trait with createClient and buildForCatalog
- IcebergRESTCatalogPlanningClientFactory (uses reflection to load Iceberg impl)
- MockServerSidePlanningClient for testing
- ServerSidePlanningTestClient using Spark for file discovery

The interface is intentionally simple with no Iceberg dependencies,
allowing it to live in the delta-spark module. The buildForCatalog method
reads catalog-specific configuration from spark.sql.catalog.<name>.uri/token.
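A rough model of that configuration lookup, with a plain `Map` standing in for the Spark conf; the key names follow the commit message, but the helper and its error handling are assumptions:

```scala
// Sketch of buildForCatalog-style config lookup: catalog-specific
// settings live under spark.sql.catalog.<name>.uri and .token.
case class CatalogEndpoint(uri: String, token: Option[String])

def readCatalogConf(conf: Map[String, String], catalogName: String): CatalogEndpoint = {
  val prefix = s"spark.sql.catalog.$catalogName."
  // uri is required; token is optional (e.g. for unauthenticated test catalogs).
  val uri = conf.getOrElse(prefix + "uri",
    throw new IllegalArgumentException(s"Missing ${prefix}uri"))
  CatalogEndpoint(uri, conf.get(prefix + "token"))
}
```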

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* Add ServerSidePlannedTable DSv2 table implementation

Implements DSv2 Table/Scan/Batch interfaces for server-side planning:
- ServerSidePlannedTable: DSv2 table backed by server-side planning client
- ServerSidePlannedScan/Batch: Scan and batch implementations
- ServerSidePlannedFilePartition: Custom InputPartition for file-based reads
- ServerSidePlanningTestClient: Test client using Spark for file discovery

This allows Delta to read tables using file lists from remote planning
services (e.g., Iceberg REST catalog) without local file discovery.
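The custom `ServerSidePlannedFilePartition` suggests a one-partition-per-file strategy inside `Batch.planInputPartitions()`. A self-contained sketch of that idea, without the actual Spark DSv2 interfaces (the class and method names here are simplified stand-ins):

```scala
// Simplified stand-in for a file-based InputPartition: each planned
// file becomes one partition, read independently by an executor.
case class FilePartition(index: Int, filePath: String)

def planInputPartitions(filePaths: Seq[String]): Array[FilePartition] =
  filePaths.zipWithIndex.map { case (path, i) => FilePartition(i, path) }.toArray
```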

* Rename namespace to database for consistency

Rename 'namespace' parameter to 'database' throughout ServerSidePlannedTable
and its related classes to match the naming in ServerSidePlanningClient.planScan().

Also update tests to use the new parameter names and fix references to removed
schema field in ScanPlan and renamed factory class.

* Fix scalastyle: use Locale.ROOT for toLowerCase

Use toLowerCase(Locale.ROOT) instead of toLowerCase() to ensure
consistent behavior regardless of default locale.
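The classic failure mode is the Turkish locale, where `'I'` lowercases to dotless `'ı'` (U+0131). A quick illustration:

```scala
import java.util.Locale

// Under a Turkish locale, plain toLowerCase maps 'I' to dotless 'ı',
// so "ICEBERG".toLowerCase(new Locale("tr")) != "iceberg".
val turkish = "ICEBERG".toLowerCase(new Locale("tr"))

// Locale.ROOT gives locale-independent, predictable results.
val stable = "ICEBERG".toLowerCase(Locale.ROOT)
```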

* Remove redundant mock client tests

The mock client tests only verified DSv2 API wiring without reading
actual data. The end-to-end tests with ServerSidePlanningTestClient
cover the same functionality plus actual data reading, making the
mock tests redundant.

Removed:
- MockServerSidePlanningClient and MockServerSidePlanningClientFactory
- All unit tests using the mock client

Also updated the factory registry test to use the test client instead.

* Rename ServerSidePlanningTestClient to TestServerSidePlanningClient

Renamed for consistency with naming conventions (Test prefix).

* Remove createClient method and consolidate on buildForCatalog

Having two methods with different names (createClient vs buildForCatalog)
was confusing. Removed createClient and updated all tests to use
buildForCatalog("spark_catalog") instead for consistency.

* Pass SparkSession and DeltaLog as parameters, use deltaLog.newDeltaHadoopConf()

Following the DeltaTableV2 pattern, ServerSidePlannedTable now accepts
SparkSession and DeltaLog as constructor parameters instead of calling
SparkSession.active internally. This ensures proper initialization and
follows Delta Lake conventions.

Changed ServerSidePlannedFilePartitionReaderFactory to use
deltaLog.newDeltaHadoopConf() instead of sessionState.newHadoopConf().
This properly includes DataFrame options like custom S3 credentials that
users may pass in, matching the behavior of other Delta code.

Updated tests to use Delta tables (required for DeltaLog) instead of
plain Parquet tables.

* Remove .bazelbsp files

* Add comment explaining fully qualified name in ServerSidePlannedTable.name()

* Update comment to clarify test client is for testing without real server-side planning

* Remove redundant unit test and consolidate into comprehensive integration test

The first test was testing direct instantiation of ServerSidePlannedTable,
which never happens in production. Removed it and added the detailed
assertions (file format check, partition count) to the integration test
that goes through the actual production code path via DeltaCatalog.

* Remove unused forceServerSidePlanning config from test

This config is not implemented or checked anywhere in the production code.
The test works by setting the ServerSidePlanningClientFactory directly.

* Use checkAnswer pattern instead of manual assertions

Replaced manual row-by-row assertions with QueryTest.checkAnswer(),
which is the standard pattern in Spark tests for comparing query results.
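The key property of `checkAnswer` is that it compares results ignoring row order. A rough self-contained model of that comparison (the real `QueryTest.checkAnswer` also normalizes value types and prints a readable diff on failure):

```scala
// Minimal model of checkAnswer's core idea: sort both result sets into
// a canonical order before comparing, so tests don't depend on the
// nondeterministic row order a scan may produce.
def sameRows(actual: Seq[Seq[Any]], expected: Seq[Seq[Any]]): Boolean =
  actual.map(_.mkString("|")).sorted == expected.map(_.mkString("|")).sorted
```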

* Fix line length scalastyle violation

Split long line in newScanBuilder to comply with 100 character limit.

* Fix DeltaLog.forTable method call syntax

Use the simpler overload that takes catalogTable directly instead of
path with named parameter.

* Fix test to create table and insert data before configuring DeltaCatalog

ServerSidePlannedTable only supports reads (SupportsRead), not writes.
Restructured test to:
1. Create table and INSERT data using default catalog (supports writes)
2. Configure DeltaCatalog and test factory
3. Execute SELECT through ServerSidePlannedTable (read-only)

* Fix try-finally block nesting syntax error

* Remove DeltaLog dependency, use Parquet tables, and consolidate tests

- Remove DeltaLog parameter from ServerSidePlannedTable and related classes
- Accept SparkSession as constructor parameter (following DeltaTableV2 pattern)
- Use sessionState.newHadoopConf() with scalastyle suppression and comment explaining the limitation
- Change tests from Delta to Parquet tables (no longer need DeltaLog)
- Merge two tests into one comprehensive end-to-end test with fine-grained assertions
- Fix URL decoding issue in TestServerSidePlanningClient for input_file_name() paths
- Add detailed assertions for factory registry, file discovery, table metadata, and reader creation
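On the URL-decoding fix: Spark's `input_file_name()` returns URI-encoded paths, so a space appears as `%20`. Decoding via `java.net.URI` is one way to recover a filesystem path (a sketch of the kind of fix described, not the PR's code; note that `URLDecoder` would be wrong here because it also turns `+` into a space):

```scala
import java.net.URI

// input_file_name() yields URI-style paths; percent-escapes must be
// decoded before comparing against raw file paths from the planner.
def decodePath(encoded: String): String = new URI(encoded).getPath
```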

* Remove IcebergRESTCatalogPlanningClientFactory from interface PR

Move IcebergRESTCatalogPlanningClientFactory to PR B where the actual
Iceberg implementation lives. Update factory registry to require explicit
registration instead of defaulting to Iceberg factory.

This makes PR A self-contained with no dependencies on unimplemented classes.

* Add explanatory comments for caselocale scalastyle suppressions

---------

Co-authored-by: Claude <[email protected]>
@murali-db force-pushed the upstream/row1-interface-and-table branch from 57d8ee1 to a5acb16 on December 4, 2025 at 17:44
- Update copyright year from 2021 to 2025 in all files
- Rename 'database' parameter to 'databaseName' for consistency
- Rename 'schema' parameter to 'tableSchema' in PartitionReaderFactory
- Move ServerSidePlannedTable to serverSidePlanning package
- Update Hadoop config comment to clarify intentional design
- Make factory classes private[serverSidePlanning]
- Simplify factory API: replace getFactory() and buildForCatalog() with single getClient() method
- Remove unnecessary docstring line from TestServerSidePlanningClient
- Simplify getFileFormat() to return only 'parquet' for test code
- Update imports in ServerSidePlannedTableSuite
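The simplified single-method factory API could be modeled like this; the registry shape, synchronization, and error message are assumptions based on the change list above:

```scala
// Sketch of a single-method factory registry: tests install a factory
// explicitly (there is no default), production code asks for a client,
// and an unset registry fails fast.
class PlanningClient(val catalogName: String)

trait PlanningClientFactory {
  def getClient(catalogName: String): PlanningClient
}

object PlanningClientRegistry {
  @volatile private var factory: Option[PlanningClientFactory] = None

  def setFactory(f: PlanningClientFactory): Unit = { factory = Some(f) }

  def getClient(catalogName: String): PlanningClient =
    factory
      .getOrElse(throw new IllegalStateException(
        "No ServerSidePlanningClientFactory registered"))
      .getClient(catalogName)
}
```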
@murali-db force-pushed the upstream/row1-interface-and-table branch from a5acb16 to 3167356 on December 4, 2025 at 17:58
Update ServerSidePlannedTableSuite to match API changes:
- Remove getFactory() assertion (method made package-private)
- Replace buildForCatalog() with getClient()
- Rename database parameter to databaseName

Fixes CI compilation errors introduced while addressing TD's review comments.
@murali-db requested a review from @tdas on December 5, 2025 at 16:38
@tdas merged commit 8a1347f into delta-io:master on Dec 9, 2025
20 checks passed
TimothyW553 pushed a commit to TimothyW553/delta that referenced this pull request Dec 10, 2025
…2 table implementation (delta-io#5621)

(commit message duplicates the PR description above)