[Server-Side Planning] Add ServerSidePlanningClient interface and DSv2 table implementation #5621
Merged: tdas merged 3 commits into delta-io:master from murali-db:upstream/row1-interface-and-table on Dec 9, 2025.
+497 −0
Conversation
Commits squashed into this PR:

- **Add ServerSidePlanningClient interface and test implementations**: core interface for server-side scan planning: a `ServerSidePlanningClient` trait with a `planScan` method, a `ServerSidePlanningClientFactory` trait with `createClient` and `buildForCatalog`, `IcebergRESTCatalogPlanningClientFactory` (uses reflection to load the Iceberg implementation), `MockServerSidePlanningClient` for testing, and `ServerSidePlanningTestClient`, which uses Spark for file discovery. The interface is intentionally simple, with no Iceberg dependencies, allowing it to live in the delta-spark module. The `buildForCatalog` method reads catalog-specific configuration from `spark.sql.catalog.<name>.uri/token`.
- **Add ServerSidePlannedTable DSv2 table implementation**: implements the DSv2 Table/Scan/Batch interfaces for server-side planning: `ServerSidePlannedTable` (a DSv2 table backed by a server-side planning client), `ServerSidePlannedScan`/`Batch`, `ServerSidePlannedFilePartition` (a custom InputPartition for file-based reads), and `ServerSidePlanningTestClient`. This allows Delta to read tables using file lists from remote planning services (e.g., an Iceberg REST catalog) without local file discovery.
- **Rename namespace to database for consistency**: rename the `namespace` parameter to `database` throughout `ServerSidePlannedTable` and its related classes to match the naming in `ServerSidePlanningClient.planScan()`. Also update tests to use the new parameter names and fix references to the removed schema field in `ScanPlan` and the renamed factory class.
- **Fix scalastyle: use Locale.ROOT for toLowerCase**: use `toLowerCase(Locale.ROOT)` instead of `toLowerCase()` to ensure consistent behavior regardless of the default locale.
- **Remove redundant mock client tests**: the mock client tests only verified DSv2 API wiring without reading actual data; the end-to-end tests with `ServerSidePlanningTestClient` cover the same functionality plus actual data reading. Removed `MockServerSidePlanningClient`, `MockServerSidePlanningClientFactory`, and all unit tests using the mock client; updated the factory registry test to use the test client instead.
- **Rename ServerSidePlanningTestClient to TestServerSidePlanningClient**: renamed for consistency with naming conventions (`Test` prefix).
- **Remove createClient method and consolidate on buildForCatalog**: having two methods with different names (`createClient` vs. `buildForCatalog`) was confusing. Removed `createClient` and updated all tests to use `buildForCatalog("spark_catalog")` for consistency.
- **Pass SparkSession and DeltaLog as parameters, use deltaLog.newDeltaHadoopConf()**: following the DeltaTableV2 pattern, `ServerSidePlannedTable` now accepts SparkSession and DeltaLog as constructor parameters instead of calling `SparkSession.active` internally, which ensures proper initialization and follows Delta Lake conventions. Changed `ServerSidePlannedFilePartitionReaderFactory` to use `deltaLog.newDeltaHadoopConf()` instead of `sessionState.newHadoopConf()`; this properly includes DataFrame options such as custom S3 credentials that users may pass in, matching the behavior of other Delta code. Updated tests to use Delta tables (required for DeltaLog) instead of plain Parquet tables.
- **Remove .bazelbsp files**
- **Add comment explaining fully qualified name in ServerSidePlannedTable.name()**
- **Update comment to clarify test client is for testing without real server-side planning**
- **Remove redundant unit test and consolidate into comprehensive integration test**: the removed test exercised direct instantiation of `ServerSidePlannedTable`, which never happens in production. Its detailed assertions (file format check, partition count) moved into the integration test that goes through the actual production code path via DeltaCatalog.
- **Remove unused forceServerSidePlanning config from test**: this config is not implemented or checked anywhere in the production code; the test works by setting the `ServerSidePlanningClientFactory` directly.
- **Use checkAnswer pattern instead of manual assertions**: replaced manual row-by-row assertions with `QueryTest.checkAnswer()`, the standard pattern in Spark tests for comparing query results.
- **Fix line length scalastyle violation**: split a long line in `newScanBuilder` to comply with the 100-character limit.
- **Fix DeltaLog.forTable method call syntax**: use the simpler overload that takes `catalogTable` directly instead of a path with a named parameter.
- **Fix test to create table and insert data before configuring DeltaCatalog**: `ServerSidePlannedTable` only supports reads (SupportsRead), not writes. Restructured the test to: (1) create the table and INSERT data using the default catalog, which supports writes; (2) configure DeltaCatalog and the test factory; (3) execute the SELECT through `ServerSidePlannedTable` (read-only).
- **Fix try-finally block nesting syntax error**
- **Remove DeltaLog dependency, use Parquet tables, and consolidate tests**: remove the DeltaLog parameter from `ServerSidePlannedTable` and related classes; accept SparkSession as a constructor parameter (following the DeltaTableV2 pattern); use `sessionState.newHadoopConf()` with a scalastyle suppression and a comment explaining the limitation; change tests from Delta to Parquet tables (DeltaLog is no longer needed); merge two tests into one comprehensive end-to-end test with fine-grained assertions; fix a URL-decoding issue in `TestServerSidePlanningClient` for `input_file_name()` paths; add detailed assertions for the factory registry, file discovery, table metadata, and reader creation.
- **Remove IcebergRESTCatalogPlanningClientFactory from interface PR**: move `IcebergRESTCatalogPlanningClientFactory` to PR B, where the actual Iceberg implementation lives. Update the factory registry to require explicit registration instead of defaulting to the Iceberg factory. This makes PR A self-contained, with no dependencies on unimplemented classes.
- **Add explanatory comments for caselocale scalastyle suppressions**

Co-authored-by: Claude <[email protected]>
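The commits above name a small set of core abstractions: a `ServerSidePlanningClient` trait with a `planScan` method, `ScanFile`/`ScanPlan` data classes, and a Spark-free test client. The sketch below illustrates what that shape could look like; it is a hypothetical reconstruction from the names in the commit messages, not the PR's actual code, and the `sizeInBytes` field and `InMemoryPlanningClient` class are invented for illustration.

```scala
// Hypothetical sketch of the planning-client interface described in the
// commit messages. Field names beyond path (e.g. sizeInBytes) are assumptions.
case class ScanFile(path: String, sizeInBytes: Long)
case class ScanPlan(files: Seq[ScanFile])

trait ServerSidePlanningClient {
  // Asks a (possibly remote) planning service which files back the table.
  def planScan(databaseName: String, tableName: String): ScanPlan
}

// An in-memory stand-in, analogous in spirit to TestServerSidePlanningClient,
// which the PR instead implements with Spark-based file discovery.
class InMemoryPlanningClient(filesByTable: Map[(String, String), Seq[ScanFile]])
    extends ServerSidePlanningClient {
  override def planScan(databaseName: String, tableName: String): ScanPlan =
    ScanPlan(filesByTable.getOrElse((databaseName, tableName), Seq.empty))
}
```

Because the trait carries no Iceberg or catalog-service dependencies, any implementation (mock, Spark-based, or REST-backed) can be swapped in, which is what lets the interface live in the delta-spark module.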
tdas reviewed on Dec 3, 2025, leaving comments (since resolved, some on outdated revisions) on:

- spark/src/main/scala/org/apache/spark/sql/delta/catalog/ServerSidePlannedTable.scala (4 comments)
- spark/src/main/scala/org/apache/spark/sql/delta/serverSidePlanning/ServerSidePlannedTable.scala (1 comment)
- .../src/main/scala/org/apache/spark/sql/delta/serverSidePlanning/ServerSidePlanningClient.scala (2 comments)
- .../test/scala/org/apache/spark/sql/delta/serverSidePlanning/TestServerSidePlanningClient.scala (2 comments)
Force-pushed 57d8ee1 to a5acb16:

- Update copyright year from 2021 to 2025 in all files
- Rename the `database` parameter to `databaseName` for consistency
- Rename the `schema` parameter to `tableSchema` in PartitionReaderFactory
- Move ServerSidePlannedTable to the serverSidePlanning package
- Update the Hadoop config comment to clarify intentional design
- Make factory classes `private[serverSidePlanning]`
- Simplify the factory API: replace `getFactory()` and `buildForCatalog()` with a single `getClient()` method
- Remove an unnecessary docstring line from TestServerSidePlanningClient
- Simplify `getFileFormat()` to return only 'parquet' for test code
- Update imports in ServerSidePlannedTableSuite
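The "single `getClient()` method" and the explicit-registration change mentioned in the commit log suggest a factory registry roughly like the following. This is a hedged sketch under assumed names (`ServerSidePlanningClientFactoryRegistry`, the `register` method, and the `filePaths` field are all inventions for illustration), not the PR's actual API:

```scala
// Sketch of an explicit-registration factory registry, modeled on the
// commit notes ("require explicit registration", "single getClient() method").
// All names and signatures here are assumptions.
case class ScanPlan(filePaths: Seq[String])

trait ServerSidePlanningClient {
  def planScan(databaseName: String, tableName: String): ScanPlan
}

trait ServerSidePlanningClientFactory {
  def getClient(catalogName: String): ServerSidePlanningClient
}

object ServerSidePlanningClientFactoryRegistry {
  @volatile private var factory: Option[ServerSidePlanningClientFactory] = None

  // Explicit registration: no default (e.g. Iceberg) factory is assumed,
  // keeping the interface PR free of unimplemented dependencies.
  def register(f: ServerSidePlanningClientFactory): Unit = { factory = Some(f) }

  def getClient(catalogName: String): ServerSidePlanningClient =
    factory
      .getOrElse(throw new IllegalStateException(
        "No ServerSidePlanningClientFactory registered"))
      .getClient(catalogName)
}
```

Failing fast when nothing is registered matches the stated goal of making this PR self-contained: the Iceberg-backed factory can register itself in a follow-up PR without this module referencing it.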
Force-pushed a5acb16 to 3167356:

Update ServerSidePlannedTableSuite to match the API changes:

- Remove the `getFactory()` assertion (the method was made package-private)
- Replace `buildForCatalog()` with `getClient()`
- Rename the `database` parameter to `databaseName`

Addresses CI compilation errors after addressing TD's review comments.
tdas approved these changes on Dec 9, 2025.
TimothyW553 pushed a commit to TimothyW553/delta that referenced this pull request on Dec 10, 2025:

> …2 table implementation (delta-io#5621)
>
> Summary: adds core infrastructure for server-side table scan planning: the **ServerSidePlanningClient** interface for remote scan planning, the **ServerSidePlannedTable** DSv2 table implementation, **TestServerSidePlanningClient** (a mock client using Spark for testing), and a factory pattern for building planning clients. This enables Delta Lake to delegate file discovery to external catalog services (e.g., Unity Catalog) for tables with fine-grained access control.
>
> Key changes: the interface uses simple data classes (ScanFile, ScanPlan) with no external dependencies; ServerSidePlannedTable implements the DSv2 Table/Scan/Batch interfaces; tests use TestServerSidePlanningClient (a Spark-based mock), so no external services are needed; the table is read-only (BATCH_READ capability only).
>
> Testing: the Spark module compiles and tests pass via `spark/testOnly org.apache.spark.sql.delta.serverSidePlanning.ServerSidePlannedTableSuite`. Note: upstream master currently has pre-existing kernel-api compilation issues; these spark-only changes compile successfully.
>
> Stack position: this is the first PR in a stack. Please review in order.
>
> Co-authored-by: Claude <[email protected]>
Summary

Adds core infrastructure for server-side table scan planning:

- **ServerSidePlanningClient** interface for remote scan planning
- **ServerSidePlannedTable** DSv2 table implementation
- **TestServerSidePlanningClient** mock client using Spark for testing
- Factory pattern for building planning clients

This enables Delta Lake to delegate file discovery to external catalog services (e.g., Unity Catalog) for tables with fine-grained access control.

Key Changes

- The interface uses simple data classes (ScanFile, ScanPlan) with no external dependencies
- ServerSidePlannedTable implements the DSv2 Table/Scan/Batch interfaces
- Tests use TestServerSidePlanningClient (a Spark-based mock); no external services needed
- Read-only table (BATCH_READ capability only)
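The read path this describes (the table asks the planning client for its file list instead of discovering files locally) reduces to a small delegation pattern. The sketch below uses stand-in types rather than Spark's actual DSv2 interfaces, and `ServerSidePlannedTableSketch`/`planInputPartitions` are illustrative names, not the PR's real classes:

```scala
// Stand-in types mirroring only the *shape* of the DSv2 read path
// (table -> scan -> one input partition per planned file).
case class ScanFile(path: String)
case class ScanPlan(files: Seq[ScanFile])

trait ServerSidePlanningClient {
  def planScan(databaseName: String, tableName: String): ScanPlan
}

// Read-only table backed by a planning client. In the real DSv2
// implementation this would expose the BATCH_READ capability only.
class ServerSidePlannedTableSketch(
    client: ServerSidePlanningClient,
    databaseName: String,
    tableName: String) {

  // One partition per planned file, echoing the PR's
  // ServerSidePlannedFilePartition, which wraps each file for the reader.
  def planInputPartitions(): Seq[String] =
    client.planScan(databaseName, tableName).files.map(_.path)
}
```

The point of the indirection is that the table never touches the filesystem listing APIs: access control can be enforced by whatever service implements `planScan`.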
Testing
Spark module compiles and tests pass:
```
spark/testOnly org.apache.spark.sql.delta.serverSidePlanning.ServerSidePlannedTableSuite
```
Note: Upstream master currently has kernel-api compilation issues (pre-existing). Our spark-only changes compile successfully.
Stack Position
This is the first PR in a stack. Please review in order.