Collaborator

@TimothyW553 TimothyW553 commented Nov 10, 2025

🥞 Stacked PR

Use this link to review incremental changes: https://github.com/delta-io/delta/pull/5477/files


Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

This PR adds utilities for CatalogTable, for Scala interop, and for CatalogTable testing. In particular, CatalogTableUtils determines whether a table is managed/owned by Unity Catalog (UC), which in turn determines the source of truth for operations.

How was this patch tested?

  • tested locally via build/sbt -DsparkVersion=master "++ 2.13.16" clean sparkV2/test
  • passing CI tests

Does this PR introduce any user-facing changes?

No.


private CatalogTableUtils() {}

public static boolean isCCv2Table(CatalogTable table) {
Collaborator

Let's call it catalogOwned/Managed

Collaborator Author

Sounds good, this will also help align with the existing terminology!


private static boolean isCatalogOwnedFeatureSupported(
Map<String, String> tableProperties, String featureKey) {
if (tableProperties == null) {
Collaborator

Can we just check tableProperties non null?

Collaborator Author

Updated.

* properties.
*/
public final class CatalogTableUtils {
static final String UNITY_CATALOG_PROPERTY_PREFIX = "delta.unityCatalog.";
Collaborator

Do we really set this property?

Collaborator Author

I thought this specific property existed, but it actually does not. I have changed it to UC_TABLE_ID_KEY, which is currently ucTableId; I believe it is changing to catalogManaged.unityCatalog.tableId.

@TimothyW553 TimothyW553 marked this pull request as ready for review November 14, 2025 00:44
// if the catalogManaged/catalogOwned-preview flags are 'supported'
public static boolean isCatalogManaged(CatalogTable table) {
requireNonNull(table, "table is null");
Map<String, String> tableProperties = toJavaMap(table.properties());
Collaborator

We should look at table.storage.properties(). The storage's class is CatalogStorageFormat, and I think the properties related to Delta (which is a storage format) should live there if upstream implements the semantics correctly, as in https://github.com/unitycatalog/unitycatalog/blob/main/connectors/spark/src/main/scala/io/unitycatalog/spark/UCSingleCatalog.scala#L279

Collaborator Author

@TimothyW553 TimothyW553 Nov 17, 2025

I currently merge both the outer properties() and storage().properties(), just in case. I looked into AbstractDeltaCatalog, and it seems we will have to merge them to get all the desired properties.
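
To make the merge discussed in this thread concrete, here is a minimal sketch that uses plain `java.util.Map` stand-ins rather than a real `CatalogTable`. The class `PropertyMergeSketch` and the method `mergeProperties` are hypothetical names, not part of this PR. The precedence shown (outer table properties overwrite storage properties on a key collision) is an assumption that follows from calling `putAll` on the storage map first.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not the code under review. */
final class PropertyMergeSketch {
  private PropertyMergeSketch() {}

  /**
   * Merges storage-level and table-level properties into one read-only view.
   *
   * @param storageProperties stand-in for table.storage().properties(), may be null
   * @param tableProperties stand-in for table.properties(), may be null
   * @return unmodifiable merged map; table-level values win on duplicate keys
   */
  static Map<String, String> mergeProperties(
      Map<String, String> storageProperties, Map<String, String> tableProperties) {
    Map<String, String> merged = new HashMap<>();
    if (storageProperties != null) {
      merged.putAll(storageProperties);
    }
    if (tableProperties != null) {
      // Applied second, so these entries overwrite storage entries with the same key.
      merged.putAll(tableProperties);
    }
    return Collections.unmodifiableMap(merged);
  }
}
```

The order of the two `putAll` calls is the whole design decision here: swapping them would make storage properties authoritative instead.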

return featureValue.equalsIgnoreCase(SUPPORTED);
}

private static Map<String, String> toJavaMap(
Collaborator

Put it in the ScalaUtils.java

Collaborator Author

Gotcha, moved to ScalaUtils


private CatalogTableUtils() {}

// Checks whether *any* catalog manages this table via CCv2 semantics by checking
Collaborator

nit: please use
/**
 *
 * @param xxxx
 */
for method documentation

Collaborator Author

Sounds good, updated!

requireNonNull(table, "table is null");
Map<String, String> merged = new HashMap<>();
merged.putAll(ScalaUtils.toJavaMap(table.storage().properties()));
merged.putAll(ScalaUtils.toJavaMap(table.properties()));
Collaborator

can we just add table.storage().properties()?

Comment on lines 107 to 121
/**
 * Creates a CatalogTable with the given properties. This is a helper method to create a
 * CatalogTable for testing purposes - see interface {@link CatalogTable} for more details.
 *
 * @param properties the properties to set on the CatalogTable
 * @return a CatalogTable with the given properties
 */
private static CatalogTable catalogTableWithProperties(Map<String, String> properties) {
  return catalogTableWithProperties(properties, Collections.emptyMap());
}

private static CatalogTable catalogTableWithProperties(
    Map<String, String> properties, Map<String, String> storageProperties) {
  return CatalogTableTestUtils$.MODULE$.catalogTableWithProperties(properties, storageProperties);
}
Collaborator

maybe not needed? I feel like these two utils methods are a bit verbose

Collaborator Author

Yeah, I agree that the first one, with just properties(), is verbose and not needed. I'll keep the one with two parameters, as it makes the testing code cleaner.

class CatalogTableUtilsTest {

@Test
void catalogManagedFlagEnablesDetection() {
Collaborator

can we name the test case like
testIsCatalogManaged_CatalogManagedEnable_returnsTrue

testIntent_condition_expectation.

Also split

assertTrue(
CatalogTableUtils.isCatalogManaged(table), "Should detect catalog management with flag");
assertFalse(
CatalogTableUtils.isUnityCatalogManagedTable(table), "Should not detect Unity without ID");
to keep unit test focused.

Collaborator Author

Ah, thanks for the test case naming review. I'll keep that in mind

* @param table Spark {@link CatalogTable} descriptor
* @return Java map view of the storage properties
*/
public static Map<String, String> getStorageProperties(CatalogTable table) {
Collaborator

can be private method?

return ScalaUtils.toJavaMap(table.storage().properties());
}

public static boolean isCatalogManagedFeatureEnabled(
Collaborator

ditto

Collaborator Author

Updated to private.

import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
import org.apache.spark.sql.types.StructType

/** Helpers for constructing [[CatalogTable]] instances inside Java tests. */
Collaborator

Document why Scala is needed. Might CatalogTable accept different parameters on different Spark versions?

Collaborator Author

Yeah, I added docs on why Scala is needed. To summarize: if we constructed it in Java, we would need to fill in even the optional parameters. By constructing it in Scala, we don't need to worry about new optional parameters.

Collaborator

@huan233usc huan233usc left a comment

LGTM

*/
public static boolean isCatalogManaged(CatalogTable table) {
requireNonNull(table, "table is null");
Map<String, String> storageProperties = getStorageProperties(table);
Collaborator

QQ: is storageProperties always non-null? Given that the method returns a boolean, shall we simply return false if storageProperties is null?

Collaborator Author

@TimothyW553 TimothyW553 Nov 18, 2025

Yep, that's a good idea! I've updated it so that if table.storage().properties() is null we return an empty map (less code than checking for null each time); the empty map results in false anyway.
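
The null-handling idea in this reply can be sketched with plain maps. This is an illustration under stated assumptions, not the merged implementation: the class `CatalogManagedSketch`, the helper `orEmpty`, and the two feature keys (`delta.feature.catalogManaged` and `delta.feature.catalogOwned-preview`) are assumptions made for the example. Only the case-insensitive comparison against "supported" is taken from the snippets quoted in this review.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical illustration only; not the code under review. */
final class CatalogManagedSketch {
  // Assumed key names, chosen to match the flags mentioned in the review thread.
  static final String FEATURE_CATALOG_MANAGED = "delta.feature.catalogManaged";
  static final String FEATURE_CATALOG_OWNED_PREVIEW = "delta.feature.catalogOwned-preview";
  static final String SUPPORTED = "supported";

  private CatalogManagedSketch() {}

  /** Normalizes a possibly-null map once so callers never need a null check. */
  static Map<String, String> orEmpty(Map<String, String> properties) {
    return properties == null ? Collections.<String, String>emptyMap() : properties;
  }

  /** Returns true if either feature flag is marked as supported. */
  static boolean isCatalogManaged(Map<String, String> storageProperties) {
    Map<String, String> props = orEmpty(storageProperties);
    return isFeatureSupported(props, FEATURE_CATALOG_MANAGED)
        || isFeatureSupported(props, FEATURE_CATALOG_OWNED_PREVIEW);
  }

  private static boolean isFeatureSupported(Map<String, String> props, String featureKey) {
    String featureValue = props.get(featureKey);
    // An absent key yields null, so an empty map naturally produces false.
    return featureValue != null && featureValue.equalsIgnoreCase(SUPPORTED);
  }
}
```

Normalizing to an empty map at the boundary keeps the lookup helper free of null checks on the map itself, which is the "less code" trade-off described in the reply.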

Collaborator

why adding a scala file?

Collaborator Author

I added the Scala file for the CatalogTable construction. Constructing the CatalogTable in Java required me to explicitly pass in all parameters, including the optional ones. If more optional parameters are added to the CatalogTable constructor, the test here would also need to change.

@TimothyW553 TimothyW553 force-pushed the catalogtableutils-ccv2 branch 2 times, most recently from 6605534 to 6359268 Compare November 24, 2025 23:25
@zachschuermann zachschuermann merged commit dc92aa2 into delta-io:master Nov 25, 2025
40 checks passed
zikangh pushed a commit to zikangh/delta that referenced this pull request Nov 26, 2025
Use this [link](https://github.com/delta-io/delta/pull/5477/files) to review incremental changes.
- [**catalogtableutils-ccv2**](delta-io#5477) [[Files changed](https://github.com/delta-io/delta/pull/5477/files)]
- [stack/ccv2-catalog-config](delta-io#5520) [[Files changed](https://github.com/delta-io/delta/pull/5520/files/6359268dbee8d1a114e3f66620c6585bc0bdb6eb..4a1d8fa93e56d68b5971fb32970bdeaa5799abdc)]

---------

Signed-off-by: TimothyW553 <[email protected]>
Signed-off-by: Timothy Wang <[email protected]>

/**
* Utility helpers for inspecting Delta-related metadata persisted on Spark {@link CatalogTable}
* instances by Unity Catalog.
Collaborator

I don't think we should assume or default to Unity - eh?

6 participants