-
Notifications
You must be signed in to change notification settings - Fork 2k
Setup Utils for CCv2 tables #5477
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setup Utils for CCv2 tables #5477
Conversation
|
|
||
| private CatalogTableUtils() {} | ||
|
|
||
| public static boolean isCCv2Table(CatalogTable table) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call it catalogOwned/Managed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, this will also help align with the existing terminology!
|
|
||
| private static boolean isCatalogOwnedFeatureSupported( | ||
| Map<String, String> tableProperties, String featureKey) { | ||
| if (tableProperties == null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we just check tableProperties non null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated.
| * properties. | ||
| */ | ||
| public final class CatalogTableUtils { | ||
| static final String UNITY_CATALOG_PROPERTY_PREFIX = "delta.unityCatalog."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really put this properties?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This specific property I thought existed but actually does not, changed it now to UC_TABLE_ID_KEY which is ucTableId right now - I believe it is changing to catalogManaged.unityCatalog.tableId
| // if the catalogManaged/catalogOwned-preview flags are 'supported' | ||
| public static boolean isCatalogManaged(CatalogTable table) { | ||
| requireNonNull(table, "table is null"); | ||
| Map<String, String> tableProperties = toJavaMap(table.properties()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should look at table.storage.properties() storage's class name is CatalogStorageFormat, I think delta(which is a stoarg)'s related properties should like here if upstream implements the semantic correctly, like https://github.com/unitycatalog/unitycatalog/blob/main/connectors/spark/src/main/scala/io/unitycatalog/spark/UCSingleCatalog.scala#L279
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I currently merge both outer properties() and storage().properties() in case. I looked into AbstractDeltaCatalog, it seems we will have to merge them to get all desired properties.
| return featureValue.equalsIgnoreCase(SUPPORTED); | ||
| } | ||
|
|
||
| private static Map<String, String> toJavaMap( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Put it in the ScalaUtils.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, moved to ScalaUtils
|
|
||
| private CatalogTableUtils() {} | ||
|
|
||
| // Checks whether *any* catalog manages this table via CCv2 semantics by checking |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please use
/**
*
- @param xxxx
*/
for method documentation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, updated!
| requireNonNull(table, "table is null"); | ||
| Map<String, String> merged = new HashMap<>(); | ||
| merged.putAll(ScalaUtils.toJavaMap(table.storage().properties())); | ||
| merged.putAll(ScalaUtils.toJavaMap(table.properties())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we just add table.storage().properties()?
| /** | ||
| * Creates a CatalogTable with the given properties. This is a helper method to create a | ||
| * CatalogTable for testing purposes - see interface {@link CatalogTable} for more details. | ||
| * | ||
| * @param properties the properties to set on the CatalogTable | ||
| * @return a CatalogTable with the given properties | ||
| */ | ||
| private static CatalogTable catalogTableWithProperties(Map<String, String> properties) { | ||
| return catalogTableWithProperties(properties, Collections.emptyMap()); | ||
| } | ||
|
|
||
| private static CatalogTable catalogTableWithProperties( | ||
| Map<String, String> properties, Map<String, String> storageProperties) { | ||
| return CatalogTableTestUtils$.MODULE$.catalogTableWithProperties(properties, storageProperties); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe not needed? I feel like these two utils methods are a bit verbose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah the first one with just properties() is not needed and verbose I agree. I'll keep the one with two parameters as it makes the testing code cleaner.
| class CatalogTableUtilsTest { | ||
|
|
||
| @Test | ||
| void catalogManagedFlagEnablesDetection() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we name the test case like
testIsCatalogManaged_CatalogManagedEnable_returnsTrue
testIntent_condition_expectation.
Also split
assertTrue(
CatalogTableUtils.isCatalogManaged(table), "Should detect catalog management with flag");
assertFalse(
CatalogTableUtils.isUnityCatalogManagedTable(table), "Should not detect Unity without ID");
to keep unit test focused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, thanks for the test case naming review. I'll keep that in mind
9e3af8f to
4da2a45
Compare
| * @param table Spark {@link CatalogTable} descriptor | ||
| * @return Java map view of the storage properties | ||
| */ | ||
| public static Map<String, String> getStorageProperties(CatalogTable table) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be private method?
| return ScalaUtils.toJavaMap(table.storage().properties()); | ||
| } | ||
|
|
||
| public static boolean isCatalogManagedFeatureEnabled( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to private.
| import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType} | ||
| import org.apache.spark.sql.types.StructType | ||
|
|
||
| /** Helpers for constructing [[CatalogTable]] instances inside Java tests. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document why scala is needed. CatalogTable may accept different params on different spark version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah added docs on why scala is needed. To summarize, we need it because if we were to construct it in Java, then we need to fill even the optional parameters in. By constructing in scala we dont need to worry about new optional parameters.
kernel-spark/src/main/java/io/delta/kernel/spark/utils/CatalogTableUtils.java
Show resolved
Hide resolved
huan233usc
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
a523d7d to
57d813a
Compare
eefc8dc to
57d813a
Compare
| */ | ||
| public static boolean isCatalogManaged(CatalogTable table) { | ||
| requireNonNull(table, "table is null"); | ||
| Map<String, String> storageProperties = getStorageProperties(table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QQ: is storageProperties always not null? Given the method returns a boolean, shall we simply return false if storageProperties is null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep! that's a good idea. I've updated it so that if the table.storage().properties() is empty we return an emptyMap (less code, instead of checking for null each time) the empty map will result in a false anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why adding a scala file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I added scala file was for the CatalogTable construction. Construction the CatalogTable in java required me to explicitly pass in all parameters including the optional ones. If we add more optional parameters in CatalogTable constructor then it will also require changing the test here.
4335df1 to
ae14d13
Compare
Signed-off-by: TimothyW553 <[email protected]>
Signed-off-by: TimothyW553 <[email protected]>
6605534 to
6359268
Compare
Use this [link](https://github.com/delta-io/delta/pull/5477/files) to review incremental changes. - [**catalogtableutils-ccv2**](delta-io#5477) [[Files changed](https://github.com/delta-io/delta/pull/5477/files)] - [stack/ccv2-catalog-config](delta-io#5520) [[Files changed](https://github.com/delta-io/delta/pull/5520/files/6359268dbee8d1a114e3f66620c6585bc0bdb6eb..4a1d8fa93e56d68b5971fb32970bdeaa5799abdc)] --------- <!-- Thanks for sending a pull request! Here are some tips for you: 1. If this is your first time, please read our contributor guidelines: https://github.com/delta-io/delta/blob/master/CONTRIBUTING.md 2. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP] Your PR title ...'. 3. Be sure to keep the PR description updated to reflect all changes. 4. Please write your PR title to summarize what this PR proposes. 5. If possible, provide a concise example to reproduce the issue for a faster review. 6. If applicable, include the corresponding issue number in the PR title and link it in the body. --> <!-- Please add the component selected below to the beginning of the pull request title For example: [Spark] Title of my pull request --> - [ ] Spark - [ ] Standalone - [ ] Flink - [ ] Kernel - [ ] Other (fill in here) This PR adds utils for CatalogTable, Scala, and for catalogtable testing. In particular, CatalogTableUtils is used for determining if a table is managed/owned by UC -- which will determine the source of truth for operations. <!-- - Describe what this PR changes. - Describe why we need the change. If this PR resolves an issue be sure to include "Resolves #XXX" to correctly link and close the issue upon merge. --> - tested locally via `build/sbt -DsparkVersion=master "++ 2.13.16" clean sparkV2/test` - passing CI tests <!-- If tests were added, say they were added here. Please make sure to test the changes thoroughly including negative and positive cases if possible. If the changes were tested in any way other than unit tests, please clarify how you tested step by step (ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future). If the changes were not tested, please explain why. --> No. <!-- If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible. If possible, please also clarify if this is a user-facing change compared to the released Delta Lake versions or within the unreleased branches such as master. If no, write 'No'. --> --------- Signed-off-by: TimothyW553 <[email protected]> Signed-off-by: Timothy Wang <[email protected]>
|
|
||
| /** | ||
| * Utility helpers for inspecting Delta-related metadata persisted on Spark {@link CatalogTable} | ||
| * instances by Unity Catalog. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should assume or default to Unity - eh?
🥞 Stacked PR
Use this link to review incremental changes.
Which Delta project/connector is this regarding?
Description
This PR adds utils for CatalogTable, Scala, and for catalogtable testing. In particular, CatalogTableUtils is used for determining if a table is managed/owned by UC -- which will determine the source of truth for operations.
How was this patch tested?
build/sbt -DsparkVersion=master "++ 2.13.16" clean sparkV2/testDoes this PR introduce any user-facing changes?
No.