Ignore non-existing table when dropping Iceberg schema with cascade #27361
Conversation
```diff
 for (SchemaTableName tableName : listTables(session, Optional.of(schemaName))) {
-    dropTable(session, getTableHandle(session, tableName, Optional.empty(), Optional.empty()));
+    ConnectorTableHandle tableHandle = getTableHandle(session, tableName, Optional.empty(), Optional.empty());
+    if (tableHandle != null) {
```
Worth a code comment.
+1
Why don't views have this issue?
Reposting to the main PR thread. I think views don't have this issue because removing them from the metastore is a quick update. The files attached to an Iceberg table take longer to delete, which increases the risk of the table disappearing underneath us.
```diff
-    dropTable(session, getTableHandle(session, tableName, Optional.empty(), Optional.empty()));
+    ConnectorTableHandle tableHandle = getTableHandle(session, tableName, Optional.empty(), Optional.empty());
+    if (tableHandle != null) {
+        dropTable(session, tableHandle);
```
So, to understand the motivation for this change correctly ... it looks like `dropTable` can potentially block the thread for a while? Following the method calls on GitHub:

- here, in the `dropTable` implementation of `TrinoHiveCatalog`
- to here, in Iceberg's `CatalogUtils.dropTableData`
- to here, in one of Iceberg's `deleteFiles` implementations, which looks like it reads manifest files in order to recursively delete the files they reference
- to Iceberg's `Tasks` here, to see how those tasks get run
- to this implementation of `waitFor` in Iceberg, which shows a `while (true)` loop that sleeps 10 milliseconds until every task is done. That's blocking.

Therefore, tables we've listed from metadata can get dropped from beneath this thread while a `DROP SCHEMA schema_name CASCADE` is completing ...

I'm guessing the risk of a table drop taking that long arises mostly when there are externally managed files. The Delta Lake connector should be comparable: in your implementation of CASCADE there, you wrap `dropTable` in an additional try block even after checking that the table handle exists. Is it worth bringing that back into this PR as well, for extra safety?
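The blocking behavior described in the call chain above can be illustrated with a small sketch. The names here are illustrative, not Iceberg's actual code; the point is only the shape of the wait loop: poll all task futures and sleep 10 ms between checks, keeping the calling thread occupied the whole time.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Future;

final class BlockingWaitSketch
{
    // Polls until every task is done, sleeping 10 ms between checks.
    // The calling thread is blocked for the entire duration, which is
    // the behavior attributed to Iceberg's waitFor above.
    static void waitFor(List<? extends Future<?>> tasks)
            throws InterruptedException
    {
        while (true) {
            if (tasks.stream().allMatch(Future::isDone)) {
                return;
            }
            Thread.sleep(10);
        }
    }

    public static void main(String[] args)
            throws InterruptedException
    {
        // One already-completed and one short-lived async task:
        // waitFor returns only after both are done.
        CompletableFuture<Void> done = CompletableFuture.completedFuture(null);
        CompletableFuture<Void> async = CompletableFuture.runAsync(() -> { });
        waitFor(List.of(done, async));
        System.out.println("all tasks done");
    }
}
```

With many files to delete, each deletion task adds to the time this loop keeps the thread busy, which is the window in which another session can drop one of the listed tables.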
(I didn't read this message carefully before merging the PR.) Each Iceberg catalog maintains its own table metadata cache, so the first `TrinoCatalog.loadTable` call from `IcebergMetadata.getTableHandle` populates the cache, and the second `TrinoCatalog.loadTable` in `TrinoCatalog.dropTable` will retrieve the object from the cache. We can still add the catch block, though.
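The defensive pattern discussed in this thread (skip tables whose handle is already gone, and additionally tolerate a drop that races the handle lookup) can be sketched as a minimal, self-contained toy. A `ConcurrentHashMap` stands in for the metastore; the real connector code uses `ConnectorTableHandle`, `getTableHandle`, and `dropTable`, which are only mimicked here.

```java
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.concurrent.ConcurrentHashMap;

final class CascadeDropSketch
{
    // Toy "metastore": table name -> metadata location (paths are made up).
    private final Map<String, String> tables = new ConcurrentHashMap<>();

    void createTable(String name, String metadataLocation)
    {
        tables.put(name, metadataLocation);
    }

    // Analogue of getTableHandle: returns null when the table is gone.
    String getHandle(String name)
    {
        return tables.get(name);
    }

    // Analogue of dropTable: throws if the table vanished concurrently.
    void dropTable(String name)
    {
        if (tables.remove(name) == null) {
            throw new NoSuchElementException(name);
        }
    }

    // DROP SCHEMA ... CASCADE: the listed snapshot may be stale, so both
    // the null check and the catch block guard against concurrent drops.
    void dropAllTables()
    {
        for (String name : List.copyOf(tables.keySet())) {
            if (getHandle(name) == null) {
                continue; // dropped after listing
            }
            try {
                dropTable(name);
            }
            catch (NoSuchElementException ignored) {
                // dropped between the handle lookup and the drop
            }
        }
    }

    Map<String, String> tables()
    {
        return tables;
    }
}
```

The null check alone leaves a small race window between the lookup and the drop; the catch block closes it, which is why both appear in the Delta Lake connector's CASCADE implementation mentioned above.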
Description

The `getTableHandle` method returns null if the table is dropped concurrently during the loop.

Release notes