Output metrics about remove_orphan_files execution #26661
base: master
```diff
@@ -1323,6 +1323,9 @@ protected Scope visitTableExecute(TableExecute node, Optional<Scope> scope)
         analysis.setUpdateType("ALTER TABLE EXECUTE");
         analysis.setUpdateTarget(executeHandle.catalogHandle().getVersion(), tableName, Optional.of(table), Optional.empty());

+        if (!procedureMetadata.getExecutionMode().isReadsData()) {
+            return createAndAssignScope(node, scope, Field.newUnqualified("metric_name", VARCHAR), Field.newUnqualified("metric_value", BIGINT));
+        }
         return createAndAssignScope(node, scope, Field.newUnqualified("rows", BIGINT));
     }
```

> **Review comment** on the added `if` block: do we need this special case?

> **Review comment** on the output fields: This will display
>
> I think it would be more natural to have each metric as a separate titled column, but IMO the current solution is OK for now.
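The analyzer change above switches the output schema of `ALTER TABLE ... EXECUTE` depending on the procedure's execution mode: coordinator-only procedures (which do not read data) report `metric_name`/`metric_value` pairs, while data-reading procedures keep reporting a single `rows` count. A minimal, self-contained sketch of that decision in plain Java (the `ExecutionMode` record is a hypothetical stand-in for Trino's `TableProcedureExecutionMode`, not the real class):

```java
import java.util.List;

public class Main {
    // Hypothetical stand-in for TableProcedureExecutionMode#isReadsData()
    record ExecutionMode(boolean readsData) {}

    // Mirror the analyzer's branch: coordinator-only procedures expose
    // metric name/value pairs, data-reading procedures expose a row count.
    static List<String> outputColumns(ExecutionMode mode) {
        if (!mode.readsData()) {
            return List.of("metric_name", "metric_value");
        }
        return List.of("rows");
    }

    public static void main(String[] args) {
        System.out.println(outputColumns(new ExecutionMode(false))); // [metric_name, metric_value]
        System.out.println(outputColumns(new ExecutionMode(true)));  // [rows]
    }
}
```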
```diff
@@ -160,9 +160,10 @@ default void finishTableExecute(ConnectorSession session, ConnectorTableExecuteH
     }

     /**
-     * Execute a {@link TableProcedureExecutionMode#coordinatorOnly() coordinator-only} table procedure.
+     * Execute a {@link TableProcedureExecutionMode#coordinatorOnly() coordinator-only} table procedure
+     * and return procedure execution metrics that will be populated in the query output.
      */
-    default void executeTableExecute(ConnectorSession session, ConnectorTableExecuteHandle tableExecuteHandle)
+    default Map<String, Long> executeTableExecute(ConnectorSession session, ConnectorTableExecuteHandle tableExecuteHandle)
     {
         throw new TrinoException(GENERIC_INTERNAL_ERROR, "ConnectorMetadata executeTableExecute() is not implemented");
     }
```

> **Review comment:** Are we planning to limit ourselves to

> **Review comment:** Or should we return them in

> **Review comment:** It's limited to

> **Review comment:** In the future, this should probably be
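The SPI change above evolves a default method: existing connectors that never implemented `executeTableExecute` still compile and still throw, while connectors that opt in now return a metrics map. A simplified sketch of that pattern in plain Java (`Metadata`, the `String` handle, and `IcebergLikeMetadata` are hypothetical stand-ins, not Trino's actual SPI types):

```java
import java.util.Map;

public class Main {
    // Simplified stand-in for ConnectorMetadata: the default method throws,
    // so connectors that do not support the procedure need no code change.
    interface Metadata {
        default Map<String, Long> executeTableExecute(String handle) {
            throw new UnsupportedOperationException("executeTableExecute() is not implemented");
        }
    }

    // A connector that opts in overrides the default and returns its metrics.
    static class IcebergLikeMetadata implements Metadata {
        @Override
        public Map<String, Long> executeTableExecute(String handle) {
            return Map.of("scanned_files_count", 97L, "deleted_files_count", 0L);
        }
    }

    public static void main(String[] args) {
        Metadata metadata = new IcebergLikeMetadata();
        System.out.println(metadata.executeTableExecute("remove_orphan_files"));
    }
}
```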
````diff
@@ -907,12 +907,39 @@ time is recommended to keep size of a table's data directory under control.
 ALTER TABLE test_table EXECUTE remove_orphan_files(retention_threshold => '7d');
 ```

+```text
+        metric_name        | metric_value
+---------------------------+--------------
+ processed_manifests_count |            2
+ active_files_count        |           98
+ scanned_files_count       |           97
+ deleted_files_count       |            0
+```
+
 The value for `retention_threshold` must be higher than or equal to
 `iceberg.remove-orphan-files.min-retention` in the catalog otherwise the
 procedure fails with a similar message: `Retention specified (1.00d) is shorter
 than the minimum retention configured in the system (7.00d)`. The default value
 for this property is `7d`.

+The output of the query has the following metrics:
+
+:::{list-table} Output
+:widths: 40, 60
+:header-rows: 1
+
+* - Property name
+  - Description
+* - `processed_manifests_count`
+  - The count of manifest files read by remove_orphan_files.
+* - `active_files_count`
+  - The count of files belonging to snapshots that have not been expired.
+* - `scanned_files_count`
+  - The count of files scanned from the file system.
+* - `deleted_files_count`
+  - The count of files deleted by remove_orphan_files.
+:::
+
 (drop-extended-stats)=
 ##### drop_extended_stats
````

> **Review comment** on the sample output: I noticed the output in Spark looks different from this: https://iceberg.apache.org/docs/latest/spark-procedures/#output_7.

> **Author reply:** Yes, the Spark output is printing deleted file names. I don't want to do that since that is potentially a large list. The main use case here is that we want to provide a summary of what the procedure did. Right now you get no clue what/if the procedure did anything, unless you run metadata queries before and after and search for differences to the table.
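The documented metrics relate in a simple way: files found on storage that no live snapshot references, and that are older than the retention threshold, are the orphan candidates. A hypothetical, self-contained sketch of that relationship, including a simplified version of the retention check from the docs (the real message formats fractional days like `1.00d`; the 7-day default minimum is taken from the docs above, everything else here is illustrative):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Set;

public class Main {
    // Simplified version of the documented validation: requested retention
    // must not be shorter than the configured minimum retention.
    static void checkRetention(Duration requested, Duration minimum) {
        if (requested.compareTo(minimum) < 0) {
            throw new IllegalArgumentException(
                    "Retention specified (%dd) is shorter than the minimum retention configured in the system (%dd)"
                            .formatted(requested.toDays(), minimum.toDays()));
        }
    }

    // A file is an orphan candidate when no live snapshot references it
    // and its modification time is older than the retention threshold.
    static long countOrphans(Map<String, Instant> scannedFiles, Set<String> activeFiles,
            Instant now, Duration retention) {
        checkRetention(retention, Duration.ofDays(7)); // assumed 7d default minimum
        return scannedFiles.entrySet().stream()
                .filter(e -> !activeFiles.contains(e.getKey()))
                .filter(e -> e.getValue().isBefore(now.minus(retention)))
                .count();
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-06-08T00:00:00Z");
        Map<String, Instant> scanned = Map.of(
                "data/a.parquet", now.minus(Duration.ofDays(30)),
                "data/b.parquet", now.minus(Duration.ofDays(30)),
                "data/tmp.parquet", now.minus(Duration.ofHours(1))); // too recent to delete
        Set<String> active = Set.of("data/a.parquet");
        // Only b.parquet is unreferenced and older than 7 days
        System.out.println(countOrphans(scanned, active, now, Duration.ofDays(7))); // prints 1
    }
}
```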
> **Review comment:** Why `Map<String, Long>` instead of something more generic?

> **Review comment:** Like some connector-specific handle that has a method to get a `Page` and a list of symbols?

> **Author reply:** This method runs directly inside `SimpleTableExecuteOperator`, so we don't need anything complex. There isn't a need for anything more generic here at the moment; the use case of all the Iceberg procedures where we want to add this is satisfied by this.

> **Review comment:** What if we wanted to expose additional metrics as `double` or another type?

> **Review comment:** Sure, but it could run for other cases as well. I can imagine that other table procedures will output some kind of information.

> **Author reply:** We've had one `BIGINT` field all these years and failed to even populate that properly. So no, I don't think there are many use cases waiting for a more generic framework. I've gone through all the Iceberg procedures and everything worth showing in the output is an integer. See "Outputs" in https://iceberg.apache.org/docs/latest/spark-procedures/. There is nothing stopping anyone from using something more generic like `io.trino.spi.metrics.Metric` in the future. At the moment, I find it to be unnecessary.

> **Review comment:** I could imagine that something like `OPTIMIZE` could produce something like an input file size histogram. We could just use `io.trino.spi.metrics.Metrics` here. However, I think `Map<String, Long>` is also OK for now.
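The "more generic" alternative debated above could look like a small sum type over metric value kinds instead of a bare `Long`. A hypothetical sketch of that shape in plain Java (these types are illustrative only; Trino's actual generic mechanism would be `io.trino.spi.metrics.Metric`, which is not reproduced here):

```java
import java.util.Map;

public class Main {
    // Hypothetical sum type over metric value kinds, as an alternative
    // to Map<String, Long>. Not Trino's real API.
    sealed interface MetricValue permits LongValue, DoubleValue {}
    record LongValue(long value) implements MetricValue {}
    record DoubleValue(double value) implements MetricValue {}

    // Render a metric value for display regardless of its kind.
    static String render(MetricValue v) {
        if (v instanceof LongValue l) {
            return Long.toString(l.value());
        }
        if (v instanceof DoubleValue d) {
            return Double.toString(d.value());
        }
        throw new IllegalStateException("unknown metric value kind");
    }

    public static void main(String[] args) {
        Map<String, MetricValue> metrics = Map.of(
                "deleted_files_count", new LongValue(0),
                "avg_file_size_mb", new DoubleValue(12.5));
        metrics.forEach((name, value) -> System.out.println(name + " = " + render(value)));
    }
}
```

The trade-off discussed in the thread is visible here: the sum type can carry doubles or histograms later, but every consumer must handle each kind, while `Map<String, Long>` keeps both producer and consumer trivial.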