Conversation

raunaqmorarka (Member) commented Sep 18, 2025

Description

alter table lineitem execute remove_orphan_files(retention_threshold => '0d');

            metric_name         | metric_value
    ----------------------------+--------------
     processed_manifests_count  |            2
     valid_files_count          |           98
     scanned_files_count        |           97
     deleted_files_count        |            0

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

## Iceberg
* Produce execution metrics in the output of the `remove_orphan_files` procedure. ({issue}`26661`)

Copilot (AI) left a comment

Pull Request Overview

This PR enhances the Iceberg connector's remove_orphan_files procedure to output execution metrics. Instead of returning no information, the procedure now returns a table with metrics including manifest file processing count, valid files found, files scanned, and files deleted.

  • Modified the executeTableExecute method signature across the SPI to return Map<String, Long> instead of void
  • Enhanced the remove_orphan_files implementation to collect and return execution metrics
  • Updated query result handling to display metrics as a two-column table with metric names and values
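The two-column display described above can be sketched as follows. This is an illustrative sketch only, not actual Trino code: it shows how a `Map<String, Long>` returned by `executeTableExecute` maps onto `(metric_name, metric_value)` rows, with all class and method names hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch (hypothetical names, not actual Trino code): render the
// metrics map returned by executeTableExecute as two-column rows.
public class MetricsOutputSketch {
    static String render(Map<String, Long> metrics) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : metrics.entrySet()) {
            // one output row per metric, preserving insertion order
            sb.append(e.getKey()).append(" | ").append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Long> metrics = new LinkedHashMap<>();
        metrics.put("processed_manifests_count", 2L);
        metrics.put("active_files_count", 98L);
        metrics.put("scanned_files_count", 97L);
        metrics.put("deleted_files_count", 0L);
        System.out.print(render(metrics));
    }
}
```

A `LinkedHashMap` is used here so the metric rows come out in the order the connector inserted them.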

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated 2 comments.

Summary per file:

- core/trino-spi/src/main/java/io/trino/spi/connector/ConnectorMetadata.java: Changed executeTableExecute return type from void to Map<String, Long>
- plugin/trino-iceberg/src/main/java/io/trino/plugin/iceberg/IcebergMetadata.java: Implemented metrics collection and return for the remove_orphan_files procedure
- core/trino-main/src/main/java/io/trino/operator/SimpleTableExecuteOperator.java: Updated the operator to build result pages from the returned metrics
- core/trino-main/src/main/java/io/trino/sql/planner/plan/SimpleTableExecuteNode.java: Changed from a single output symbol to a list of symbols for the metric name/value columns
- docs/src/main/sphinx/connector/iceberg.md: Added documentation for the new metrics output format


 void finishTableExecute(Session session, TableExecuteHandle handle, Collection<Slice> fragments, List<Object> tableExecuteState);

-void executeTableExecute(Session session, TableExecuteHandle handle);
+Map<String, Long> executeTableExecute(Session session, TableExecuteHandle handle);
Contributor:
Why Map<String, Long> instead of something more generic?

Contributor:

Like some connector-specific handle that has a method to get a Page and a list of symbols?

Member Author:

This method runs directly inside SimpleTableExecuteOperator, so we don't need anything complex.
There's no need for anything more generic here at the moment; Map<String, Long> satisfies the use case of all the Iceberg procedures where we want to add this.
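To make the connector-side shape concrete, here is a minimal sketch with hypothetical names (not the actual IcebergMetadata code): counters are collected while scanning candidate files, and returned as the `Map<String, Long>` that `executeTableExecute` now produces.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch (hypothetical names, not actual IcebergMetadata code):
// count scanned files, compare against the set of active files, and return
// the counters as the procedure's metrics map.
public class OrphanFileMetricsSketch {
    static Map<String, Long> removeOrphanFiles(Set<String> activeFiles, List<String> scannedFiles) {
        long deleted = 0;
        for (String file : scannedFiles) {
            if (!activeFiles.contains(file)) {
                deleted++; // a real implementation would delete the orphan file here
            }
        }
        Map<String, Long> metrics = new LinkedHashMap<>();
        metrics.put("active_files_count", (long) activeFiles.size());
        metrics.put("scanned_files_count", (long) scannedFiles.size());
        metrics.put("deleted_files_count", deleted);
        return metrics;
    }
}
```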

Member:

What if we wanted to expose additional metrics as double or some other type?

Contributor:

Sure, but it could be useful for other cases as well. I can imagine that other table procedures will output some kind of information.

Member Author:

We've had one BIGINT field all these years and failed to even populate that properly.
So no, I don't think there are many use cases waiting for a more generic framework.
I've gone through all the Iceberg procedures, and everything worth showing in the output is an integer; see the Outputs sections in https://iceberg.apache.org/docs/latest/spark-procedures/
There is nothing stopping anyone from using something more generic, like io.trino.spi.metrics.Metric, in the future. At the moment, I find it unnecessary.

 * and return procedure execution metrics that will be populated in the query output.
 */
-default void executeTableExecute(ConnectorSession session, ConnectorTableExecuteHandle tableExecuteHandle)
+default Map<String, Long> executeTableExecute(ConnectorSession session, ConnectorTableExecuteHandle tableExecuteHandle)
Member:

Are we planning to limit ourselves to Long, or can metrics be of another type, like double?

Member:

Or should we return them as a Row type?

Member Author:

It's limited to long for now, as that's what the current procedures need.

 List<MaterializedRow> expectedRows = expectedResults.getMaterializedRows();

-if (compareUpdate) {
+if (compareUpdate && !actualResults.getUpdateType().equals(Optional.of("ALTER TABLE EXECUTE"))) {
Member:

Instead of skipping here, can we introduce something like assertAlter, or a dedicated method that would set compareUpdate to false?

Member Author:

There should be no need for a special assertAlter method; we should be able to assert the more detailed output with assertUpdate itself.
This branch was written with the assumption that an update query can only produce a row-count output and nothing else. Since ALTER commands did not output anything before, this branch was never executed for them.
Now, with the more detailed output, we need to skip this branch to allow assertions on the detailed results.
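The guard in the diff above can be isolated as a small predicate. This is an illustrative sketch with a hypothetical method name, not the actual test-framework code: row-count comparison applies only when compareUpdate is set and the update type is not ALTER TABLE EXECUTE, so the detailed metric rows can be asserted like any other query result.

```java
import java.util.Optional;

// Illustrative sketch (hypothetical name, not actual test-framework code):
// the condition that decides whether to run the row-count comparison branch.
public class UpdateComparisonGuardSketch {
    static boolean shouldCompareRowCount(boolean compareUpdate, Optional<String> updateType) {
        // ALTER TABLE EXECUTE now returns metric rows, not a row count,
        // so it must bypass the row-count comparison.
        return compareUpdate && !updateType.equals(Optional.of("ALTER TABLE EXECUTE"));
    }
}
```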

ebyhr requested a review from martint, September 18, 2025 13:19
- Description
* - `processed_manifests_count`
- The count of manifest files read by remove_orphan_files.
* - `valid_files_count`
Contributor:

I think the meaning of "valid" here may be unclear to users. It doesn't seem very meaningful to expose it directly; could we add more details to clarify what it represents?

Member Author:

I've renamed it to active_files_count and clarified that these are files belonging to snapshots that have not been expired yet.

```text
        metric_name         | metric_value
----------------------------+--------------
 processed_manifests_count  |            2
```
Contributor:

I noticed the output in Spark looks different from this: https://iceberg.apache.org/docs/latest/spark-procedures/#output_7.

Member Author (raunaqmorarka) commented Sep 26, 2025:

Yes, the Spark output prints the deleted file names. I don't want to do that, since that is potentially a large list. The main use case here is to provide a summary of what the procedure did. Right now you get no clue whether the procedure did anything, unless you run metadata queries before and after and search for differences in the table.

alter table lineitem execute remove_orphan_files(retention_threshold => '0d');

        metric_name         | metric_value
----------------------------+--------------
 processed_manifests_count  |            2
 active_files_count         |           98
 scanned_files_count        |           97
 deleted_files_count        |            0