
Conversation

Contributor

@mkarrmann mkarrmann commented Dec 3, 2025

Differential Revision: D87825974

Problem

Currently, TableFinishInfo hard-codes a 10 MB limit for serialized metadata info. If that size limit is exceeded, the metadata is dropped. This limit cannot be configured, which is problematic for users who rely on TableFinishInfo containing complete information.

Solution

  • Make the length limit configurable via a Session Property.
  • Make the default value of the session property configurable at the cluster-level.
  • When the length limit is set to a non-positive value, entirely bypass enforcing a size limit.

Reviewer Notes

I couldn't find a good existing config file in which to put the default Table Finish JSON length limit, so I created a new config file: TableFinishConfig.

This slightly increased the scope of this change, because I needed to pass this config to all tests that explicitly construct SystemSessionProperties.

== RELEASE NOTES ==

General Changes
* Add system session property `table_finish_info_json_length_limit`, making the `TableFinishInfo` length limit configurable. Setting it to a non-positive value disables length limit enforcement. Defaults to `10MB` (the current hard-coded limit).

Contributor

sourcery-ai bot commented Dec 3, 2025

Reviewer's Guide

Makes the maximum size of serialized connector output metadata in TableFinishInfo configurable via a cluster-level config (TableFinishConfig) and a session property, and plumbs this limit into TableFinishOperator with support for disabling truncation when the limit is zero.

Sequence diagram for the configurable TableFinishInfo JSON length limit

sequenceDiagram
    actor Client
    participant Session
    participant TableFinishOperatorFactory
    participant TableFinishOperator
    participant TableFinishInfo

    Client->>Session: Execute write query
    Session->>TableFinishOperatorFactory: createOperator(driverContext)
    TableFinishOperatorFactory->>Session: getSystemProperty(TABLE_FINISH_INFO_JSON_LENGTH_LIMIT, DataSize)
    Session-->>TableFinishOperatorFactory: DataSize jsonLengthLimit
    TableFinishOperatorFactory->>TableFinishOperator: new TableFinishOperator(..., jsonLengthLimit)

    Note over TableFinishOperator: During operator construction
    TableFinishOperator->>TableFinishOperator: createTableFinishInfoSupplier(outputMetadata, statisticsTiming, jsonLengthLimit)

    Note over TableFinishOperator: When info is requested
    TableFinishOperator->>TableFinishInfo: new TableFinishInfo(outputMetadata, wallTime, cpuTime, jsonLengthLimit)
    TableFinishInfo->>TableFinishInfo: getSerializedMetadataWithLimit(metadata, jsonLengthLimit)
    alt jsonLengthLimit value is 0
        TableFinishInfo-->>TableFinishInfo: INFO_CODEC.toJson(metadataInfo)
    else jsonLengthLimit value > 0
        TableFinishInfo-->>TableFinishInfo: INFO_CODEC.toJsonWithLengthLimit(metadataInfo, jsonLengthLimit bytes)
    end
    TableFinishInfo-->>TableFinishOperator: TableFinishInfo instance
    TableFinishOperator-->>Client: Query completion info including TableFinishInfo

Flow diagram for TableFinishInfo JSON length configuration

flowchart LR
    ConfigFile["Cluster config file\n(table-finish-info-json-length-limit)"] --> TableFinishConfig
    TableFinishConfig --> SystemSessionProperties["SystemSessionProperties\nregisters session property\nTABLE_FINISH_INFO_JSON_LENGTH_LIMIT"]
    SystemSessionProperties --> Session["Session\n(system properties)"]
    Session --> TableFinishOperatorFactory["TableFinishOperatorFactory\ncreateOperator"]
    TableFinishOperatorFactory --> Session
    Session -->|"getSystemProperty(TABLE_FINISH_INFO_JSON_LENGTH_LIMIT, DataSize)"| TableFinishOperatorFactory
    TableFinishOperatorFactory --> TableFinishOperator["TableFinishOperator\n(jsonLengthLimit field)"]
    TableFinishOperator --> TableFinishInfo["TableFinishInfo\nconstruct with jsonLengthLimit"]
    TableFinishInfo -->|serialize connector output metadata\nwith optional truncation| OutputMetadata["serializedConnectorOutputMetadata"]

File-Level Changes

Change Details Files
Make TableFinishInfo serialization respect a configurable JSON length limit, including a zero-value "no limit" mode.
  • Removed the hard-coded 10MB JSON_LENGTH_LIMIT constant in favor of a DataSize argument passed to the constructor.
  • Added an overload of the TableFinishInfo constructor that defaults the JSON length limit to 0 bytes (disabling truncation).
  • Introduced a helper method that either serializes metadata without limit when the configured size is zero or uses JsonCodec.toJsonWithLengthLimit otherwise.
  • Simplified metadata serialization logic to use Optional.orElse(null) and set the jsonLengthLimitExceeded flag based on the Optional presence.
presto-main-base/src/main/java/com/facebook/presto/operator/TableFinishInfo.java
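The serialization helper described above can be sketched in isolation. This is a simplified stand-in, not the actual Presto code: `toJsonWithLengthLimit` here mimics the drop-on-overflow behavior of the airlift `JsonCodec` method of the same name, and a raw `long` byte count stands in for `DataSize`.

```java
import java.util.Optional;

public class SerializedMetadataSketch
{
    // Stand-in for INFO_CODEC.toJsonWithLengthLimit: returns empty when the
    // serialized form exceeds the limit, mirroring the drop-on-overflow behavior.
    static Optional<String> toJsonWithLengthLimit(String json, int lengthLimit)
    {
        return json.length() <= lengthLimit ? Optional.of(json) : Optional.empty();
    }

    // Sketch of the helper: a non-positive limit bypasses truncation entirely;
    // otherwise the (stand-in) codec enforces the limit.
    static Optional<String> getSerializedMetadataWithLimit(String json, long limitBytes)
    {
        if (limitBytes <= 0) {
            return Optional.of(json);
        }
        return toJsonWithLengthLimit(json, Math.toIntExact(limitBytes));
    }

    public static void main(String[] args)
    {
        String metadata = "{\"k\":\"value\"}";
        // Under the limit: JSON is kept.
        System.out.println(getSerializedMetadataWithLimit(metadata, 1024).isPresent());
        // Over the limit: metadata is dropped (jsonLengthLimitExceeded would be set).
        System.out.println(getSerializedMetadataWithLimit(metadata, 4).isPresent());
        // Limit of 0: truncation disabled, JSON always kept.
        System.out.println(getSerializedMetadataWithLimit(metadata, 0).isPresent());
    }
}
```

An empty `Optional` maps onto the change's `Optional.orElse(null)` handling: the caller stores `null` metadata and flips the `jsonLengthLimitExceeded` flag.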
Introduce TableFinishConfig and a corresponding system session property to control the JSON length limit, with a cluster-level default.
  • Created TableFinishConfig with a configurable DataSize tableFinishInfoJsonLengthLimit, defaulting to 10MB, and annotated it as a config bean.
  • Bound TableFinishConfig in ServerMainModule so it is populated from configuration properties.
  • Registered a new TABLE_FINISH_INFO_JSON_LENGTH_LIMIT session property wired to TableFinishConfig.getTableFinishInfoJsonLengthLimit, using DataSize parsing/formatting.
  • Updated various test and utility wiring (SystemSessionProperties tests, LocalQueryRunner, execution tests) to include TableFinishConfig in SessionPropertyManager construction.
presto-main-base/src/main/java/com/facebook/presto/operator/TableFinishConfig.java
presto-main-base/src/test/java/com/facebook/presto/operator/TestTableFinishConfig.java
presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java
presto-main/src/main/java/com/facebook/presto/server/ServerMainModule.java
presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestAnalyzer.java
presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestValidateStreamingJoins.java
presto-main-base/src/main/java/com/facebook/presto/testing/LocalQueryRunner.java
presto-main-base/src/test/java/com/facebook/presto/execution/TestCreateMaterializedViewTask.java
presto-main-base/src/test/java/com/facebook/presto/execution/scheduler/TestAdaptivePhasedExecutionPolicy.java
Propagate the configurable JSON length limit into TableFinishOperator so that TableFinishInfo instances are created with the configured limit.
  • Look up the TABLE_FINISH_INFO_JSON_LENGTH_LIMIT system property (as a DataSize) from the Session in TableFinishOperatorFactory.createOperator.
  • Extended TableFinishOperator constructor and state to accept and store the jsonLengthLimit parameter.
  • Updated the TableFinishInfo supplier factory to pass the jsonLengthLimit into the TableFinishInfo constructor so statistics JSON respects the configured limit.
presto-main-base/src/main/java/com/facebook/presto/operator/TableFinishOperator.java


Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey there - I've reviewed your changes - here's some feedback:

  • The toIntExact(jsonLengthLimit.toBytes()) call in getSerializedMetadataWithLimit will overflow for values above ~2 GB (e.g., the 5 GB example in TestTableFinishConfig), so either cap the allowed size, validate in TableFinishConfig/session property, or switch to an API that accepts a long to avoid runtime failures.
  • Consider validating tableFinishInfoJsonLengthLimit in TableFinishConfig (e.g., non-negative and within the range that can be safely converted to an int in TableFinishInfo) to fail fast on misconfiguration rather than later during metadata serialization.
  • In getSerializedMetadataWithLimit, checking jsonLengthLimit.getValue() == 0 relies on a floating-point comparison; using a byte-based check (e.g., jsonLengthLimit.toBytes() == 0) would be more robust and consistent with how the value is subsequently used.
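The overflow concern in the first bullet is easy to reproduce with plain `Math.toIntExact`. In this standalone sketch the raw byte count stands in for `DataSize.toBytes()`; the clamping alternative shown is one possible fix, not the code under review.

```java
public class LimitOverflowSketch
{
    public static void main(String[] args)
    {
        long fiveGiB = 5L * 1024 * 1024 * 1024;

        // toIntExact throws for anything above Integer.MAX_VALUE (~2 GiB),
        // so a 5GB config value becomes a runtime ArithmeticException.
        boolean threw = false;
        try {
            Math.toIntExact(fiveGiB);
        }
        catch (ArithmeticException e) {
            threw = true;
        }
        System.out.println(threw);

        // Clamping instead of narrowing avoids the failure for large configs.
        int clamped = (int) Math.min(fiveGiB, Integer.MAX_VALUE);
        System.out.println(clamped == Integer.MAX_VALUE);
    }
}
```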
## Individual Comments

### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/operator/TableFinishInfo.java:119` </location>
<code_context>
+        if (jsonLengthLimit.getValue() == 0) {
+            return Optional.of(INFO_CODEC.toJson(metadataInfo));
+        }
+        return INFO_CODEC.toJsonWithLengthLimit(metadataInfo, toIntExact(jsonLengthLimit.toBytes()));
+    }
 }
</code_context>

<issue_to_address>
**issue:** Protect against overflow when converting the configured DataSize to an int length limit.

Now that `jsonLengthLimit` is user-configurable, a very large value could make `jsonLengthLimit.toBytes()` overflow `int` and cause `toIntExact` to throw `ArithmeticException`, turning a bad config into a runtime failure. Consider either clamping the value to `Integer.MAX_VALUE` here, or enforcing a maximum allowed size when parsing the config/session property so this code never receives an unrepresentable value.
</issue_to_address>

### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/operator/TableFinishConfig.java:35-37` </location>
<code_context>
+
+    @Config("table-finish-info-json-length-limit")
+    @ConfigDescription("Maximum size of connector output metadata JSON in table finish info")
+    public TableFinishConfig setTableFinishInfoJsonLengthLimit(DataSize tableFinishInfoJsonLengthLimit)
+    {
+        this.tableFinishInfoJsonLengthLimit = tableFinishInfoJsonLengthLimit;
+        return this;
+    }
</code_context>

<issue_to_address>
**issue:** Validate that the configured JSON length limit is non-negative (and possibly bounded).

This setter allows any `DataSize` value, including negatives. A negative value would propagate to `TableFinishInfo.getSerializedMetadataWithLimit`, where `toBytes()` becomes negative and is passed to `toJsonWithLengthLimit`, which likely expects a non-negative limit. It also conflicts with the "0 disables truncation" convention. Please validate this configuration (e.g., `size >= 0`, and optionally an upper bound) so misconfiguration fails fast at config time rather than at query execution.
</issue_to_address>

### Comment 3
<location> `presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java:569` </location>
<code_context>
                         true),
+                new PropertyMetadata<>(
+                        TABLE_FINISH_INFO_JSON_LENGTH_LIMIT,
+                        "Maximum size of connector output metadata JSON in table finish info",
+                        VARCHAR,
+                        DataSize.class,
</code_context>

<issue_to_address>
**suggestion:** Clarify the "0 disables truncation" behavior in the session property/config descriptions.

The code treats a `DataSize` of 0 as disabling truncation (`getSerializedMetadataWithLimit` skips the limit when `getValue() == 0`), but the session property and `TableFinishConfig` descriptions don’t say this. Please update the description strings to explicitly note that `0` means “no limit” so users can intentionally disable truncation.

Suggested implementation:

```java
                new PropertyMetadata<>(
                        TABLE_FINISH_INFO_JSON_LENGTH_LIMIT,
                        "Maximum size of connector output metadata JSON in table finish info (0 disables truncation)",
                        VARCHAR,
                        DataSize.class,
                        tableFinishConfig.getTableFinishInfoJsonLengthLimit(),
                        false,
                        value -> DataSize.valueOf((String) value),
                        DataSize::toString),

```

To fully implement your review comment across the codebase, you should also:

1. Update the description of the corresponding config property in `TableFinishConfig` (likely the one backing `getTableFinishInfoJsonLengthLimit()`) to mention that `0` disables truncation / means “no limit”.
2. Ensure any documentation or configuration reference (e.g., `docs/*.md` or configuration property tables) that describes this property uses the updated wording for consistency.
</issue_to_address>

### Comment 4
<location> `presto-main-base/src/test/java/com/facebook/presto/operator/TestTableFinishConfig.java:28` </location>
<code_context>
+import static com.facebook.airlift.units.DataSize.Unit.GIGABYTE;
+import static com.facebook.airlift.units.DataSize.Unit.MEGABYTE;
+
+public class TestTableFinishConfig
+{
+    @Test
</code_context>

<issue_to_address>
**issue (testing):** Add tests that exercise the new session property / TableFinishInfo behavior, not just the config mapping

The new tests only cover `TableFinishConfig` mapping, but not the behavioral changes (configurable JSON length and `0` meaning "no limit"). Please add tests that:

- Build a `TableFinishInfo` with metadata whose JSON slightly exceeds the configured limit and assert `jsonLengthLimitExceeded` is `true` and `serializedConnectorOutputMetadata` is `null`.
- Use the same metadata with a larger limit and assert `jsonLengthLimitExceeded` is `false` and the JSON is present.
- Use `jsonLengthLimit = DataSize.succinctBytes(0)` and large metadata that would previously be truncated, and assert the JSON is fully present and `jsonLengthLimitExceeded` is `false`.
- (Ideally) an integration-style test of `TableFinishOperator` + `SystemSessionProperties` that sets `TABLE_FINISH_INFO_JSON_LENGTH_LIMIT` and verifies the info from `operatorContext.getInfoSupplier()` respects the session property.

That way the configurable limit and `0` = "no limit" semantics are exercised end-to-end, not just the config wiring.
</issue_to_address>

### Comment 5
<location> `presto-main-base/src/test/java/com/facebook/presto/operator/TestTableFinishConfig.java:38-41` </location>
<code_context>
+    }
+
+    @Test
+    public void testExplicitPropertyMappings()
+    {
+        Map<String, String> properties = new ImmutableMap.Builder<String, String>()
+                .put("table-finish-info-json-length-limit", "5GB")
+                .build();
+
</code_context>

<issue_to_address>
**suggestion (testing):** Add an explicit mapping test for the `0` value to cover the "disable limit" behavior

Since `DataSize` value `0` has special meaning in `TableFinishInfo` (disables truncation), please add an assertion that this value is configurable via properties. For example, include a mapping like:

```java
Map<String, String> properties = ImmutableMap.of(
        "table-finish-info-json-length-limit", "0B");

TableFinishConfig expected = new TableFinishConfig()
        .setTableFinishInfoJsonLengthLimit(DataSize.succinctBytes(0));

assertFullMapping(properties, expected);
```

This will verify the config layer correctly accepts `0` and protect against regressions in the "no limit" behavior.
</issue_to_address>

### Comment 6
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestAnalyzer.java:248-249` </location>
<code_context>
                 new TracingConfig(),
                 new CompilerConfig(),
-                new HistoryBasedOptimizationConfig());
+                new HistoryBasedOptimizationConfig(),
+                new TableFinishConfig());
     }
</code_context>

<issue_to_address>
**suggestion (testing):** Consider a focused test that validates the default session-level value of `TABLE_FINISH_INFO_JSON_LENGTH_LIMIT`

Right now we only test `TableFinishConfig` itself, not that `SystemSessionProperties` uses its default correctly.

Please add a small test (ideally in a `SystemSessionProperties` test class) that:
- Constructs `SystemSessionProperties` with a `TableFinishConfig` whose `tableFinishInfoJsonLengthLimit` is set to a non-default value.
- Fetches the `PropertyMetadata` for `TABLE_FINISH_INFO_JSON_LENGTH_LIMIT` and asserts that its default matches the configured `DataSize`.

This will verify the wiring between the config and the session property and prevent regressions if they drift apart.
</issue_to_address>



import static com.facebook.airlift.units.DataSize.Unit.MEGABYTE;

public class TableFinishConfig
Contributor Author


I couldn't find a better config file to put this config in, so I made my own. I'm open to suggestions on where else to put this!

Contributor


I will say: either use FeaturesConfig, or make the name of this config general enough to incorporate similar needs in other operators?

mkarrmann added a commit to mkarrmann/presto that referenced this pull request Dec 3, 2025
Summary:

## Problem

Currently, `TableFinishInfo` hard-codes a 10 MB limit for serialized metadata info. If that size limit is exceeded, the metadata is dropped. This limit cannot be configured, which is problematic for users who rely on `TableFinishInfo` containing complete information.

## Solution

- Make the length limit configurable via a Session Property.
- Make the default value of the session property configurable at the cluster-level.
- When the length limit is set to 0, entirely bypass enforcing a size limit. (This is slightly awkward since users still need to set arbitrary units for the `DataSize`, but this is acceptable IMO)

Differential Revision: D87825974
mkarrmann added a commit to mkarrmann/presto that referenced this pull request Dec 3, 2025
@steveburnett
Contributor

Please add documentation for the new system session property table_finish_info_json_length_limit in https://github.com/prestodb/presto/blob/master/presto-docs/src/main/sphinx/admin/properties-session.rst.

mkarrmann added a commit to mkarrmann/presto that referenced this pull request Dec 3, 2025
=========================

This section describes session properties that may be used to tune
Contributor Author


These whitespace changes are being made automatically by my editor. I figured I "might as well" clean up these whitespace issues, so I created a separate PR for that: 26736.

Contributor

@steveburnett steveburnett left a comment


Thanks for the doc! Just a couple of nits.


Materialized views are experimental. The SPI and behavior may change in future releases.

``table_finish_info_json_length_limi``
Contributor


Suggested change:
``table_finish_info_json_length_limi`` → ``table_finish_info_json_length_limit``

_limi or _limit?

Materialized views are experimental. The SPI and behavior may change in future releases.

``table_finish_info_json_length_limi``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Contributor


Please edit this formatting line to the exact number of characters in the line being formatted above it, to avoid the following showing up in doc build output:

/Users/steveburnett/Documents/GitHub/presto/presto-docs/src/main/sphinx/admin/properties-session.rst:586: WARNING: Title underline too short.

``table_finish_info_json_length_limi``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [docutils]

mkarrmann added a commit to mkarrmann/presto that referenced this pull request Dec 3, 2025
@steveburnett
Contributor

Nit: the formatting of the Release Notes section of the description does not appear to pass the CI check https://github.com/prestodb/presto/actions/runs/19903784846/job/57054525017?pr=26731

Please edit the Release Notes section.

steveburnett
steveburnett previously approved these changes Dec 3, 2025
Contributor

@steveburnett steveburnett left a comment


LGTM! (docs)

Pull updated branch, new local doc build. Thanks!

mkarrmann added a commit to mkarrmann/presto that referenced this pull request Dec 3, 2025
mkarrmann added a commit to mkarrmann/presto that referenced this pull request Dec 8, 2025
mkarrmann added a commit to mkarrmann/presto that referenced this pull request Dec 8, 2025
mkarrmann added a commit to mkarrmann/presto that referenced this pull request Dec 9, 2025
"Optimize scale writer creation based on producer buffer",
featuresConfig.isOptimizedScaleWriterProducerBuffer(),
true),
integerProperty(
Contributor


consider using dataSizeProperty, like QUERY_MAX_WRITTEN_INTERMEDIATE_BYTES ?

Contributor Author


I debated doing this! I think integer is actually better:

  1. Strictly speaking, we pass this down as INFO_CODEC.toJsonWithLengthLimit(metadataInfo, jsonLengthLimit). toJsonWithLengthLimit checks the number of characters, not the number of bytes. That is, integer is technically "more correct".

  2. I want to support disabling the feature when a non-positive value is passed. DataSize requires you to specify units (e.g. 0MB, 0GB), which is awkward if you just want to disable the feature.
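The character-versus-byte distinction in point 1 is easy to see with a short standalone example (plain Java, no Presto types); the two measures diverge as soon as the JSON contains non-ASCII text.

```java
import java.nio.charset.StandardCharsets;

public class CharsVersusBytes
{
    public static void main(String[] args)
    {
        // A length limit counted in characters and one counted in bytes
        // agree only for pure-ASCII JSON.
        String json = "{\"name\":\"caf\u00e9\"}";
        System.out.println(json.length());                                // 15 characters
        System.out.println(json.getBytes(StandardCharsets.UTF_8).length); // 16 UTF-8 bytes
    }
}
```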
