feat: Make table info JSON length configurable #26731
Conversation
Reviewer's Guide

Makes the maximum size of serialized connector output metadata in `TableFinishInfo` configurable via a cluster-level config (`TableFinishConfig`) and a session property, and plumbs this limit into `TableFinishOperator`, with support for disabling truncation when the limit is zero.

Sequence diagram for configurable TableFinishInfo JSON length:

```mermaid
sequenceDiagram
    actor Client
    participant Session
    participant TableFinishOperatorFactory
    participant TableFinishOperator
    participant TableFinishInfo
    Client->>Session: Execute write query
    Session->>TableFinishOperatorFactory: createOperator(driverContext)
    TableFinishOperatorFactory->>Session: getSystemProperty(TABLE_FINISH_INFO_JSON_LENGTH_LIMIT, DataSize)
    Session-->>TableFinishOperatorFactory: DataSize jsonLengthLimit
    TableFinishOperatorFactory->>TableFinishOperator: new TableFinishOperator(..., jsonLengthLimit)
    Note over TableFinishOperator: During operator construction
    TableFinishOperator->>TableFinishOperator: createTableFinishInfoSupplier(outputMetadata, statisticsTiming, jsonLengthLimit)
    Note over TableFinishOperator: When info is requested
    TableFinishOperator->>TableFinishInfo: new TableFinishInfo(outputMetadata, wallTime, cpuTime, jsonLengthLimit)
    TableFinishInfo->>TableFinishInfo: getSerializedMetadataWithLimit(metadata, jsonLengthLimit)
    alt jsonLengthLimit value is 0
        TableFinishInfo-->>TableFinishInfo: INFO_CODEC.toJson(metadataInfo)
    else jsonLengthLimit value > 0
        TableFinishInfo-->>TableFinishInfo: INFO_CODEC.toJsonWithLengthLimit(metadataInfo, jsonLengthLimit bytes)
    end
    TableFinishInfo-->>TableFinishOperator: TableFinishInfo instance
    TableFinishOperator-->>Client: Query completion info including TableFinishInfo
```

Flow diagram for TableFinishInfo JSON length configuration:

```mermaid
flowchart LR
    ConfigFile["Cluster config file\n(table-finish-info-json-length-limit)"] --> TableFinishConfig
    TableFinishConfig --> SystemSessionProperties["SystemSessionProperties\nregisters session property\nTABLE_FINISH_INFO_JSON_LENGTH_LIMIT"]
    SystemSessionProperties --> Session["Session\n(system properties)"]
    Session --> TableFinishOperatorFactory["TableFinishOperatorFactory\ncreateOperator"]
    TableFinishOperatorFactory --> Session
    Session -->|"getSystemProperty(TABLE_FINISH_INFO_JSON_LENGTH_LIMIT, DataSize)"| TableFinishOperatorFactory
    TableFinishOperatorFactory --> TableFinishOperator["TableFinishOperator\n(jsonLengthLimit field)"]
    TableFinishOperator --> TableFinishInfo["TableFinishInfo\nconstruct with jsonLengthLimit"]
    TableFinishInfo -->|serialize connector output metadata\nwith optional truncation| OutputMetadata["serializedConnectorOutputMetadata"]
```
Hey there - I've reviewed your changes - here's some feedback:
- The `toIntExact(jsonLengthLimit.toBytes())` call in `getSerializedMetadataWithLimit` will overflow for values above ~2 GB (e.g., the 5 GB example in `TestTableFinishConfig`), so either cap the allowed size, validate in `TableFinishConfig`/session property, or switch to an API that accepts a `long` to avoid runtime failures.
- Consider validating `tableFinishInfoJsonLengthLimit` in `TableFinishConfig` (e.g., non-negative and within the range that can be safely converted to an `int` in `TableFinishInfo`) to fail fast on misconfiguration rather than later during metadata serialization.
- In `getSerializedMetadataWithLimit`, checking `jsonLengthLimit.getValue() == 0` relies on a floating-point comparison; using a byte-based check (e.g., `jsonLengthLimit.toBytes() == 0`) would be more robust and consistent with how the value is subsequently used.
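The floating-point concern in the last bullet can be demonstrated without any Presto dependencies. The sketch below stands in for airlift's `DataSize` behavior (raw `double` value vs. bytes rounded to a whole number); the class and method names are illustrative only:

```java
// Illustrative only: mimics the difference between a getValue()-style check
// (raw double comparison) and a toBytes()-style check (rounded byte count).
// A tiny fractional size makes the two "is this zero?" checks disagree.
public final class ZeroCheckDemo
{
    public static boolean valueIsZero(double valueInMegabytes)
    {
        return valueInMegabytes == 0; // floating-point comparison
    }

    public static boolean bytesAreZero(double valueInMegabytes)
    {
        return Math.round(valueInMegabytes * 1024 * 1024) == 0; // byte-based comparison
    }

    public static void main(String[] args)
    {
        double tiny = 0.0000001; // about 0.1 bytes when expressed in MB
        // The value-based check says "not zero", the byte-based check says "zero":
        System.out.println(valueIsZero(tiny));  // false
        System.out.println(bytesAreZero(tiny)); // true
    }
}
```

A sub-byte size therefore takes the truncating branch with the value-based check but the "no limit" branch with the byte-based one, which is why the review suggests comparing bytes.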
## Individual Comments
### Comment 1
<location> `presto-main-base/src/main/java/com/facebook/presto/operator/TableFinishInfo.java:119` </location>
<code_context>
+ if (jsonLengthLimit.getValue() == 0) {
+ return Optional.of(INFO_CODEC.toJson(metadataInfo));
+ }
+ return INFO_CODEC.toJsonWithLengthLimit(metadataInfo, toIntExact(jsonLengthLimit.toBytes()));
+ }
}
</code_context>
<issue_to_address>
**issue:** Protect against overflow when converting the configured DataSize to an int length limit.
Now that `jsonLengthLimit` is user-configurable, a very large value could make `jsonLengthLimit.toBytes()` overflow `int` and cause `toIntExact` to throw `ArithmeticException`, turning a bad config into a runtime failure. Consider either clamping the value to `Integer.MAX_VALUE` here, or enforcing a maximum allowed size when parsing the config/session property so this code never receives an unrepresentable value.
</issue_to_address>
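The clamping option suggested above can be sketched in a few lines. This is not the PR's code; the class and method names are made up here, and the real fix would live in `TableFinishInfo` or the config layer:

```java
// Sketch: clamp a (possibly huge) configured byte count to the int range that
// a toJsonWithLengthLimit-style API can accept, instead of letting
// Math.toIntExact throw ArithmeticException at query time.
public final class LengthLimits
{
    static int toClampedIntLimit(long bytes)
    {
        return (int) Math.min(bytes, Integer.MAX_VALUE);
    }

    public static void main(String[] args)
    {
        long fiveGb = 5L * 1024 * 1024 * 1024;     // the 5 GB test value from TestTableFinishConfig
        System.out.println(toClampedIntLimit(fiveGb));           // 2147483647 (clamped)
        System.out.println(toClampedIntLimit(10 * 1024 * 1024)); // 10485760 (unchanged)
    }
}
```

Clamping trades precision for robustness: a limit above ~2 GB silently becomes `Integer.MAX_VALUE`, whereas validating at config time would reject it loudly.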
### Comment 2
<location> `presto-main-base/src/main/java/com/facebook/presto/operator/TableFinishConfig.java:35-37` </location>
<code_context>
+
+ @Config("table-finish-info-json-length-limit")
+ @ConfigDescription("Maximum size of connector output metadata JSON in table finish info")
+ public TableFinishConfig setTableFinishInfoJsonLengthLimit(DataSize tableFinishInfoJsonLengthLimit)
+ {
+ this.tableFinishInfoJsonLengthLimit = tableFinishInfoJsonLengthLimit;
+ return this;
+ }
</code_context>
<issue_to_address>
**issue:** Validate that the configured JSON length limit is non-negative (and possibly bounded).
This setter allows any `DataSize` value, including negatives. A negative value would propagate to `TableFinishInfo.getSerializedMetadataWithLimit`, where `toBytes()` becomes negative and is passed to `toJsonWithLengthLimit`, which likely expects a non-negative limit. It also conflicts with the "0 disables truncation" convention. Please validate this configuration (e.g., `size >= 0`, and optionally an upper bound) so misconfiguration fails fast at config time rather than at query execution.
</issue_to_address>
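A framework-free sketch of the fail-fast validation being suggested. The real setter would keep its `@Config` annotation and accept a `DataSize`; the names below are hypothetical:

```java
// Sketch: reject values that conflict with the "0 disables truncation"
// convention (negative sizes) or that cannot be represented as an int
// length limit, so misconfiguration fails at startup rather than at query time.
public final class ConfigValidation
{
    static long validateJsonLengthLimitBytes(long bytes)
    {
        if (bytes < 0) {
            throw new IllegalArgumentException(
                    "table-finish-info-json-length-limit must be non-negative (0 disables truncation): " + bytes);
        }
        if (bytes > Integer.MAX_VALUE) {
            throw new IllegalArgumentException(
                    "table-finish-info-json-length-limit must be at most " + Integer.MAX_VALUE + " bytes: " + bytes);
        }
        return bytes;
    }
}
```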
### Comment 3
<location> `presto-main-base/src/main/java/com/facebook/presto/SystemSessionProperties.java:569` </location>
<code_context>
true),
+ new PropertyMetadata<>(
+ TABLE_FINISH_INFO_JSON_LENGTH_LIMIT,
+ "Maximum size of connector output metadata JSON in table finish info",
+ VARCHAR,
+ DataSize.class,
</code_context>
<issue_to_address>
**suggestion:** Clarify the "0 disables truncation" behavior in the session property/config descriptions.
The code treats a `DataSize` of 0 as disabling truncation (`getSerializedMetadataWithLimit` skips the limit when `getValue() == 0`), but the session property and `TableFinishConfig` descriptions don’t say this. Please update the description strings to explicitly note that `0` means “no limit” so users can intentionally disable truncation.
Suggested implementation:
```java
new PropertyMetadata<>(
TABLE_FINISH_INFO_JSON_LENGTH_LIMIT,
"Maximum size of connector output metadata JSON in table finish info (0 disables truncation)",
VARCHAR,
DataSize.class,
tableFinishConfig.getTableFinishInfoJsonLengthLimit(),
false,
value -> DataSize.valueOf((String) value),
DataSize::toString),
```
To fully implement your review comment across the codebase, you should also:
1. Update the description of the corresponding config property in `TableFinishConfig` (likely the one backing `getTableFinishInfoJsonLengthLimit()`) to mention that `0` disables truncation / means “no limit”.
2. Ensure any documentation or configuration reference (e.g., `docs/*.md` or configuration property tables) that describes this property uses the updated wording for consistency.
</issue_to_address>
### Comment 4
<location> `presto-main-base/src/test/java/com/facebook/presto/operator/TestTableFinishConfig.java:28` </location>
<code_context>
+import static com.facebook.airlift.units.DataSize.Unit.GIGABYTE;
+import static com.facebook.airlift.units.DataSize.Unit.MEGABYTE;
+
+public class TestTableFinishConfig
+{
+ @Test
</code_context>
<issue_to_address>
**issue (testing):** Add tests that exercise the new session property / TableFinishInfo behavior, not just the config mapping
The new tests only cover `TableFinishConfig` mapping, but not the behavioral changes (configurable JSON length and `0` meaning "no limit"). Please add tests that:
- Build a `TableFinishInfo` with metadata whose JSON slightly exceeds the configured limit and assert `jsonLengthLimitExceeded` is `true` and `serializedConnectorOutputMetadata` is `null`.
- Use the same metadata with a larger limit and assert `jsonLengthLimitExceeded` is `false` and the JSON is present.
- Use `jsonLengthLimit = DataSize.succinctBytes(0)` and large metadata that would previously be truncated, and assert the JSON is fully present and `jsonLengthLimitExceeded` is `false`.
- (Ideally) an integration-style test of `TableFinishOperator` + `SystemSessionProperties` that sets `TABLE_FINISH_INFO_JSON_LENGTH_LIMIT` and verifies the info from `operatorContext.getInfoSupplier()` respects the session property.
That way the configurable limit and `0` = "no limit" semantics are exercised end-to-end, not just the config wiring.
</issue_to_address>
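The semantics those tests should pin down can be sketched without Presto classes. This mirrors only the behavior described in this review (limit 0 = no limit; over-limit metadata is dropped), not `TableFinishInfo`'s actual implementation, and the names are illustrative:

```java
import java.util.Optional;

// Sketch of the truncation contract: return the JSON if it fits (or if the
// limit is 0, meaning "no limit"); otherwise return empty, modeling a null
// serializedConnectorOutputMetadata with jsonLengthLimitExceeded = true.
public final class TruncationSemantics
{
    static Optional<String> serializeWithLimit(String json, long limitBytes)
    {
        if (limitBytes == 0 || json.length() <= limitBytes) {
            return Optional.of(json);
        }
        return Optional.empty();
    }
}
```

Tests built against the real class would exercise exactly these three branches: slightly over the limit, comfortably under it, and limit zero with large metadata.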
### Comment 5
<location> `presto-main-base/src/test/java/com/facebook/presto/operator/TestTableFinishConfig.java:38-41` </location>
<code_context>
+ }
+
+ @Test
+ public void testExplicitPropertyMappings()
+ {
+ Map<String, String> properties = new ImmutableMap.Builder<String, String>()
+ .put("table-finish-info-json-length-limit", "5GB")
+ .build();
+
</code_context>
<issue_to_address>
**suggestion (testing):** Add an explicit mapping test for the `0` value to cover the "disable limit" behavior
Since `DataSize` value `0` has special meaning in `TableFinishInfo` (disables truncation), please add an assertion that this value is configurable via properties. For example, include a mapping like:
```java
Map<String, String> properties = ImmutableMap.of(
"table-finish-info-json-length-limit", "0B");
TableFinishConfig expected = new TableFinishConfig()
.setTableFinishInfoJsonLengthLimit(DataSize.succinctBytes(0));
assertFullMapping(properties, expected);
```
This will verify the config layer correctly accepts `0` and protect against regressions in the "no limit" behavior.
</issue_to_address>
### Comment 6
<location> `presto-main-base/src/test/java/com/facebook/presto/sql/analyzer/TestAnalyzer.java:248-249` </location>
<code_context>
new TracingConfig(),
new CompilerConfig(),
- new HistoryBasedOptimizationConfig());
+ new HistoryBasedOptimizationConfig(),
+ new TableFinishConfig());
}
</code_context>
<issue_to_address>
**suggestion (testing):** Consider a focused test that validates the default session-level value of `TABLE_FINISH_INFO_JSON_LENGTH_LIMIT`
Right now we only test `TableFinishConfig` itself, not that `SystemSessionProperties` uses its default correctly.
Please add a small test (ideally in a `SystemSessionProperties` test class) that:
- Constructs `SystemSessionProperties` with a `TableFinishConfig` whose `tableFinishInfoJsonLengthLimit` is set to a non-default value.
- Fetches the `PropertyMetadata` for `TABLE_FINISH_INFO_JSON_LENGTH_LIMIT` and asserts that its default matches the configured `DataSize`.
This will verify the wiring between the config and the session property and prevent regressions if they drift apart.
</issue_to_address>
Context from the diff:

```java
import static com.facebook.airlift.units.DataSize.Unit.MEGABYTE;

public class TableFinishConfig
```
I couldn't find a better config file to put this config in, so I made my own. I'm open to suggestions on where else to put this!
I will say: either put it in FeaturesConfig, or make the name of this config general enough to incorporate similar needs in other operators?
Summary:

Problem: Currently, `TableFinishInfo` hard-codes a 10 MB limit for serialized metadata info. If that size limit is exceeded, data is dropped. This cannot be configured, and is problematic for users who rely upon `TableFinishInfo` containing complete information.

Solution:
- Make the length limit configurable via a Session Property.
- Make the default value of the session property configurable at the cluster level.
- When the length limit is set to 0, entirely bypass enforcing a size limit. (This is slightly awkward since users still need to specify units for the `DataSize`, but this is acceptable IMO.)

Differential Revision: D87825974
Please add documentation for the new system Session Property.
Context from the docs diff:

```
=========================

This section describes session properties that may be used to tune
```
These whitespace changes are being made automatically by my editor.
"might as well" clean up these white space issues. I created a separate PR for that: 26736
steveburnett left a comment:
Thanks for the doc! Just a couple of nits.
Context from the docs diff:

```
Materialized views are experimental. The SPI and behavior may change in future releases.

``table_finish_info_json_length_limi``
```
On ``table_finish_info_json_length_limi``: _limi or _limit?
Context from the docs diff:

```
``table_finish_info_json_length_limi``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
```
Please edit this formatting line to the exact number of characters in the line being formatted above it, to avoid the following showing up in doc build output:

```
/Users/steveburnett/Documents/GitHub/presto/presto-docs/src/main/sphinx/admin/properties-session.rst:586: WARNING: Title underline too short.

``table_finish_info_json_length_limi``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [docutils]
```
Nit: the formatting of the Release Notes section of the Description appears to not pass the CI check (https://github.com/prestodb/presto/actions/runs/19903784846/job/57054525017?pr=26731). Please edit the Release Notes section.
steveburnett left a comment:
LGTM! (docs)
Pull updated branch, new local doc build. Thanks!
| "Optimize scale writer creation based on producer buffer", | ||
| featuresConfig.isOptimizedScaleWriterProducerBuffer(), | ||
| true), | ||
| integerProperty( |
Consider using `dataSizeProperty`, like `QUERY_MAX_WRITTEN_INTERMEDIATE_BYTES`?
I debated doing this! I think integer is actually better:

- Strictly speaking, we pass this down as `INFO_CODEC.toJsonWithLengthLimit(metadataInfo, jsonLengthLimit);`. `toJsonWithLengthLimit` checks the number of characters, not the number of bytes. That is, integer is technically "more correct".
- I want to support disabling the feature when passing a non-positive value. `DataSize` requires you to specify units (e.g. `0MB`, `0GB`), which is awkward if you just want to disable the feature.
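The characters-vs-bytes distinction above is easy to verify standalone: for non-ASCII text, Java's character count and the UTF-8 byte count diverge, so an `int` character limit and a `DataSize` byte limit are not interchangeable:

```java
import java.nio.charset.StandardCharsets;

public final class CharsVsBytes
{
    public static void main(String[] args)
    {
        String s = "\u00e9"; // the single character "é"
        System.out.println(s.length());                                // 1 character
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 2 bytes
    }
}
```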
Differential Revision: D87825974
Problem

Currently, `TableFinishInfo` hard-codes a 10 MB limit for serialized metadata info. If that size limit is exceeded, data is dropped. This cannot be configured, and is problematic for users who rely upon `TableFinishInfo` containing complete information.

Solution

- Make the length limit configurable via a Session Property.
- Make the default value of the session property configurable at the cluster level.
- When the length limit is set to 0, entirely bypass enforcing a size limit.

Reviewer Notes

I couldn't find a good config file to put the default Table Finish JSON length limit in, so I created a new config file: `TableFinishConfig`. This slightly increased the scope of this change, because I needed to pass this config in all tests which explicitly construct `SystemSessionProperties`.