Add function metadata ability to push down struct argument in optimizer #25175
base: master
Summary: T224244100. For the fb_reshape_row UDF, Presto should push down the utilized subfields of a struct during the optimizer phase.
Presto version: 0.293-20250521.210824-350
Differential Revision: D74738214
This pull request was exported from Phabricator. Differential Revision: D74738214
Looks good! Can you add a test? Perhaps one in TestHiveLogicalPlanner verifying that the subfields are pushed down, and then a query correctness test (see TestLambdaSubfieldPruning for examples of tests for a related feature). You may have to register a function in the test that uses the new field you added (e.g., a passthrough function that takes a row and returns it unchanged) to exercise your code.
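The passthrough function the reviewer describes can be sketched in plain Java, assuming a struct value is modeled as a `Map<String, Object>`. The class and method names are hypothetical and not part of Presto's test suite. In a real test, the function would be registered with Presto and would opt in through the new annotation attribute. The `sameSubfield` check mirrors what a query correctness test asserts: reading a subfield through the passthrough must return the same value as reading it directly.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a test-only UDF such as row_passthrough(row).
// In Presto it would presumably be declared with something like
//   @CodegenScalarFunction(value = "row_passthrough", pushdownSubfieldArgIndex = 0)
// so that the optimizer may push the row's used subfields through the call.
final class RowPassthroughSketch
{
    private RowPassthroughSketch() {}

    // Returns the struct unchanged; this is what makes subfield pushdown safe.
    static Map<String, Object> rowPassthrough(Map<String, Object> row)
    {
        return row;
    }

    // Models the correctness check: x.field and row_passthrough(x).field must agree.
    static boolean sameSubfield(Map<String, Object> row, String field)
    {
        Object direct = row.get(field);
        Object viaUdf = rowPassthrough(row).get(field);
        return Objects.equals(direct, viaUdf);
    }

    public static void main(String[] args)
    {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("a", 1L);
        row.put("b", "text");
        System.out.println(sameSubfield(row, "a")); // true
        System.out.println(sameSubfield(row, "b")); // true
    }
}
```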
```diff
@@ -33,4 +33,6 @@
     boolean deterministic() default true;

     boolean calledOnNullInput() default false;
+
+    int pushdownSubfieldArgIndex() default -1;
```
This interface method, and the ones on `ScalarFunction` and `SqlInvokedScalarFunction`, are parameters to an annotation. Does a user implementing a function using the Presto SPI manually specify this value in the annotation? If not, then these should be removed.
Yes, they are specified in the annotation. If the value is not specified, it defaults to -1. The annotation looks like this:

```java
@CodegenScalarFunction(value = "function_name", calledOnNullInput = true, pushdownSubfieldArgIndex = 0)
```
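The following self-contained sketch illustrates how the `default -1` behaves. It uses a hypothetical annotation that mirrors the attributes in the diff. It is not Presto's actual `ScalarFunction` or `CodegenScalarFunction`. A function that omits the attribute reads back as -1, which would signal that no argument should be pushed down. Reading the value through reflection is only an approximation of how function metadata might be extracted.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotation mirroring the attributes shown in the diff; not Presto's real type.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ScalarFunctionSketch
{
    String value();

    boolean deterministic() default true;

    boolean calledOnNullInput() default false;

    // -1 means "do not push subfields through any argument".
    int pushdownSubfieldArgIndex() default -1;
}

final class AnnotationDefaultsSketch
{
    private AnnotationDefaultsSketch() {}

    // Opts in: subfields of argument 0 may be pushed through this call.
    @ScalarFunctionSketch(value = "function_name", calledOnNullInput = true, pushdownSubfieldArgIndex = 0)
    static Object optedIn(Object row)
    {
        return row;
    }

    // Does not set the attribute, so it keeps the default of -1.
    @ScalarFunctionSketch("other_function")
    static Object notOptedIn(Object row)
    {
        return row;
    }

    // Reads the push-down index from the annotation via reflection.
    static int pushdownIndexOf(String methodName)
    {
        try {
            Method method = AnnotationDefaultsSketch.class.getDeclaredMethod(methodName, Object.class);
            return method.getAnnotation(ScalarFunctionSketch.class).pushdownSubfieldArgIndex();
        }
        catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such function method: " + methodName, e);
        }
    }

    public static void main(String[] args)
    {
        System.out.println(pushdownIndexOf("optedIn"));    // 0
        System.out.println(pushdownIndexOf("notOptedIn")); // -1
    }
}
```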
Can you add documentation to our docs explaining how this should be used?
Also, why not add an annotation to the argument itself, something like `@RowMayBeDereferenced`? (BTW, can you give an example of how a function could know that this is safe to do?) You could annotate multiple arguments, and we could validate that each annotated argument is, indeed, a struct to begin with.
Also, your examples reference internal queries; can you paste the EXPLAIN plans?
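The reviewer's alternative can be sketched with a parameter-level annotation. `@RowMayBeDereferenced` is only the name floated in the comment. `RowValue` is a hypothetical stand-in for a struct type, and the reflection-based scan is an assumption rather than how Presto registers functions. The sketch shows the two properties the reviewer mentions. Several arguments can be marked, and a marker placed on a non-struct parameter is rejected.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical parameter-level marker, named after the reviewer's suggestion.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface RowMayBeDereferenced {}

// Hypothetical stand-in for a struct (ROW) value.
final class RowValue
{
    final Map<String, Object> fields;

    RowValue(Map<String, Object> fields)
    {
        this.fields = fields;
    }
}

final class ParameterAnnotationSketch
{
    private ParameterAnnotationSketch() {}

    static RowValue reshape(@RowMayBeDereferenced RowValue input, String mode)
    {
        return input;
    }

    static RowValue merge(@RowMayBeDereferenced RowValue left, @RowMayBeDereferenced RowValue right)
    {
        return left;
    }

    // Misuse: the marker is on a non-row parameter, so validation rejects it.
    static String broken(@RowMayBeDereferenced String notARow)
    {
        return notARow;
    }

    // Returns the indices of marked parameters, checking that each one is a row.
    static List<Integer> dereferenceableArgs(Method method)
    {
        List<Integer> indices = new ArrayList<>();
        Annotation[][] annotations = method.getParameterAnnotations();
        Class<?>[] types = method.getParameterTypes();
        for (int i = 0; i < annotations.length; i++) {
            for (Annotation annotation : annotations[i]) {
                if (annotation instanceof RowMayBeDereferenced) {
                    if (types[i] != RowValue.class) {
                        throw new IllegalStateException(method.getName() + ": argument " + i + " is not a row");
                    }
                    indices.add(i);
                }
            }
        }
        return indices;
    }

    // Looks up one of the sample functions above by name.
    static Method find(String name)
    {
        for (Method method : ParameterAnnotationSketch.class.getDeclaredMethods()) {
            if (method.getName().equals(name)) {
                return method;
            }
        }
        throw new IllegalArgumentException("No such method: " + name);
    }

    public static void main(String[] args)
    {
        System.out.println(dereferenceableArgs(find("reshape"))); // [0]
        System.out.println(dereferenceableArgs(find("merge")));   // [0, 1]
    }
}
```

Compared with a single index in the function-level annotation, this design lets one function mark several struct arguments. It also moves the struct-type check to the point where the marker is declared.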
Can you also add some tests?
Instead of annotating the argument, I chose to pass the argument index in the codegen annotation, because this allows it to be stored in the FunctionMetadata.
I added some tests to TestHiveLogicalPlanner. One more change I will make is to validate that the specified argIndex corresponds to a row type, and to emit a warning when the code path is not reached because of an invalid index.
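The validation described in this comment can be sketched as follows. The sketch assumes argument types are available as base names such as "row" or "bigint". It also assumes that an invalid index should only produce a warning message rather than fail registration. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical validation for the pushdown argument index; names are illustrative.
final class PushdownIndexValidation
{
    private PushdownIndexValidation() {}

    /**
     * Returns a warning if pushdownIndex cannot be honored, or empty if it is valid.
     * argumentTypes holds the base name of each declared argument type, e.g. "row" or "bigint".
     * An index of -1 is the "not set" default and is always valid.
     */
    static Optional<String> validate(String functionName, List<String> argumentTypes, int pushdownIndex)
    {
        if (pushdownIndex == -1) {
            return Optional.empty();
        }
        if (pushdownIndex < 0 || pushdownIndex >= argumentTypes.size()) {
            return Optional.of(String.format("%s: pushdownSubfieldArgIndex %d is out of range for %d argument(s)",
                    functionName, pushdownIndex, argumentTypes.size()));
        }
        String type = argumentTypes.get(pushdownIndex);
        if (!type.equals("row")) {
            return Optional.of(String.format("%s: argument %d has type %s, not row; subfields will not be pushed down",
                    functionName, pushdownIndex, type));
        }
        return Optional.empty();
    }

    public static void main(String[] args)
    {
        List<String> types = List.of("varchar", "row");
        System.out.println(validate("f", types, 1).isPresent()); // false
        validate("f", types, 0).ifPresent(System.out::println);
        validate("f", types, 5).ifPresent(System.out::println);
    }
}
```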
Suggest adding a release note, or a
Summary:
For some user-defined functions (UDFs), the subfield pushdown optimizer should transparently pass down the utilized subfields of a struct argument. The goal is for the query plan to look the same as if the UDF were not being called on the struct. To accomplish this, when an expression is converted to a subfield, the UDF call must be unwrapped so that the struct argument passed into it is used instead.
Since there is no guarantee that the struct is always the first argument of the UDF, the UDF needs to specify in its metadata which argument index to push down.
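The unwrapping step can be illustrated with a deliberately simplified expression model. It is not Presto's `RowExpression` or its actual subfield extraction code. The function name `my_udf` and its placement of the struct at index 1 are hypothetical. When a call's metadata names a push-down argument index, the conversion recurses into that argument. As a result, `my_udf('mode', col).a.b` yields the same subfield path as `col.a.b`. A call without that metadata is left opaque, so no subfield is produced.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class SubfieldExtractionSketch
{
    private SubfieldExtractionSketch() {}

    // Simplified, hypothetical expression model.
    interface Expr {}

    static final class Variable implements Expr
    {
        final String name;

        Variable(String name) { this.name = name; }
    }

    static final class Dereference implements Expr
    {
        final Expr base;
        final String field;

        Dereference(Expr base, String field) { this.base = base; this.field = field; }
    }

    static final class Call implements Expr
    {
        final String function;
        final List<Expr> args;

        Call(String function, List<Expr> args) { this.function = function; this.args = args; }
    }

    // Converts an expression to a dotted subfield path, unwrapping calls to functions
    // whose metadata names a push-down argument index (absent or -1 means "no pushdown").
    static Optional<String> toSubfield(Expr expr, Map<String, Integer> pushdownArgIndex)
    {
        if (expr instanceof Variable) {
            return Optional.of(((Variable) expr).name);
        }
        if (expr instanceof Dereference) {
            Dereference dereference = (Dereference) expr;
            return toSubfield(dereference.base, pushdownArgIndex).map(path -> path + "." + dereference.field);
        }
        if (expr instanceof Call) {
            Call call = (Call) expr;
            int index = pushdownArgIndex.getOrDefault(call.function, -1);
            if (index >= 0 && index < call.args.size()) {
                // Treat f(..., struct, ...) as if it were the struct argument itself.
                return toSubfield(call.args.get(index), pushdownArgIndex);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args)
    {
        Map<String, Integer> metadata = Map.of("my_udf", 1);
        Expr col = new Variable("col");
        Expr direct = new Dereference(new Dereference(col, "a"), "b");
        Expr viaUdf = new Dereference(new Dereference(
                new Call("my_udf", List.of(new Variable("mode"), col)), "a"), "b");
        Expr opaque = new Dereference(new Call("other_fn", List.of(col)), "a");

        System.out.println(toSubfield(direct, metadata)); // Optional[col.a.b]
        System.out.println(toSubfield(viaUdf, metadata)); // Optional[col.a.b]
        System.out.println(toSubfield(opaque, metadata)); // Optional.empty
    }
}
```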
T224244100
Presto version 0.293-20250521.210824-350
Differential Revision: D74738214
Test plan:
With this change, both of the queries below produce the same query plan after the table scan node is rewritten:
0.293-20250517.231738-323 (pushdown subfields test version)
20250517_232920_00003_kcp7e: correct query plan with pushed-down subfield
20250517_233012_00005_kcp7e: query plan with a non-relevant function
20250517_233034_00006_kcp7e: expected plan
Verifier suite build: 20250521_205359_71488_cm4iz
UDF only: https://www.internalfb.com/intern/presto/verifier/results/?test_id=223902
General: https://our.intern.facebook.com/intern/presto/verifier/results/?test_id=223903