Add function metadata ability to push down struct argument in optimizer #25175

Open
wants to merge 3 commits into
base: master

Contributor

@kevintang2022 kevintang2022 commented May 22, 2025

Summary:
For some user-defined functions (UDFs), the subfield pushdown optimizer should transparently pass down the utilized subfields of a struct type. The goal is for the query plan to look the same as if the UDF were not being called on the struct. To accomplish this, the struct argument passed into the UDF needs to be unwrapped when an expression is converted to a subfield.

Because the struct argument is not guaranteed to be the first argument of the UDF, the UDF needs to specify in its metadata which argument index to push down.
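
The unwrapping step described above can be sketched in plain Java with no Presto dependencies. Everything here is an assumption made for illustration: `Expr`, `Column`, `Dereference`, `Call`, and `toSubfield` are hypothetical stand-ins, not Presto's actual expression or subfield classes, and the real optimizer rule may be structured differently. The sketch only shows the idea that a call is replaced by its argument at the declared index while the field path is collected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Hypothetical mini expression tree; NOT Presto's expression classes.
abstract class Expr {}

final class Column extends Expr {
    final String name;
    Column(String name) { this.name = name; }
}

final class Dereference extends Expr {
    final Expr base;
    final String field;
    Dereference(Expr base, String field) { this.base = base; this.field = field; }
}

final class Call extends Expr {
    final String function;
    // -1 means "no pushdown", mirroring the default shown in the diff of this PR.
    final int pushdownSubfieldArgIndex;
    final List<Expr> args;
    Call(String function, int pushdownSubfieldArgIndex, Expr... args) {
        this.function = function;
        this.pushdownSubfieldArgIndex = pushdownSubfieldArgIndex;
        this.args = Arrays.asList(args);
    }
}

final class SubfieldSketch {
    // Returns a dotted path such as "person.age", or empty when the expression
    // cannot be reduced to a column plus field accesses.
    static Optional<String> toSubfield(Expr expr) {
        List<String> fields = new ArrayList<>();
        while (true) {
            if (expr instanceof Dereference) {
                Dereference d = (Dereference) expr;
                fields.add(d.field);
                expr = d.base;
            } else if (expr instanceof Call) {
                Call c = (Call) expr;
                int i = c.pushdownSubfieldArgIndex;
                if (i < 0 || i >= c.args.size()) {
                    return Optional.empty(); // function did not opt in
                }
                expr = c.args.get(i); // unwrap: continue with the struct argument
            } else if (expr instanceof Column) {
                Collections.reverse(fields); // fields were collected outermost first
                StringBuilder path = new StringBuilder(((Column) expr).name);
                for (String f : fields) {
                    path.append('.').append(f);
                }
                return Optional.of(path.toString());
            } else {
                return Optional.empty();
            }
        }
    }
}
```

Under these assumptions, `pcol.age` over the raw column and over a call that declares argument index 0 both resolve to the same path, which is the "same query plan" property the summary asks for.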

T224244100

Presto version 0.293-20250521.210824-350

Differential Revision: D74738214

Test plan:
With this change, both of the queries below produce the same query plan after the table scan node rewrite:

EXPLAIN WITH shaped AS (SELECT fb_reshape_row(person, CAST(NULL AS ROW(age INTEGER, city VARCHAR))) AS pcol FROM tangk_struct_table),
raw AS (SELECT person AS pcol FROM tangk_struct_table)
SELECT pcol.age FROM raw;

EXPLAIN WITH shaped AS (SELECT fb_reshape_row(person, CAST(NULL AS ROW(age INTEGER, city VARCHAR))) AS pcol FROM tangk_struct_table),
raw AS (SELECT person AS pcol FROM tangk_struct_table)
SELECT pcol.age FROM shaped;

0.293-20250517.231738-323 (pushdown subfields test version)
20250517_232920_00003_kcp7e correct query plan with pushed down subfield
20250517_233012_00005_kcp7e query plan with non-relevant function
20250517_233034_00006_kcp7e expected plan

Verifier suite build: 20250521_205359_71488_cm4iz

pt suite build --predicate "lower(query) like '%fb_reshape_row%'" --suite atn_fb_reshape_row_subfields_udf --region atn --days 100

UDF only
https://www.internalfb.com/intern/presto/verifier/results/?test_id=223902
General
https://our.intern.facebook.com/intern/presto/verifier/results/?test_id=223903

Summary:
For the fb_reshape_row UDF, Presto should push down the utilized subfields of a struct during the optimizer phase.

@kevintang2022 kevintang2022 requested a review from a team as a code owner May 22, 2025 17:10
@facebook-github-bot
Collaborator

This pull request was exported from Phabricator. Differential Revision: D74738214

@kevintang2022 kevintang2022 requested a review from rschlussel May 22, 2025 17:22
@kevintang2022 kevintang2022 changed the title fb_reshape_row always push down subfields Add function metadata ability to push down struct argument in optimizer May 22, 2025
Contributor

@rschlussel rschlussel left a comment

Looks good! Can you add a test? Maybe one in TestHiveLogicalPlanner that checks the subfields are being pushed down, and then a query correctness test (see TestLambdaSubfieldPruning for examples of tests for a related feature). To exercise your code, you may have to register a function in the test that uses the new field you added (e.g. a passthrough function that takes a row and returns the row unchanged).

@@ -33,4 +33,6 @@
boolean deterministic() default true;

boolean calledOnNullInput() default false;

int pushdownSubfieldArgIndex() default -1;
Contributor

This interface method, and the ones on ScalarFunction and SqlInvokedScalarFunction, are parameters to an annotation. Does a user implementing a function using the Presto SPI manually specify this value in the annotation? If not, then these should be removed.

Contributor Author

Yes, they are specified in the annotation. If the value is not specified, it defaults to -1.

The annotation looks like this:

@CodegenScalarFunction(value = "function_name", calledOnNullInput = true, pushdownSubfieldArgIndex = 0)
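
To illustrate how an annotation element with a default of -1 behaves when it is or is not specified, here is a small self-contained sketch. `SketchScalarFunction`, `SketchFunctions`, and `pushdownIndexOf` are hypothetical names invented for this example; they are not the real `@CodegenScalarFunction` or any Presto API, and only the element names and the -1 default are taken from this PR.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-in for the function annotation; only the elements
// discussed in this thread are modeled.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface SketchScalarFunction {
    String value();

    boolean calledOnNullInput() default false;

    int pushdownSubfieldArgIndex() default -1;
}

final class SketchFunctions {
    // Opts in: the struct is argument 0.
    @SketchScalarFunction(value = "passthrough_row", calledOnNullInput = true, pushdownSubfieldArgIndex = 0)
    public static Object passthroughRow(Object row) {
        return row;
    }

    // Does not specify the element, so the default of -1 applies.
    @SketchScalarFunction("plain_function")
    public static Object plain(Object value) {
        return value;
    }

    // Reads the declared index from the annotation via reflection.
    static int pushdownIndexOf(String methodName) {
        try {
            Method method = SketchFunctions.class.getDeclaredMethod(methodName, Object.class);
            SketchScalarFunction annotation = method.getAnnotation(SketchScalarFunction.class);
            if (annotation == null) {
                throw new IllegalArgumentException("not annotated: " + methodName);
            }
            return annotation.pushdownSubfieldArgIndex();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }
}
```

In this sketch, a reader of the metadata can treat any negative value as "pushdown not requested" without needing a separate boolean element.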

Contributor

@tdcmeehan tdcmeehan May 22, 2025

Can you add a section to our documentation on how this needs to be used?

Also, why not add an annotation to the argument itself? Something like @RowMayBeDereferenced. (BTW, can you give an example of how a function could know that this is safe to do?) You could annotate multiple arguments, and we could validate that each annotated argument is, indeed, a struct to begin with.

Also, your examples reference internal queries. Can you add pastes of the explain plans?
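
For comparison, the argument-level alternative suggested in this comment could look roughly like the following. This is only a sketch of the idea: the annotation `RowMayBeDereferenced` is the name floated in the comment, not an existing Presto annotation, and `ArgumentAnnotationSketch`, `reshapeRow`, and `markedArgumentIndexes` are hypothetical names.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical marker annotation, named after the suggestion in the comment.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface RowMayBeDereferenced {}

final class ArgumentAnnotationSketch {
    // Hypothetical passthrough function: only the first argument is marked.
    public static Object reshapeRow(@RowMayBeDereferenced Object row, Object shape) {
        return row;
    }

    // Collects the indexes of all parameters that carry the marker annotation.
    static List<Integer> markedArgumentIndexes(String methodName) {
        for (Method method : ArgumentAnnotationSketch.class.getDeclaredMethods()) {
            if (!method.getName().equals(methodName)) {
                continue;
            }
            List<Integer> indexes = new ArrayList<>();
            Annotation[][] perParameter = method.getParameterAnnotations();
            for (int i = 0; i < perParameter.length; i++) {
                for (Annotation annotation : perParameter[i]) {
                    if (annotation instanceof RowMayBeDereferenced) {
                        indexes.add(i);
                    }
                }
            }
            return indexes;
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }
}
```

The trade-off in this sketch is that several arguments can be marked at once, at the cost of the index no longer being a single value in the function-level annotation.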

Contributor

Can you also add some tests?

Contributor Author

Instead of using an annotation on the argument, I have chosen to pass the argument index in the codegen annotation because this allows it to be stored in the FunctionMetadata.

I added some tests to TestHiveLogicalPlanner. One more change I will make is to validate that the specified argument index does correspond to a row type, and to emit a warning when the pushdown code path is not reached because of an invalid index.
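
The validation described here might be sketched as follows. This is an assumption about its shape, not the code in this PR: argument types are modeled as plain strings such as `row(age integer, city varchar)`, and `PushdownIndexValidation.validate` is a hypothetical helper that returns a warning message instead of throwing.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

final class PushdownIndexValidation {
    // Returns a warning message when the declared index cannot be used for
    // subfield pushdown, or empty when the index is valid or not specified.
    // Types are simplified to strings here; a real check would inspect type objects.
    static Optional<String> validate(int index, List<String> argumentTypes) {
        if (index == -1) {
            return Optional.empty(); // default value: pushdown not requested
        }
        if (index < 0 || index >= argumentTypes.size()) {
            return Optional.of("pushdownSubfieldArgIndex " + index
                    + " is out of range for " + argumentTypes.size() + " argument(s)");
        }
        String type = argumentTypes.get(index).trim().toLowerCase(Locale.ROOT);
        if (!type.startsWith("row(")) {
            return Optional.of("argument " + index + " has type " + type
                    + ", which is not a row type");
        }
        return Optional.empty();
    }
}
```

Returning the message rather than throwing matches the stated intent of warning and skipping the pushdown path, so an invalid index would not fail the query in this sketch.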

@steveburnett
Contributor

Suggest adding a release note, or a NO RELEASE NOTE block, as appropriate.
