Documentation: Plan custom expressions #15353

Open
wants to merge 2 commits into
base: main
from

Conversation

Jiashu-Hu
Contributor

@Jiashu-Hu Jiashu-Hu commented Mar 21, 2025

Which issue does this PR close?

Rationale for this change

This PR adds documentation for using the ExprPlanner API to plan custom expressions, as requested in #15267. Clear documentation with an example will help users extend DataFusion to support custom operators like PostgreSQL's ->.

What changes are included in this PR?

  • Added a new section ## Custom Expression Planning to document
  • Included an introduction explaining what ExprPlanner does and why it’s useful.
  • Provided a step-by-step example showing how to implement and register an ExprPlanner to make the -> operator concatenate strings (e.g., 'foo'->'bar' outputs foobar).
  • Linked to relevant API docs (ExprPlanner and FunctionRegistry) for further reading.
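The planning pattern described in the list above can be sketched without DataFusion at all. The block below is a toy model under stated assumptions, not the PR's example and not DataFusion's API: `Expr`, `RawBinary`, `PlanResult`, `Planner`, `ArrowConcatPlanner`, `plan`, and `eval` are all hypothetical names invented for illustration. It only shows the general shape: a planner either turns a raw `->` expression into a concatenation or hands the input back unchanged so another planner can try.

```rust
/// Hypothetical logical expression type (a stand-in, not DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(String),
    Concat(Box<Expr>, Box<Expr>),
}

/// Hypothetical raw binary expression as it might come out of a SQL parser.
#[derive(Debug, Clone, PartialEq)]
struct RawBinary {
    left: Expr,
    op: String,
    right: Expr,
}

/// A planner either produces a planned expression or returns its input untouched.
#[derive(Debug, PartialEq)]
enum PlanResult {
    Planned(Expr),
    Original(RawBinary),
}

/// Hypothetical planner interface, assumed here for illustration only.
trait Planner {
    fn plan_binary_op(&self, raw: RawBinary) -> PlanResult;
}

/// Plans `->` as string concatenation and leaves every other operator alone.
struct ArrowConcatPlanner;

impl Planner for ArrowConcatPlanner {
    fn plan_binary_op(&self, raw: RawBinary) -> PlanResult {
        if raw.op == "->" {
            PlanResult::Planned(Expr::Concat(Box::new(raw.left), Box::new(raw.right)))
        } else {
            PlanResult::Original(raw)
        }
    }
}

/// Tries each registered planner in order; `None` means no planner handled the operator.
fn plan(planners: &[Box<dyn Planner>], mut raw: RawBinary) -> Option<Expr> {
    for planner in planners {
        match planner.plan_binary_op(raw) {
            PlanResult::Planned(expr) => return Some(expr),
            PlanResult::Original(unplanned) => raw = unplanned,
        }
    }
    None
}

/// Evaluates the toy expression tree to a string.
fn eval(expr: &Expr) -> String {
    match expr {
        Expr::Literal(s) => s.clone(),
        Expr::Concat(l, r) => format!("{}{}", eval(l), eval(r)),
    }
}
```

With `ArrowConcatPlanner` as the only registered planner, planning and evaluating `'foo' -> 'bar'` in this toy model yields `foobar`, while an operator such as `@>` falls through and `plan` returns `None`.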

Are these changes tested?

yes

Are there any user-facing changes?

Yes, since this is a documentation change.

@github-actions github-actions bot added the documentation label Mar 21, 2025
Contributor

@alamb alamb left a comment

Thank you so much @Jiashu-Hu -- this looks great.

I have some suggestions on how to improve this section and example, but we could also do it as a follow on PR.

It is great to see the docs being improved

@@ -1160,6 +1160,89 @@ async fn main() -> Result<()> {
// +---+
```

## Custom Expression Planning

DataFusion provides native support for a limited set of SQL operators by default. For operators not natively defined, developers can extend DataFusion's functionality by implementing custom expression planning. This extensibility is a core feature of DataFusion, allowing it to be customized for particular workloads and requirements.
Contributor

Here are some wordsmithing suggestions:

Suggested change
DataFusion provides native support for a limited set of SQL operators by default. For operators not natively defined, developers can extend DataFusion's functionality by implementing custom expression planning. This extensibility is a core feature of DataFusion, allowing it to be customized for particular workloads and requirements.
DataFusion provides native support for common SQL operators by default such as `+`, `-`, `||`. However it does not provide support for other operators such as `@>`. To override DataFusion's default handling or support unsupported operators, developers can extend DataFusion by implementing custom expression planning, a core feature of DataFusion


1. Implement the `ExprPlanner` trait: This allows you to define custom logic for planning expressions that DataFusion doesn't natively recognize. The trait provides the necessary interface to translate logical expressions into physical execution plans.

For a detailed documentation please see: [Trait ExprPlanner](https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.ExprPlanner.html)
Contributor

Suggested change
For a detailed documentation please see: [Trait ExprPlanner](https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.ExprPlanner.html)
For detailed documentation please see: [Trait ExprPlanner](https://docs.rs/datafusion/latest/datafusion/logical_expr/planner/trait.ExprPlanner.html)


To extend DataFusion with support for custom operators not natively available, you need to:

1. Implement the `ExprPlanner` trait: This allows you to define custom logic for planning expressions that DataFusion doesn't natively recognize. The trait provides the necessary interface to translate logical expressions into physical execution plans.
Contributor

I think ExprPlanner actually converts the SQL AST to Exprs:

Suggested change
1. Implement the `ExprPlanner` trait: This allows you to define custom logic for planning expressions that DataFusion doesn't natively recognize. The trait provides the necessary interface to translate logical expressions into physical execution plans.
1. Implement the `ExprPlanner` trait: This allows you to define custom logic for planning expressions that DataFusion doesn't natively recognize. The trait provides the necessary interface to translate SQL AST nodes into logical `Expr`.


2. Register your custom planner: Integrate your implementation with DataFusion's `SessionContext` to ensure your custom planning logic is invoked during the query optimization and execution planning phase.

For a detailed documentation please see: [fn register_expr_planner](https://docs.rs/datafusion/latest/datafusion/execution/trait.FunctionRegistry.html#method.register_expr_planner)
Contributor

Suggested change
For a detailed documentation please see: [fn register_expr_planner](https://docs.rs/datafusion/latest/datafusion/execution/trait.FunctionRegistry.html#method.register_expr_planner)
For a detailed documentation see: [fn register_expr_planner](https://docs.rs/datafusion/latest/datafusion/execution/trait.FunctionRegistry.html#method.register_expr_planner)

# // Define the custom planner
# struct MyCustomPlanner;

// Implement ExprPlanner for cutom operator logic
Contributor

Suggested change
// Implement ExprPlanner for cutom operator logic
// Implement ExprPlanner to add support for the `->` custom operator

ctx.register_expr_planner(Arc::new(MyCustomPlanner))?;
let results = ctx.sql("select 'foo'->'bar';").await?.collect().await?;

pretty::print_batches(&results)?;
Contributor

Could you please change this to use `assert_batches_eq!` so the actual output is in the test and it is tested in CI?

Labels
documentation Improvements or additions to documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add documentation about how to plan custom expressions
2 participants