-
Notifications
You must be signed in to change notification settings - Fork 28.7k
[SPARK-51439][SQL] Support SQL UDF with DEFAULT argument #50408
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@cloud-fan @allisonwang-db Please take a look at this follow up of [SPARK-50763][SQL] Add Analyzer rule for resolving SQL table functions! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserInterface.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBaseParser.g4
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explain refactor
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserUtils.scala
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/catalyst/util/SparkParserUtils.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/UserDefinedFunction.scala
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/DataTypeAstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/catalyst/parser/parsers.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/ColumnDefinition.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala
Outdated
Show resolved
Hide resolved
sql/api/src/main/scala/org/apache/spark/sql/types/StructField.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AbstractSqlParser.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
Outdated
Show resolved
Hide resolved
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if CI passes
@@ -521,6 +521,7 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru | |||
*/ | |||
@Stable | |||
object StructType extends AbstractDataType { | |||
private[sql] val SQL_FUNCTION_DEFAULT_METADATA_KEY = "spark.sql.function.default" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just keep 'default'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
private def getDDLDefault = getCurrentDefaultValue() | ||
.map(" DEFAULT " + _) | ||
.getOrElse("") | ||
private def getDDLDefault = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shall we keep the original ordering? getCurrentDefaultValue().orElse(getParameterDefault())?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the ordering shouldn't matter because they are not set at the same time
*/ | ||
private[sql] def getDefault(): Option[String] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep the original function name if possible? I think the docstring itself makes it clear it's for function parameter. It would be good to reduce cosmetic changes given the PR is already large.
@@ -74,4 +74,17 @@ class SQLFunctionSuite extends QueryTest with SharedSparkSession { | |||
|""".stripMargin), Seq(Row(2), Row(4))) | |||
} | |||
} | |||
|
|||
test("SQL scalar function with default value") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we add another test with SQL table functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is for convenience when debugging. The SQL tests already check that it works with table functions.
@wengh can you re-trigger the Github Action jobs? The link failure seems flaky. |
@cloud-fan thanks for the reminder. tests are passing now. |
thanks, merging to master/4.0 (to make the SQL UDF feature complete)! |
Continuing allisonwang-db's work on #50373 and #49471 This PR adds support for DEFAULT arguments in SQL UDF. Examples: ```sql CREATE FUNCTION foo1d1(a INT DEFAULT 10) RETURNS INT RETURN a; SELECT foo1d1(); -- 10 SELECT foo1d1(20); -- 20 CREATE FUNCTION foo1d6(a INT, b INT DEFAULT 7) RETURNS TABLE(a INT, b INT) RETURN SELECT a, b; SELECT * FROM foo1d6(5); -- 5, 7 SELECT * FROM foo1d6(5, 2); -- 5, 2 ``` See sql-udf.sql for more valid and invalid examples. To support default arguments in SQL UDFs. Yes. Now SQL UDFs support DEFAULT arguments. A side effect of the grammar change is that some invalid function parameter definitions are now no longer rejected by the grammar, but instead rejected by the parser logic. Examples: ```sql -- multiple COMMENT or multiple NOT NULL CREATE TEMPORARY FUNCTION foo(a INT COMMENT 'hello' COMMENT 'world') RETURNS INT RETURN a; -- before: [PARSE_SYNTAX_ERROR] Syntax error at or near 'COMMENT'. SQLSTATE: 42601 == SQL (line 2, position 1) == CREATE TEMPORARY FUNCTION foo(a INT COMMENT 'hello' COMMENT 'world') RETURNS INT RETURN a; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -- after: [CREATE_TABLE_COLUMN_DESCRIPTOR_DUPLICATE] CREATE TABLE column a specifies descriptor "COMMENT" more than once, which is invalid. SQLSTATE: 42710 == SQL (line 1, position 1) == CREATE TEMPORARY FUNCTION foo(a INT COMMENT 'hello' COMMENT 'world')... ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``` ```sql -- GENERATED ALWAYS AS CREATE TEMPORARY FUNCTION foo(a INT GENERATED ALWAYS AS (1)) RETURNS INT RETURN a; -- before: [PARSE_SYNTAX_ERROR] Syntax error at or near 'GENERATED'. SQLSTATE: 42601 == SQL (line 2, position 1) == CREATE TEMPORARY FUNCTION foo(a INT GENERATED ALWAYS AS (1)) RETURNS INT RETURN a; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -- after: [INVALID_SQL_SYNTAX.CREATE_FUNC_WITH_GENERATED_COLUMNS_AS_PARAMETERS] Invalid SQL syntax: CREATE FUNCTION with generated columns as parameters is not allowed. SQLSTATE: 42000 == SQL (line 2, position 1) == CREATE TEMPORARY FUNCTION foo(a INT GENERATED ALWAYS AS (1)) RETURNS INT RETURN a; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``` This doesn't change the behavior of existing valid SQL. End-to-end regression tests in `sql-udf.sql` and simple tests in `SQLFunctionSuite`. No Closes #50408 from wengh/sql-udf-default. Lead-authored-by: Haoyu Weng <[email protected]> Co-authored-by: Allison Wang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
Continuing allisonwang-db's work on apache#50373 and apache#49471 ### What changes were proposed in this pull request? This PR adds support for DEFAULT arguments in SQL UDF. Examples: ```sql CREATE FUNCTION foo1d1(a INT DEFAULT 10) RETURNS INT RETURN a; SELECT foo1d1(); -- 10 SELECT foo1d1(20); -- 20 CREATE FUNCTION foo1d6(a INT, b INT DEFAULT 7) RETURNS TABLE(a INT, b INT) RETURN SELECT a, b; SELECT * FROM foo1d6(5); -- 5, 7 SELECT * FROM foo1d6(5, 2); -- 5, 2 ``` See sql-udf.sql for more valid and invalid examples. ### Why are the changes needed? To support default arguments in SQL UDFs. ### Does this PR introduce _any_ user-facing change? Yes. Now SQL UDFs support DEFAULT arguments. A side effect of the grammar change is that some invalid function parameter definitions are now no longer rejected by the grammar, but instead rejected by the parser logic. Examples: ```sql -- multiple COMMENT or multiple NOT NULL CREATE TEMPORARY FUNCTION foo(a INT COMMENT 'hello' COMMENT 'world') RETURNS INT RETURN a; -- before: [PARSE_SYNTAX_ERROR] Syntax error at or near 'COMMENT'. SQLSTATE: 42601 == SQL (line 2, position 1) == CREATE TEMPORARY FUNCTION foo(a INT COMMENT 'hello' COMMENT 'world') RETURNS INT RETURN a; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -- after: [CREATE_TABLE_COLUMN_DESCRIPTOR_DUPLICATE] CREATE TABLE column a specifies descriptor "COMMENT" more than once, which is invalid. SQLSTATE: 42710 == SQL (line 1, position 1) == CREATE TEMPORARY FUNCTION foo(a INT COMMENT 'hello' COMMENT 'world')... ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``` ```sql -- GENERATED ALWAYS AS CREATE TEMPORARY FUNCTION foo(a INT GENERATED ALWAYS AS (1)) RETURNS INT RETURN a; -- before: [PARSE_SYNTAX_ERROR] Syntax error at or near 'GENERATED'. SQLSTATE: 42601 == SQL (line 2, position 1) == CREATE TEMPORARY FUNCTION foo(a INT GENERATED ALWAYS AS (1)) RETURNS INT RETURN a; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -- after: [INVALID_SQL_SYNTAX.CREATE_FUNC_WITH_GENERATED_COLUMNS_AS_PARAMETERS] Invalid SQL syntax: CREATE FUNCTION with generated columns as parameters is not allowed. SQLSTATE: 42000 == SQL (line 2, position 1) == CREATE TEMPORARY FUNCTION foo(a INT GENERATED ALWAYS AS (1)) RETURNS INT RETURN a; ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ``` This doesn't change the behavior of existing valid SQL. ### How was this patch tested? End-to-end regression tests in `sql-udf.sql` and simple tests in `SQLFunctionSuite`. ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#50408 from wengh/sql-udf-default. Lead-authored-by: Haoyu Weng <[email protected]> Co-authored-by: Allison Wang <[email protected]> Signed-off-by: Wenchen Fan <[email protected]>
Continuing @allisonwang-db's work on #50373 and #49471
What changes were proposed in this pull request?
This PR adds support for DEFAULT arguments in SQL UDF. Examples:
See sql-udf.sql for more valid and invalid examples.
Why are the changes needed?
To support default arguments in SQL UDFs.
Does this PR introduce any user-facing change?
Yes. Now SQL UDFs support DEFAULT arguments.
A side effect of the grammar change is that some invalid function parameter definitions are now no longer rejected by the grammar, but instead rejected by the parser logic.
Examples:
This doesn't change the behavior of existing valid SQL.
How was this patch tested?
End-to-end regression tests in
sql-udf.sql
and simple tests inSQLFunctionSuite
.Was this patch authored or co-authored using generative AI tooling?
No