[SPARK-51439][SQL] Support SQL UDF with DEFAULT argument #50408

Closed
wants to merge 29 commits into from

@wengh wengh commented Mar 26, 2025

Continuing @allisonwang-db's work on #50373 and #49471

What changes were proposed in this pull request?

This PR adds support for DEFAULT arguments in SQL UDFs. Examples:

CREATE FUNCTION foo1d1(a INT DEFAULT 10) RETURNS INT RETURN a;
SELECT foo1d1();   -- 10
SELECT foo1d1(20); -- 20

CREATE FUNCTION foo1d6(a INT, b INT DEFAULT 7) RETURNS TABLE(a INT, b INT) RETURN SELECT a, b;
SELECT * FROM foo1d6(5);    -- 5, 7
SELECT * FROM foo1d6(5, 2); -- 5, 2

See sql-udf.sql for more valid and invalid examples.
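
As a rough illustration of the calling convention shown above (trailing arguments may be omitted when the parameter declares a DEFAULT), here is a minimal, self-contained Scala sketch. It is not Spark's implementation: `DefaultArgs`, `Param`, and `bind` are hypothetical names, and defaults are modelled as plain `Int` values rather than SQL expressions.

```scala
// Hypothetical model of default-argument binding; not Spark's actual classes.
object DefaultArgs {
  // A parameter with an optional, already-evaluated default value.
  final case class Param(name: String, default: Option[Int] = None)

  // Bind positional arguments to parameters and fill any omitted trailing
  // parameters from their defaults. Returns an error message on failure.
  def bind(params: Seq[Param], args: Seq[Int]): Either[String, Seq[(String, Int)]] = {
    if (args.length > params.length) {
      Left(s"too many arguments: expected at most ${params.length}, got ${args.length}")
    } else {
      val (supplied, missing) = params.splitAt(args.length)
      val bound = supplied.map(_.name).zip(args)
      missing.find(_.default.isEmpty) match {
        // An omitted parameter without a default cannot be filled in.
        case Some(p) => Left(s"missing argument for parameter '${p.name}'")
        case None => Right(bound ++ missing.map(p => p.name -> p.default.get))
      }
    }
  }
}
```

Under this model, binding `Seq(5)` against parameters `a` and `b DEFAULT 7` yields `a = 5, b = 7`, which mirrors the `foo1d6(5)` example.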

Why are the changes needed?

To support default arguments in SQL UDFs.

Does this PR introduce any user-facing change?

Yes. Now SQL UDFs support DEFAULT arguments.

A side effect of the grammar change is that some invalid function parameter definitions are no longer rejected by the grammar itself. The parser logic rejects them instead, so the error class changes.

Examples:

-- multiple COMMENT or multiple NOT NULL
CREATE TEMPORARY FUNCTION foo(a INT COMMENT 'hello' COMMENT 'world') RETURNS INT RETURN a;

-- before:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'COMMENT'. SQLSTATE: 42601
== SQL (line 2, position 1) ==
CREATE TEMPORARY FUNCTION foo(a INT COMMENT 'hello' COMMENT 'world') RETURNS INT RETURN a;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-- after:
[CREATE_TABLE_COLUMN_DESCRIPTOR_DUPLICATE] CREATE TABLE column a specifies descriptor "COMMENT" more than once, which is invalid. SQLSTATE: 42710
== SQL (line 1, position 1) ==
CREATE TEMPORARY FUNCTION foo(a INT COMMENT 'hello' COMMENT 'world')...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-- GENERATED ALWAYS AS
CREATE TEMPORARY FUNCTION foo(a INT GENERATED ALWAYS AS (1)) RETURNS INT RETURN a;

-- before:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'GENERATED'. SQLSTATE: 42601
== SQL (line 2, position 1) ==
CREATE TEMPORARY FUNCTION foo(a INT GENERATED ALWAYS AS (1)) RETURNS INT RETURN a;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-- after:
[INVALID_SQL_SYNTAX.CREATE_FUNC_WITH_GENERATED_COLUMNS_AS_PARAMETERS] Invalid SQL syntax: CREATE FUNCTION with generated columns as parameters is not allowed. SQLSTATE: 42000
== SQL (line 2, position 1) ==
CREATE TEMPORARY FUNCTION foo(a INT GENERATED ALWAYS AS (1)) RETURNS INT RETURN a;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

This doesn't change the behavior of existing valid SQL.
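
The shift described above, from rejection by the grammar to rejection by parser logic, can be sketched as a permissive parse followed by a validation pass. The following is a simplified, hypothetical model, not the code in this PR: `ParamValidation`, `Descriptor`, and `validate` are invented names, and the messages only approximate the real error text.

```scala
// Hypothetical model: the grammar accepts any sequence of descriptors,
// and a later validation step rejects the invalid combinations.
object ParamValidation {
  sealed trait Descriptor
  final case class Comment(text: String) extends Descriptor
  case object NotNull extends Descriptor
  final case class Default(expr: String) extends Descriptor
  final case class GeneratedAlwaysAs(expr: String) extends Descriptor

  // Keyword used to detect a descriptor that appears more than once.
  private def label(d: Descriptor): String = d match {
    case _: Comment => "COMMENT"
    case NotNull => "NOT NULL"
    case _: Default => "DEFAULT"
    case _: GeneratedAlwaysAs => "GENERATED ALWAYS AS"
  }

  def validate(param: String, descriptors: Seq[Descriptor]): Either[String, Unit] = {
    if (descriptors.exists(_.isInstanceOf[GeneratedAlwaysAs])) {
      // Parsed successfully, but never valid for a function parameter.
      Left("CREATE FUNCTION with generated columns as parameters is not allowed")
    } else {
      descriptors.map(label).groupBy(identity).collectFirst {
        case (name, hits) if hits.size > 1 => name
      } match {
        case Some(name) => Left(s"parameter $param specifies descriptor '$name' more than once")
        case None => Right(())
      }
    }
  }
}
```

Because the check runs after parsing, the error can name the specific problem instead of reporting a generic syntax error at the offending keyword.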

How was this patch tested?

End-to-end regression tests in sql-udf.sql and simple tests in SQLFunctionSuite.

Was this patch authored or co-authored using generative AI tooling?

No

@wengh wengh changed the title [WIP][SPARK-51439] Support SQL UDF with DEFAULT argument [WIP][SPARK-51439][SQL] Support SQL UDF with DEFAULT argument Mar 26, 2025
@wengh wengh marked this pull request as ready for review March 27, 2025 15:05
@wengh wengh changed the title [WIP][SPARK-51439][SQL] Support SQL UDF with DEFAULT argument [SPARK-51439][SQL] Support SQL UDF with DEFAULT argument Mar 27, 2025
wengh commented Mar 27, 2025

@wengh wengh force-pushed the sql-udf-default branch from ecfb9ce to aee7338 Compare March 27, 2025 16:01

@allisonwang-db allisonwang-db left a comment

Looks great!

@wengh wengh marked this pull request as draft March 28, 2025 00:31
@wengh wengh force-pushed the sql-udf-default branch from 8f51e83 to 8c73a14 Compare March 28, 2025 16:48
@wengh wengh marked this pull request as ready for review March 28, 2025 17:13

@wengh wengh left a comment

explain refactor

@wengh wengh requested review from cloud-fan and zhengruifeng April 1, 2025 00:22
@wengh wengh requested a review from cloud-fan April 2, 2025 23:03
@wengh wengh force-pushed the sql-udf-default branch from 275f637 to e4a6883 Compare April 8, 2025 21:51

@cloud-fan cloud-fan left a comment

LGTM if CI passes

@@ -521,6 +521,7 @@ case class StructType(fields: Array[StructField]) extends DataType with Seq[Stru
*/
@Stable
object StructType extends AbstractDataType {
private[sql] val SQL_FUNCTION_DEFAULT_METADATA_KEY = "spark.sql.function.default"

Let's just keep 'default'?

@allisonwang-db allisonwang-db left a comment

Looks good!

private def getDDLDefault = getCurrentDefaultValue()
.map(" DEFAULT " + _)
.getOrElse("")
private def getDDLDefault =

shall we keep the original ordering? getCurrentDefaultValue().orElse(getParameterDefault())?

@wengh wengh Apr 15, 2025

the ordering shouldn't matter because they are not set at the same time
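
To make the point about ordering concrete, here is a small Scala sketch using plain `Option` values. `DefaultLookup` and `Field` are hypothetical stand-ins, and the two fields only loosely mirror `getCurrentDefaultValue()` and `getParameterDefault()` from the comment above. It shows that the two `orElse` orderings agree whenever at most one default is set, and differ only if both are set.

```scala
// Hypothetical stand-in for a field that may carry two kinds of default.
object DefaultLookup {
  final case class Field(currentDefault: Option[String], parameterDefault: Option[String]) {
    // Original ordering suggested in review: column default first.
    def ddlCurrentFirst: String =
      currentDefault.orElse(parameterDefault).map(" DEFAULT " + _).getOrElse("")

    // Reversed ordering: parameter default first.
    def ddlParameterFirst: String =
      parameterDefault.orElse(currentDefault).map(" DEFAULT " + _).getOrElse("")
  }

  // True when the two orderings produce the same DDL fragment.
  def orderingsAgree(f: Field): Boolean = f.ddlCurrentFirst == f.ddlParameterFirst
}
```

The orderings agree for `Field(Some("1"), None)`, `Field(None, Some("2"))`, and `Field(None, None)`. They disagree only for a field with both defaults set, which the reply above says does not occur.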

*/
private[sql] def getDefault(): Option[String] = {

Let's keep the original function name if possible? I think the docstring itself makes it clear it's for function parameter. It would be good to reduce cosmetic changes given the PR is already large.

@@ -74,4 +74,17 @@ class SQLFunctionSuite extends QueryTest with SharedSparkSession {
|""".stripMargin), Seq(Row(2), Row(4)))
}
}

test("SQL scalar function with default value") {

can we add another test with SQL table functions?

Contributor Author

This test is for convenience when debugging. The SQL tests already check that it works with table functions.

@cloud-fan

@wengh can you re-trigger the GitHub Actions jobs? The link failure seems flaky.

wengh commented Apr 23, 2025

@cloud-fan thanks for the reminder. tests are passing now.

@cloud-fan

thanks, merging to master/4.0 (to make the SQL UDF feature complete)!

@cloud-fan cloud-fan closed this in 6df5cb7 Apr 24, 2025
cloud-fan pushed a commit that referenced this pull request Apr 24, 2025

Closes #50408 from wengh/sql-udf-default.

Lead-authored-by: Haoyu Weng <[email protected]>
Co-authored-by: Allison Wang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
Kimahriman pushed a commit to Kimahriman/spark that referenced this pull request May 13, 2025