feat: [datafusion-spark] Implement `next_day` function #16780
Conversation
```rust
impl SparkNextDay {
    pub fn new() -> Self {
        Self {
            signature: Signature::user_defined(Volatility::Immutable),
```
We can define a specific signature here to be `(Date32, Utf8/Utf8View/LargeUtf8)`. After that I think the implementation can be simplified:

- No need to implement `coerce_types()`; there is code to handle that automatically based on the signature.
- We can assume the signature is valid inside `invoke_with_args()`, so there would be no need to check invalid input (sanity checks like `unreachable!()` or returning internal errors for invalid input can still be applied).
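For illustration, a fixed signature along these lines might look like the following. This is only a sketch, assuming `Signature::one_of` and `TypeSignature::Exact` from `datafusion_expr`; the function name is hypothetical and this is not code from the PR:

```rust
use arrow::datatypes::DataType;
use datafusion_expr::{Signature, TypeSignature, Volatility};

// Sketch: accept (Date32, <any string type>) explicitly instead of
// Signature::user_defined, so argument coercion can be derived from
// the signature rather than a hand-written coerce_types().
fn next_day_signature() -> Signature {
    Signature::one_of(
        vec![
            TypeSignature::Exact(vec![DataType::Date32, DataType::Utf8]),
            TypeSignature::Exact(vec![DataType::Date32, DataType::Utf8View]),
            TypeSignature::Exact(vec![DataType::Date32, DataType::LargeUtf8]),
        ],
        Volatility::Immutable,
    )
}
```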
```
## Original Query: SELECT next_day('2015-01-14', 'TU');
## PySpark 3.5.5 Result: {'next_day(2015-01-14, TU)': datetime.date(2015, 1, 20), 'typeof(next_day(2015-01-14, TU))': 'date', 'typeof(2015-01-14)': 'string', 'typeof(TU)': 'string'}
#query
#SELECT next_day('2015-01-14'::string, 'TU'::string);
query D
```
I recommend adding tests for invalid inputs:

- 0 or >2 inputs
- Each element can be either a valid input, an invalid input of the correct type (like `2015-13-32`), or an invalid type, and finally nulls. We want to test different combinations, to ensure that for invalid inputs the expected (and easy-to-understand) errors are returned, instead of panicking.

Also, here we only checked `ScalarValue` input; let's also do the tests for `Array` inputs.
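To illustrate why a value like `2015-13-32` needs its own test case: it is a well-typed string but not a real date, so it only fails at calendar validation, not at type checking. A std-only sketch of such a validity check (helper names here are illustrative, not from the PR; the actual implementation would more likely rely on `chrono`'s parsing returning an error):

```rust
// True for leap years in the proleptic Gregorian calendar.
fn is_leap(y: i32) -> bool {
    (y % 4 == 0 && y % 100 != 0) || y % 400 == 0
}

// Checks that a "YYYY-MM-DD" string denotes a real calendar date.
fn is_valid_date(s: &str) -> bool {
    let parts: Vec<&str> = s.split('-').collect();
    if parts.len() != 3 {
        return false;
    }
    let (y, m, d) = match (
        parts[0].parse::<i32>(),
        parts[1].parse::<u32>(),
        parts[2].parse::<u32>(),
    ) {
        (Ok(y), Ok(m), Ok(d)) => (y, m, d),
        _ => return false,
    };
    if !(1..=12).contains(&m) {
        return false;
    }
    let days_in_month = match m {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        _ => if is_leap(y) { 29 } else { 28 },
    };
    (1..=days_in_month).contains(&d)
}
```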
Pull Request Overview
Add support for Spark's `next_day` function in DataFusion by implementing the UDF and its tests, registering it in the datetime module, and adding `chrono` as a dependency.

- Introduced SQLLogicTest cases for `next_day`
- Implemented `SparkNextDay` UDF (scalar + array)
- Registered the UDF in `mod.rs` and updated `Cargo.toml`
Reviewed Changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| next_day.slt | Added functional tests for next_day with various inputs |
| next_day.rs | Full implementation of next_day UDF logic |
| mod.rs | Registered and exported next_day in datetime module |
| Cargo.toml | Added chrono as a workspace dependency |
Comments suppressed due to low confidence (3)
datafusion/spark/src/function/datetime/next_day.rs:77

- The code only handles `Date32` inputs for the date argument, but the tests pass string dates. You need to add a branch to parse `ScalarValue::Utf8`/`LargeUtf8` as ISO-8601 dates and convert them to `Date32` before computing the next day.

```rust
(ColumnarValue::Scalar(date), ColumnarValue::Scalar(day_of_week)) => {
```
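If string dates were handled inside the UDF, the conversion itself is straightforward; here is a std-only sketch mapping an ISO-8601 string to a Date32-style day count (days since 1970-01-01). Function names are illustrative, not from the PR, which would more likely use `chrono` or rely on signature-driven coercion. The day-count formula follows the well-known civil-calendar arithmetic:

```rust
// Days since 1970-01-01 for a proleptic Gregorian (y, m, d).
fn days_from_civil(y: i64, m: u32, d: u32) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = (if y >= 0 { y } else { y - 399 }) / 400;
    let yoe = y - era * 400;                      // year of era [0, 399]
    let mp = ((m + 9) % 12) as i64;               // March-based month [0, 11]
    let doy = (153 * mp + 2) / 5 + d as i64 - 1;  // day of year [0, 365]
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146097 + doe - 719468                   // shift epoch to 1970-01-01
}

// Parses "YYYY-MM-DD" into a Date32-style day count.
// Note: bounds here are loose (no per-month day check); a full
// implementation would validate the calendar date as well.
fn parse_date32(s: &str) -> Option<i32> {
    let mut it = s.split('-');
    let y: i64 = it.next()?.parse().ok()?;
    let m: u32 = it.next()?.parse().ok()?;
    let d: u32 = it.next()?.parse().ok()?;
    if it.next().is_some() || !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    Some(days_from_civil(y, m, d) as i32)
}
```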
datafusion/sqllogictest/test_files/spark/datetime/next_day.slt:32

- Consider adding tests for edge cases such as NULL inputs and invalid weekday strings to verify null propagation and error handling behavior.

```sql
SELECT next_day('2015-07-27'::string, 'Sun'::string);
```
datafusion/spark/Cargo.toml:40

- The syntax for adding a workspace dependency is incorrect. Change to `chrono = { workspace = true }` to match the other entries.

```toml
chrono.workspace = true
```
```rust
fn spark_next_day(days: i32, day_of_week: &str) -> Option<i32> {
    let date = Date32Type::to_naive_date(days);
```
[nitpick] The `spark_next_day` function recomputes `trim().to_uppercase()` and parses the weekday for each element in an array. You could pre-normalize and parse the target `Weekday` once outside the loops for better performance on large arrays.
Suggested change:

```rust
fn spark_next_day_with_weekday(days: i32, day_of_week: Weekday) -> Option<i32> {
    let date = Date32Type::to_naive_date(days);
    Some(Date32Type::from_naive_date(
        date + Duration::days(
            (7 - date.weekday().days_since(day_of_week)) as i64,
        ),
    ))
}

fn normalize_and_parse_weekday(day_of_week: &str) -> Option<Weekday> {
```
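The hoisting idea can also be sketched without `chrono`, using plain day-count arithmetic (Date32 values are days since 1970-01-01, which was a Thursday). This is an illustrative std-only version, not the PR's code; the point is that the string normalization and parse happen once, outside the per-element loop:

```rust
// Weekday index: 0 = Monday ... 6 = Sunday (ISO numbering).
fn parse_weekday(day_of_week: &str) -> Option<u32> {
    match day_of_week.trim().to_uppercase().as_str() {
        "MO" | "MON" | "MONDAY" => Some(0),
        "TU" | "TUE" | "TUESDAY" => Some(1),
        "WE" | "WED" | "WEDNESDAY" => Some(2),
        "TH" | "THU" | "THURSDAY" => Some(3),
        "FR" | "FRI" | "FRIDAY" => Some(4),
        "SA" | "SAT" | "SATURDAY" => Some(5),
        "SU" | "SUN" | "SUNDAY" => Some(6),
        _ => None,
    }
}

fn next_day_with_weekday(days: i32, target: u32) -> i32 {
    // 1970-01-01 was a Thursday (index 3 in Monday-based numbering).
    let current = ((((days % 7) + 7) % 7) as u32 + 3) % 7;
    let delta = (7 + target - current) % 7;
    // "Strictly later": landing on the same weekday means a full week ahead.
    let step = if delta == 0 { 7 } else { delta };
    days + step as i32
}

// Parse the weekday once, then apply cheap arithmetic per element.
fn next_day_array(days: &[i32], day_of_week: &str) -> Option<Vec<i32>> {
    let target = parse_weekday(day_of_week)?;
    Some(days.iter().map(|&d| next_day_with_weekday(d, target)).collect())
}
```

With this shape, an invalid weekday string is rejected once up front rather than per row, which also matches the reviewer's point about returning a single clear error instead of panicking inside the loop.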
```rust
export_functions!((
    next_day,
    "Returns the first date which is later than start_date and named as indicated. The function returns NULL if at least one of the input parameters is NULL. When both of the input parameters are not NULL and day_of_week is an invalid input, the function throws SparkIllegalArgumentException if spark.sql.ansi.enabled is set to true, otherwise NULL.",
```
I think this needs to be adjusted. Rust does not have exceptions, and ANSI mode is not hooked up yet (something like #16661 might be needed for that to happen).
Co-authored-by: Bruce Ritchie <[email protected]>
Which issue does this PR close?

- Implement Spark date function `next_day` #16775

Rationale for this change

See #16775

What changes are included in this PR?

Implement a Spark-compatible `next_day` function.

Are these changes tested?
Yes, I added tests from all of the links in the Spark Test Files README.md
Are there any user-facing changes?
Yes, new function.