fix : implement_try_eval_mode_arithmetic #2073
Conversation
| Hello @andygrove, I implemented custom Arrow kernels to perform checked_add, checked_sub and checked_mul (registered as UDFs), supporting integral types only (similar to Spark's behavior). My hope is to repeat / reuse this for other ops in the future, now that there is a framework established. Please take a look whenever you get a chance and I can make changes (if any) to support try eval mode. Thank you very much |
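For readers following along, here is a minimal sketch (not the PR's actual code) of what one of these checked kernels looks like conceptually for `Int32`: overflow is detected per element and surfaced as NULL, which is what Spark's TRY eval mode expects. The function name is illustrative.

```rust
use std::sync::Arc;

use arrow::array::{Array, ArrayRef, Int32Array, Int32Builder};

/// Element-wise checked addition: any overflow (or null input) produces NULL.
fn checked_add_i32(left: &Int32Array, right: &Int32Array) -> ArrayRef {
    let mut builder = Int32Builder::with_capacity(left.len());
    for i in 0..left.len() {
        if left.is_null(i) || right.is_null(i) {
            builder.append_null();
        } else {
            // std's checked_add returns None on overflow, which the builder
            // records as a NULL entry
            builder.append_option(left.value(i).checked_add(right.value(i)));
        }
    }
    Arc::new(builder.finish())
}
```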
| Codecov Report ✅ All modified and coverable lines are covered by tests.
@@             Coverage Diff              @@
##               main    #2073      +/-   ##
============================================
+ Coverage     56.12%   58.45%   +2.33%
- Complexity      976     1253     +277
============================================
  Files           119      143      +24
  Lines         11743    13192    +1449
  Branches       2251     2370     +119
============================================
+ Hits           6591     7712    +1121
- Misses         4012     4256     +244
- Partials       1140     1224      +84     |
2f735f2 to e539695 (Compare)
| @andygrove , |
| @coderfender looks like there are some clippy issues to be resolved | 
        
          
native/core/src/execution/planner.rs (Outdated)
// TODO respect ANSI eval mode
// https://github.com/apache/datafusion-comet/issues/2021
// https://github.com/apache/datafusion-comet/issues/536
let _eval_mode = from_protobuf_eval_mode(expr.eval_mode)?;
Please remove the leading _ from the variable name now that we are using the variable
match op {
    "checked_add" => builder.append_option(l.add_checked(r).ok()),
Performing this match operation on every row will be expensive. It would be better to invert this and do the match once, and then have a different for loop for each operation, something like this:
match op {
  "checked_add" => {
    for i in 0..len {
      ...
    }
  }
  "checked_sub" => {
    for i in 0..len {
      ...
    }
  }
}
| @coderfender I took a first pass through this, and I think this is looking good 👍 |
        
          
native/core/src/execution/planner.rs (Outdated)
return_type: Option<&spark_expression::DataType>,
op: DataFusionOperator,
input_schema: SchemaRef,
fail_on_overflow: bool,
Maybe consider passing in the eval mode here instead of a boolean? We'll eventually need to support all three modes.
Sure, that's a great idea!
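A sketch of what that could look like, assuming an `EvalMode` enum along the lines of the one Comet already uses for other ANSI-aware expressions (the helper below is hypothetical, shown only to illustrate the shape of the change):

```rust
/// Hypothetical eval-mode enum mirroring Spark's three arithmetic modes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum EvalMode {
    /// Spark default: silently wrap around on overflow
    Legacy,
    /// Fail the query on overflow
    Ansi,
    /// Return NULL on overflow
    Try,
}

/// Illustrative helper: branch on the mode instead of a `fail_on_overflow` bool.
fn describe_overflow_behavior(eval_mode: EvalMode) -> &'static str {
    match eval_mode {
        EvalMode::Legacy => "wrap around on overflow",
        EvalMode::Try => "produce NULL on overflow (checked kernels)",
        EvalMode::Ansi => "raise an error on overflow",
    }
}
```

Threading the mode through the planner this way avoids adding a second boolean once ANSI support lands.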
| One of the TPC-H checks failed with a network exception. @andygrove, could you please re-trigger that workflow whenever you get a chance? |
| @andygrove, thank you for restarting the failed job, and glad to see that the checks have all passed. Please review once you get a chance and let me know if you think we need further changes. Thank you |
    (l.to_array_of_size(r.len())?, Arc::clone(r))
}
(ColumnarValue::Array(l), ColumnarValue::Scalar(r)) => {
    (Arc::clone(l), r.to_array_of_size(l.len())?)
We may eventually want to have a specialized version of the kernel for the scalar case to avoid the overhead of creating an array from the scalar. This does not need to happen as part of this PR, though.
Sure! I will create a follow-up enhancement to track changes for a scalar impl. Thank you for the feedback @andygrove.
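For reference, the scalar-specialized path being discussed might look roughly like the sketch below: it applies the checked op against a single `i32` value instead of first expanding the scalar into a full array with `to_array_of_size`. Names are illustrative and this is not part of this PR.

```rust
use std::sync::Arc;

use arrow::array::{Array, ArrayRef, Int32Array, Int32Builder};

/// Checked add of an array against a scalar right-hand side, avoiding the
/// intermediate array that `to_array_of_size` would otherwise allocate.
fn checked_add_array_scalar(l: &Int32Array, r: i32) -> ArrayRef {
    let mut builder = Int32Builder::with_capacity(l.len());
    for i in 0..l.len() {
        if l.is_null(i) {
            builder.append_null();
        } else {
            builder.append_option(l.value(i).checked_add(r));
        }
    }
    Arc::new(builder.finish())
}
```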
LGTM. Thanks @coderfender
| @coderfender could you fix the conflicts? | 
| Thank you very much for the approval @andygrove. I am going to push a fix shortly after fixing conflicts. |
| @andygrove, there is a test failure with the below error after rebasing with the main branch. I am currently investigating the failure and will patch a potential fix |
cfba67a to bd002fa (Compare)
| @andygrove the checks have all passed. Thank you for your approval. Please merge once you get a chance |
Which issue does this PR close?
Closes #2021
Rationale for this change
PR to support TRY eval mode in native. Unfortunately, neither the DataFusion nor the Arrow crates support null-on-overflow ops, which is the desired outcome in Spark when the eval mode is set to TRY.
What changes are included in this PR?
A checked_arithmetic kernel which performs checked_add / checked_subtract / checked_multiply over the operands and wraps overflow in a NULL. (There aren't direct DataFusion options / Arrow kernel APIs which provide this functionality out of the box, hence the need for a custom kernel + UDF based solution.)
How are these changes tested?