-
-
Notifications
You must be signed in to change notification settings - Fork 19.1k
ENH: EA._cast_pointwise_result #62105
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: EA._cast_pointwise_result #62105
Conversation
The pyarrow duration stuff is caused by an upstream issue apache/arrow#40620 |
@rhshadrach i think you had a recent issue/pr involving retaining nullable dtypes in a .map? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really great improvement here. It seems to me there are two potential situations we might find ourselves in: (a) take the dtype of self
into account or (b) don't take the dtype of self
into account. I highlighted one specific example below where I think this might go awry. The base implementation is doing (b) whereas the subclasses are doing various degrees of (a). Understand that isn't being introduced here, but I think the long term goal is to make this more consistent?
For now, do we want to setup the framework to separate these two cases out somehow - perhaps an argument to _cast_pointwise_result
?
I see now that this is just preserving the dtype when possible. I think we don't need two different cases as I first imagined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
Thanks @jbrockmendel |
@jbrockmendel is there a reason you removed the |
Because it is no longer used anywhere.
_from_scalars either returned the same dtype as the original or raised. _cast_pointwise_result does inference while attempting to retain the dtype_backend of* the** original. So e.g. with a timestamp[pyarrow] Index***, if you did * and itemsize where relevant |
That might happen for our own arrays, but I don't think the base class version of So as far as I can see, there is no way that the base implementation can ever work correctly for an external EA, which means they will always have to override this method. |
I see that in the issue you wrote:
Can you explain why you think that is the case? |
Because it only handled same-dtype casting, and even then required lots of overriding for cases when _from_sequence is more aggressive than we'd want. The actual method we need was dtype_backend-preserving inference.
Fair point. I'll take a look at how we can update the base class method to prevent geopandas from having to override. |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.Still have a bunch of pyarrow tests involving duration/timestamp dtypes failing. Also need to update/remove the test files' _cast_pointwise_result methods.
xref #56430, could close that with a little effort.
I suspect a bunch of "pyarrow dtype retention" tests are solved by this, will update as I check.Nope!