Conversation

jbrockmendel
Member

@jbrockmendel jbrockmendel commented Aug 14, 2025

Still have a bunch of pyarrow tests involving duration/timestamp dtypes failing. Also need to update/remove the test files' _cast_pointwise_result methods.

xref #56430, could close that with a little effort.

I suspect a bunch of "pyarrow dtype retention" tests are solved by this, will update as I check. Nope!

@jbrockmendel changed the title from "ENH: EA._cast_pointwise_result" to "WIP/ENH: EA._cast_pointwise_result" on Aug 14, 2025
@jbrockmendel
Member Author

The pyarrow duration failures are caused by an upstream issue, apache/arrow#40620.

@jbrockmendel changed the title from "WIP/ENH: EA._cast_pointwise_result" to "ENH: EA._cast_pointwise_result" on Aug 15, 2025
@jbrockmendel
Member Author

@rhshadrach I think you had a recent issue/PR involving retaining nullable dtypes in a .map?

Member

@rhshadrach rhshadrach left a comment

Really great improvement here. It seems to me there are two potential situations we might find ourselves in: (a) take the dtype of self into account or (b) don't take the dtype of self into account. I highlighted one specific example below where I think this might go awry. The base implementation is doing (b) whereas the subclasses are doing various degrees of (a). Understand that isn't being introduced here, but I think the long term goal is to make this more consistent?

For now, do we want to set up the framework to separate these two cases somehow, perhaps with an argument to _cast_pointwise_result?

I see now that this is just preserving the dtype when possible. I think we don't need two different cases as I first imagined.

Member

@rhshadrach rhshadrach left a comment

lgtm

@mroeschke mroeschke added this to the 3.0 milestone Aug 20, 2025
@mroeschke mroeschke added the ExtensionArray label (Extending pandas with custom dtypes or arrays.) Aug 20, 2025
@mroeschke mroeschke merged commit cb7b334 into pandas-dev:main Aug 20, 2025
41 checks passed
@mroeschke
Member

Thanks @jbrockmendel

@jorisvandenbossche
Member

@jbrockmendel is there a reason you removed the _from_scalars EA interface method (which I think was added after quite some discussion, #33254 / #38315 / #53089)? Was it no longer used / needed?
And what's the difference exactly with _cast_pointwise_result? That one is not strict? (The base EA implementation of it does not even return an EA, so I'm not entirely sure I understand the purpose of that base method, and it is not documented as something to override for external EA authors?)

@jbrockmendel
Member Author

> is there a reason you removed the _from_scalars

Because it is no longer used anywhere.

> And what's the difference exactly with _cast_pointwise_result? That one is not strict?

_from_scalars either returned the same dtype as the original or raised. _cast_pointwise_result does inference while attempting to retain the dtype_backend of* the** original.

So e.g. with a timestamp[pyarrow] Index***, suppose you did idx.map(lambda x: x - pd.Timestamp(0)). If Index.map used _from_scalars, it would try to cast back to timestamp[pyarrow], fail, and raise. Then we'd fall back to lib.maybe_convert_objects and get a non-pyarrow timedelta64. With _cast_pointwise_result, we get duration[pyarrow], which is what we generally want in these cases.

* and itemsize where relevant
** for this purpose "dtype_backend" is a little fuzzy about whether it includes categorical-ness or sparse-ness.
*** don't hold me to this exact example, as I'm not sure off the top of my head whether we use _cast_pointwise_result consistently yet
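
To make the contrast above concrete, here is a minimal toy sketch using only the standard library. It is not pandas code: `strict_from_scalars`, `cast_pointwise_result`, `infer_kind`, and the `(kind, backend)` tuples are hypothetical stand-ins for the behaviour described in this comment, and the "numpy" fallback is an assumption standing in for generic object inference.

```python
from datetime import datetime, timedelta

# Toy stand-ins, not pandas API: a "dtype" here is just (kind, backend).
KIND_BY_TYPE = {datetime: "timestamp", timedelta: "duration", int: "int64"}


def infer_kind(values):
    """Return the single kind shared by all values, or "object" if mixed."""
    kinds = {KIND_BY_TYPE.get(type(v), "object") for v in values}
    return kinds.pop() if len(kinds) == 1 else "object"


def strict_from_scalars(values, dtype):
    """Same-dtype-or-raise, mirroring how the thread describes _from_scalars."""
    kind, backend = dtype
    if infer_kind(values) != kind:
        raise TypeError(f"cannot cast to {kind}[{backend}]")
    return (kind, backend)


def cast_pointwise_result(values, dtype):
    """Infer the kind from the values but keep the original backend."""
    _, backend = dtype
    return (infer_kind(values), backend)


original = ("timestamp", "pyarrow")
stamps = [datetime(2025, 8, 14), datetime(2025, 8, 15)]
mapped = [s - datetime(1970, 1, 1) for s in stamps]  # a list of timedeltas

# Strict path: the cast back fails, so the caller falls back to generic
# inference and the backend is lost (assumed here to become "numpy").
try:
    strict_result = strict_from_scalars(mapped, original)
except TypeError:
    strict_result = (infer_kind(mapped), "numpy")

inferred_result = cast_pointwise_result(mapped, original)
print(strict_result)    # ('duration', 'numpy')
print(inferred_result)  # ('duration', 'pyarrow')
```

The point of the sketch is only the control flow: the strict path can succeed solely when the mapped values still have the original kind, while the inferring path changes the kind and carries the backend along.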

@jorisvandenbossche
Member

> _from_scalars either returned the same dtype as the original or raised. _cast_pointwise_result does inference while attempting to retain the dtype_backend of* the** original.

That might happen for our own arrays, but I don't think the base class version of _cast_pointwise_result does any inference that attempts to retain the dtype?
(it just calls maybe_convert_objects, which will typically return a numpy array, or one of our period/datetime/timedelta types)

So as far as I can see, there is no way that the base implementation can ever work correctly for an external EA, which means they will always have to override this method.
By contrast, the current implementation of maybe_cast_pointwise_result, which uses _from_scalars, can work fine for external EAs.
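
A small toy model of that concern, again in plain Python rather than pandas: `ToyExtensionArray`, `PointArray`, and `RetainingPointArray` are hypothetical names, and the base method returning a plain `list` is an assumption standing in for "generic inference that knows nothing about the subclass".

```python
class ToyExtensionArray:
    """Hypothetical base class; not the pandas ExtensionArray."""

    def __init__(self, values):
        self.values = list(values)

    def _cast_pointwise_result(self, values):
        # Base behaviour as described in the thread: generic inference that
        # knows nothing about the subclass, so a plain container comes back.
        return list(values)

    def map(self, func):
        return self._cast_pointwise_result(func(v) for v in self.values)


class PointArray(ToyExtensionArray):
    """Stand-in for a third-party array that does not override the hook."""


class RetainingPointArray(ToyExtensionArray):
    """Same array, but overriding the hook to rebuild its own type."""

    def _cast_pointwise_result(self, values):
        values = list(values)
        # Keep our own type only when every result still looks like a point.
        if all(isinstance(v, tuple) and len(v) == 2 for v in values):
            return type(self)(values)
        return super()._cast_pointwise_result(values)


def shift(point):
    return (point[0] + 1, point[1] + 1)


plain = PointArray([(0, 0), (1, 2)]).map(shift)
kept = RetainingPointArray([(0, 0), (1, 2)]).map(shift)
print(type(plain).__name__)  # list
print(type(kept).__name__)   # RetainingPointArray
```

In this model the subclass without an override always loses its type after a pointwise operation, which is the situation described above for external EAs relying on the base method.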

@jorisvandenbossche
Member

I see that in the issue you wrote:

> This will effectively replace _from_scalars, which was a mis-feature.

Can you explain why you think that is the case?

@jbrockmendel
Member Author

> Can you explain why you think that is the case?

Because it only handled same-dtype casting, and even then required lots of overriding for cases where _from_sequence is more aggressive than we'd want. The method we actually needed was dtype_backend-preserving inference.

> So as far as I can see, there is no way that the base implementation can ever work correctly for an external EA, which means they will always have to override this method.

Fair point. I'll take a look at how we can update the base class method to prevent geopandas from having to override.

Labels
ExtensionArray Extending pandas with custom dtypes or arrays.
Development

Successfully merging this pull request may close these issues.

REF: make _cast_pointwise_result an EA method
4 participants