Commit e2e3791

Fix error value_counts result with pyarrow categorical columns (#60949)
* BUG: Fix PyArrow array access in Categorical constructor for Index objects (#60563)
* TST: Add test for value_counts with Arrow dictionary dtype (#60563)
* DOC: Add changelog entry for PyArrow array access fix in Categorical (#60563)
* Update doc/source/whatsnew/v3.0.0.rst

Co-authored-by: Matthew Roeschke <[email protected]>

---------

Co-authored-by: Matthew Roeschke <[email protected]>
1 parent 4c3b573 commit e2e3791

3 files changed: +24 -1 lines changed

doc/source/whatsnew/v3.0.0.rst

+1
@@ -787,6 +787,7 @@ Sparse
 
 ExtensionArray
 ^^^^^^^^^^^^^^
+- Bug in :class:`Categorical` when constructing with an :class:`Index` with :class:`ArrowDtype` (:issue:`60563`)
 - Bug in :meth:`.arrays.ArrowExtensionArray.__setitem__` which caused wrong behavior when using an integer array with repeated values as a key (:issue:`58530`)
 - Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`)
 - Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)

pandas/core/arrays/categorical.py

+6 -1
@@ -447,7 +447,12 @@ def __init__(
         if isinstance(values.dtype, ArrowDtype) and issubclass(
             values.dtype.type, CategoricalDtypeType
         ):
-            arr = values._pa_array.combine_chunks()
+            from pandas import Index
+
+            if isinstance(values, Index):
+                arr = values._data._pa_array.combine_chunks()
+            else:
+                arr = values._pa_array.combine_chunks()
             categories = arr.dictionary.to_pandas(types_mapper=ArrowDtype)
             codes = arr.indices.to_numpy()
             dtype = CategoricalDtype(categories, values.dtype.pyarrow_dtype.ordered)
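
The branch added in this hunk can be illustrated without pandas. The sketch below uses hypothetical stand-in classes (`FakeArrowArray`, `FakeIndex`) and a hypothetical helper (`get_backing_data`); none of these are pandas names. It only shows the shape of the dispatch, assuming, as the diff suggests, that an `Index` keeps its array under `_data` while the array itself carries `_pa_array`.

```python
class FakeArrowArray:
    """Hypothetical stand-in for an Arrow-backed extension array."""

    def __init__(self, data):
        # The real array stores a pyarrow ChunkedArray; a list is enough here.
        self._pa_array = list(data)


class FakeIndex:
    """Hypothetical stand-in for an Index, which wraps its array in ``_data``."""

    def __init__(self, array):
        self._data = array


def get_backing_data(values):
    """Mirror the dispatch in the diff: unwrap ``_data`` first for an index."""
    if isinstance(values, FakeIndex):
        return values._data._pa_array
    return values._pa_array


array = FakeArrowArray(["a1", "a2"])
index = FakeIndex(array)

print(get_backing_data(array))
print(get_backing_data(index))
print(hasattr(index, "_pa_array"))
```

Both calls reach the same underlying data. The wrapper has no `_pa_array` attribute of its own, which is why the single unconditional lookup removed by this hunk could not handle an index.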

pandas/tests/extension/test_arrow.py

+17
@@ -3511,3 +3511,20 @@ def test_map_numeric_na_action():
     result = ser.map(lambda x: 42, na_action="ignore")
     expected = pd.Series([42.0, 42.0, np.nan], dtype="float64")
     tm.assert_series_equal(result, expected)
+
+
+def test_categorical_from_arrow_dictionary():
+    # GH 60563
+    df = pd.DataFrame(
+        {"A": ["a1", "a2"]}, dtype=ArrowDtype(pa.dictionary(pa.int32(), pa.utf8()))
+    )
+    result = df.value_counts(dropna=False)
+    expected = pd.Series(
+        [1, 1],
+        index=pd.MultiIndex.from_arrays(
+            [pd.Index(["a1", "a2"], dtype=ArrowDtype(pa.string()), name="A")]
+        ),
+        name="count",
+        dtype="int64",
+    )
+    tm.assert_series_equal(result, expected)
