Commit 7dcf2eb

Merge branch 'main' into api-nan-vs-na
2 parents 54ae640 + 29ce489 commit 7dcf2eb

41 files changed, +449 -128 lines changed

doc/source/user_guide/categorical.rst

Lines changed: 2 additions & 2 deletions
@@ -77,7 +77,7 @@ By passing a :class:`pandas.Categorical` object to a ``Series`` or assigning it
 .. ipython:: python
 
     raw_cat = pd.Categorical(
-        ["a", "b", "c", "a"], categories=["b", "c", "d"], ordered=False
+        [None, "b", "c", None], categories=["b", "c", "d"], ordered=False
     )
     s = pd.Series(raw_cat)
     s
@@ -145,7 +145,7 @@ of :class:`~pandas.api.types.CategoricalDtype`.
 
     from pandas.api.types import CategoricalDtype
 
-    s = pd.Series(["a", "b", "c", "a"])
+    s = pd.Series([None, "b", "c", None])
     cat_type = CategoricalDtype(categories=["b", "c", "d"], ordered=True)
     s_cat = s.astype(cat_type)
     s_cat
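
The two hunks above swap `"a"` for `None`, presumably because `"a"` is a non-missing value absent from `categories`, which this commit deprecates, while `None` is simply missing. The difference can be sketched in plain Python. `encode` is a hypothetical helper written for illustration, not pandas code; the only pandas convention it borrows is that a code of `-1` marks a missing entry.

```python
def encode(values, categories):
    """Map each value to its position in `categories`; -1 marks missing.

    Hypothetical helper for illustration only, not pandas internals.
    Returns (codes, unexpected), where `unexpected` lists the non-missing
    values that are absent from `categories`.
    """
    positions = {cat: i for i, cat in enumerate(categories)}
    codes = []
    unexpected = []
    for value in values:
        if value is None:
            codes.append(-1)  # genuinely missing: always allowed
        elif value in positions:
            codes.append(positions[value])
        else:
            codes.append(-1)  # coerced to missing: the deprecated path
            unexpected.append(value)
    return codes, unexpected


old_codes, old_unexpected = encode(["a", "b", "c", "a"], ["b", "c", "d"])
new_codes, new_unexpected = encode([None, "b", "c", None], ["b", "c", "d"])
```

Both inputs produce the same codes, but only the old one relies on silently dropping `"a"`, so the rendered documentation output should stay the same while no longer depending on the deprecated coercion.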

doc/source/user_guide/io.rst

Lines changed: 4 additions & 1 deletion
@@ -499,11 +499,14 @@ When using ``dtype=CategoricalDtype``, "unexpected" values outside of
 ``dtype.categories`` are treated as missing values.
 
 .. ipython:: python
+   :okwarning:
 
    dtype = CategoricalDtype(["a", "b", "d"]) # No 'c'
    pd.read_csv(StringIO(data), dtype={"col1": dtype}).col1
 
-This matches the behavior of :meth:`Categorical.set_categories`.
+This matches the behavior of :meth:`Categorical.set_categories`. This behavior is
+deprecated. In a future version, the presence of non-NA values that are not
+among the specified categories will raise.
 
 .. note::
 
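
The added paragraph describes a two-stage transition: warn now, raise later. A minimal sketch of that pattern using only the standard library follows. `check_values`, its `strict` flag, the warning class, and the choice of `ValueError` are all assumptions made for illustration; the diff does not say which exception pandas will eventually raise.

```python
import warnings


def check_values(values, categories, strict=False):
    """Return the non-NA values that are missing from `categories`.

    Hypothetical sketch, not pandas API. With strict=False a warning is
    emitted (the deprecation period); with strict=True a ValueError is
    raised (standing in for the announced future behavior).
    """
    allowed = set(categories)
    unexpected = [v for v in values if v is not None and v not in allowed]
    if unexpected:
        message = f"values not in categories: {unexpected}"
        if strict:
            raise ValueError(message)
        warnings.warn(message, DeprecationWarning, stacklevel=2)
    return unexpected


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    found = check_values(["a", "b", "c", "a"], ["a", "b", "d"])
```

Here `found` holds the single unexpected value `"c"` and one warning is recorded. Missing entries (`None`) never count as unexpected, matching the "non-NA values" wording in the hunk.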
doc/source/whatsnew/v3.0.0.rst

Lines changed: 8 additions & 0 deletions
@@ -696,6 +696,7 @@ Other Deprecations
 - Deprecated :meth:`Timestamp.utcfromtimestamp`, use ``Timestamp.fromtimestamp(ts, "UTC")`` instead (:issue:`56680`)
 - Deprecated :meth:`Timestamp.utcnow`, use ``Timestamp.now("UTC")`` instead (:issue:`56680`)
 - Deprecated ``pd.core.internals.api.maybe_infer_ndim`` (:issue:`40226`)
+- Deprecated allowing constructing or casting to :class:`Categorical` with non-NA values that are not present in specified ``dtype.categories`` (:issue:`40996`)
 - Deprecated allowing non-keyword arguments in :meth:`DataFrame.all`, :meth:`DataFrame.min`, :meth:`DataFrame.max`, :meth:`DataFrame.sum`, :meth:`DataFrame.prod`, :meth:`DataFrame.mean`, :meth:`DataFrame.median`, :meth:`DataFrame.sem`, :meth:`DataFrame.var`, :meth:`DataFrame.std`, :meth:`DataFrame.skew`, :meth:`DataFrame.kurt`, :meth:`Series.all`, :meth:`Series.min`, :meth:`Series.max`, :meth:`Series.sum`, :meth:`Series.prod`, :meth:`Series.mean`, :meth:`Series.median`, :meth:`Series.sem`, :meth:`Series.var`, :meth:`Series.std`, :meth:`Series.skew`, and :meth:`Series.kurt`. (:issue:`57087`)
 - Deprecated allowing non-keyword arguments in :meth:`Series.to_markdown` except ``buf``. (:issue:`57280`)
 - Deprecated allowing non-keyword arguments in :meth:`Series.to_string` except ``buf``. (:issue:`57280`)
@@ -709,6 +710,7 @@ Other Deprecations
 - Deprecated the ``arg`` parameter of ``Series.map``; pass the added ``func`` argument instead. (:issue:`61260`)
 - Deprecated using ``epoch`` date format in :meth:`DataFrame.to_json` and :meth:`Series.to_json`, use ``iso`` instead. (:issue:`57063`)
 - Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.unstack` and :meth:`DataFrame.unstack` (:issue:`12189`, :issue:`53868`)
+- Deprecated allowing ``fill_value`` that cannot be held in the original dtype (excepting NA values for integer and bool dtypes) in :meth:`Series.shift` and :meth:`DataFrame.shift` (:issue:`53802`)
 
 .. ---------------------------------------------------------------------------
 .. _whatsnew_300.prior_deprecations:
@@ -949,6 +951,7 @@ Datetimelike
 - Bug in :meth:`DatetimeIndex.union` and :meth:`DatetimeIndex.intersection` when ``unit`` was non-nanosecond (:issue:`59036`)
 - Bug in :meth:`Index.union` with a ``pyarrow`` timestamp dtype incorrectly returning ``object`` dtype (:issue:`58421`)
 - Bug in :meth:`Series.dt.microsecond` producing incorrect results for pyarrow backed :class:`Series`. (:issue:`59154`)
+- Bug in :meth:`Timestamp.replace` failing to update ``unit`` attribute when replacement introduces non-zero ``nanosecond`` or ``microsecond`` (:issue:`57749`)
 - Bug in :meth:`to_datetime` not respecting dayfirst if an uncommon date string was passed. (:issue:`58859`)
 - Bug in :meth:`to_datetime` on float array with missing values throwing ``FloatingPointError`` (:issue:`58419`)
 - Bug in :meth:`to_datetime` on float32 df with year, month, day etc. columns leads to precision issues and incorrect result. (:issue:`60506`)
@@ -1018,6 +1021,8 @@ Indexing
 - Bug in reindexing of :class:`DataFrame` with :class:`PeriodDtype` columns in case of consolidated block (:issue:`60980`, :issue:`60273`)
 - Bug in :meth:`DataFrame.loc.__getitem__` and :meth:`DataFrame.iloc.__getitem__` with a :class:`CategoricalDtype` column with integer categories raising when trying to index a row containing a ``NaN`` entry (:issue:`58954`)
 - Bug in :meth:`Index.__getitem__` incorrectly raising with a 0-dim ``np.ndarray`` key (:issue:`55601`)
+- Bug in indexing on a :class:`DatetimeIndex` with a ``timestamp[pyarrow]`` dtype or on a :class:`TimedeltaIndex` with a ``duration[pyarrow]`` dtype (:issue:`62277`)
+-
 
 Missing
 ^^^^^^^
@@ -1186,6 +1191,7 @@ Other
 - Bug in :meth:`Series.diff` allowing non-integer values for the ``periods`` argument. (:issue:`56607`)
 - Bug in :meth:`Series.dt` methods in :class:`ArrowDtype` that were returning incorrect values. (:issue:`57355`)
 - Bug in :meth:`Series.isin` raising ``TypeError`` when series is large (>10**6) and ``values`` contains NA (:issue:`60678`)
+- Bug in :meth:`Series.map` with a ``timestamp[pyarrow]`` dtype or ``duration[pyarrow]`` dtype incorrectly returning all-``NaN`` entries (:issue:`61231`)
 - Bug in :meth:`Series.mode` where an exception was raised when taking the mode with nullable types with no null values in the series. (:issue:`58926`)
 - Bug in :meth:`Series.rank` that doesn't preserve missing values for nullable integers when ``na_option='keep'``. (:issue:`56976`)
 - Bug in :meth:`Series.replace` and :meth:`DataFrame.replace` throwing ``ValueError`` when ``regex=True`` and all NA values. (:issue:`60688`)
@@ -1198,8 +1204,10 @@ Other
 - Bug in ``divmod`` and ``rdivmod`` with :class:`DataFrame`, :class:`Series`, and :class:`Index` with ``bool`` dtypes failing to raise, which was inconsistent with ``__floordiv__`` behavior (:issue:`46043`)
 - Bug in printing a :class:`DataFrame` with a :class:`DataFrame` stored in :attr:`DataFrame.attrs` raised a ``ValueError`` (:issue:`60455`)
 - Bug in printing a :class:`Series` with a :class:`DataFrame` stored in :attr:`Series.attrs` raised a ``ValueError`` (:issue:`60568`)
+- Deprecated the keyword ``check_datetimelike_compat`` in :meth:`testing.assert_frame_equal` and :meth:`testing.assert_series_equal` (:issue:`55638`)
 - Fixed bug where the :class:`DataFrame` constructor misclassified array-like objects with a ``.name`` attribute as :class:`Series` or :class:`Index` (:issue:`61443`)
 - Fixed regression in :meth:`DataFrame.from_records` not initializing subclasses properly (:issue:`57008`)
+-
 
 .. ***DO NOT USE THIS SECTION***

pandas/_libs/tslibs/timestamps.pyx

Lines changed: 10 additions & 5 deletions
@@ -3357,6 +3357,7 @@ default 'raise'
             datetime ts_input
             tzinfo_type tzobj
             _TSObject ts
+            NPY_DATETIMEUNIT creso = self._creso
 
         # set to naive if needed
         tzobj = self.tzinfo
@@ -3396,8 +3397,12 @@ default 'raise'
             dts.sec = validate("second", second)
         if microsecond is not None:
             dts.us = validate("microsecond", microsecond)
+            if creso < NPY_DATETIMEUNIT.NPY_FR_us:
+                # GH#57749
+                creso = NPY_DATETIMEUNIT.NPY_FR_us
         if nanosecond is not None:
             dts.ps = validate("nanosecond", nanosecond) * 1000
+            creso = NPY_FR_ns  # GH#57749
         if tzinfo is not object:
             tzobj = tzinfo
 
@@ -3407,17 +3412,17 @@ default 'raise'
             # to datetimes outside of pydatetime range.
             ts = _TSObject()
             try:
-                ts.value = npy_datetimestruct_to_datetime(self._creso, &dts)
+                ts.value = npy_datetimestruct_to_datetime(creso, &dts)
             except OverflowError as err:
                 fmt = dts_to_iso_string(&dts)
                 raise OutOfBoundsDatetime(
                     f"Out of bounds timestamp: {fmt} with frequency '{self.unit}'"
                 ) from err
             ts.dts = dts
-            ts.creso = self._creso
+            ts.creso = creso
             ts.fold = fold
             return create_timestamp_from_ts(
-                ts.value, dts, tzobj, fold, reso=self._creso
+                ts.value, dts, tzobj, fold, reso=creso
             )
 
         elif tzobj is not None and treat_tz_as_pytz(tzobj):
@@ -3436,10 +3441,10 @@ default 'raise'
             ts_input = datetime(**kwargs)
 
         ts = convert_datetime_to_tsobject(
-            ts_input, tzobj, nanos=dts.ps // 1000, reso=self._creso
+            ts_input, tzobj, nanos=dts.ps // 1000, reso=creso
         )
         return create_timestamp_from_ts(
-            ts.value, dts, tzobj, fold, reso=self._creso
+            ts.value, dts, tzobj, fold, reso=creso
         )
 
     def to_julian_date(self) -> np.float64:
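
The hunks above replace every use of `self._creso` with a local `creso` that is promoted when `microsecond` or `nanosecond` is supplied. The promotion rule on its own can be sketched in plain Python. `Unit` and `resolution_after_replace` are hypothetical stand-ins for `NPY_DATETIMEUNIT` and the Cython logic, and the sketch follows the diff's `is not None` checks rather than the "non-zero" wording of the changelog entry.

```python
from enum import IntEnum


class Unit(IntEnum):
    """Coarse-to-fine resolutions; a hypothetical stand-in for NPY_DATETIMEUNIT."""

    s = 0
    ms = 1
    us = 2
    ns = 3


def resolution_after_replace(current, microsecond=None, nanosecond=None):
    """Return the resolution a replaced timestamp needs (illustrative only)."""
    creso = current
    if microsecond is not None and creso < Unit.us:
        creso = Unit.us  # second/millisecond units cannot hold microseconds
    if nanosecond is not None:
        creso = Unit.ns  # nanoseconds always force the finest unit
    return creso


promoted = resolution_after_replace(Unit.s, microsecond=5)
unchanged = resolution_after_replace(Unit.ns, microsecond=5)
```

A second-resolution value is promoted to microseconds, while a nanosecond-resolution value is left alone. The resolution is never lowered.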

pandas/_testing/asserters.py

Lines changed: 33 additions & 16 deletions
@@ -7,6 +7,7 @@
     NoReturn,
     cast,
 )
+import warnings
 
 import numpy as np
 
@@ -15,6 +16,8 @@
 from pandas._libs.sparse import SparseIndex
 import pandas._libs.testing as _testing
 from pandas._libs.tslibs.np_datetime import compare_mismatched_resolutions
+from pandas.errors import Pandas4Warning
+from pandas.util._decorators import deprecate_kwarg
 
 from pandas.core.dtypes.common import (
     is_bool,
@@ -843,6 +846,7 @@ def assert_extension_array_equal(
 
 
 # This could be refactored to use the NDFrame.equals method
+@deprecate_kwarg(Pandas4Warning, "check_datetimelike_compat", new_arg_name=None)
 def assert_series_equal(
     left,
     right,
@@ -897,6 +901,9 @@ def assert_series_equal(
 
     check_datetimelike_compat : bool, default False
         Compare datetime-like which is comparable ignoring dtype.
+
+        .. deprecated:: 3.0
+
     check_categorical : bool, default True
         Whether to compare internal Categorical exactly.
     check_category_order : bool, default True
@@ -1132,6 +1139,7 @@ def assert_series_equal(
 
 
 # This could be refactored to use the NDFrame.equals method
+@deprecate_kwarg(Pandas4Warning, "check_datetimelike_compat", new_arg_name=None)
 def assert_frame_equal(
     left,
     right,
@@ -1194,6 +1202,9 @@ def assert_frame_equal(
         ``check_exact``, ``rtol`` and ``atol`` are specified.
     check_datetimelike_compat : bool, default False
         Compare datetime-like which is comparable ignoring dtype.
+
+        .. deprecated:: 3.0
+
     check_categorical : bool, default True
         Whether to compare internal Categorical exactly.
     check_like : bool, default False
@@ -1320,22 +1331,28 @@ def assert_frame_equal(
             # use check_index=False, because we do not want to run
             # assert_index_equal for each column,
             # as we already checked it for the whole dataframe before.
-            assert_series_equal(
-                lcol,
-                rcol,
-                check_dtype=check_dtype,
-                check_index_type=check_index_type,
-                check_exact=check_exact,
-                check_names=check_names,
-                check_datetimelike_compat=check_datetimelike_compat,
-                check_categorical=check_categorical,
-                check_freq=check_freq,
-                obj=f'{obj}.iloc[:, {i}] (column name="{col}")',
-                rtol=rtol,
-                atol=atol,
-                check_index=False,
-                check_flags=False,
-            )
+            with warnings.catch_warnings():
+                warnings.filterwarnings(
+                    "ignore",
+                    message="the 'check_datetimelike_compat' keyword",
+                    category=Pandas4Warning,
+                )
+                assert_series_equal(
+                    lcol,
+                    rcol,
+                    check_dtype=check_dtype,
+                    check_index_type=check_index_type,
+                    check_exact=check_exact,
+                    check_names=check_names,
+                    check_datetimelike_compat=check_datetimelike_compat,
+                    check_categorical=check_categorical,
+                    check_freq=check_freq,
+                    obj=f'{obj}.iloc[:, {i}] (column name="{col}")',
+                    rtol=rtol,
+                    atol=atol,
+                    check_index=False,
+                    check_flags=False,
+                )
 
 
 def assert_equal(left, right, **kwargs) -> None:
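
The last hunk wraps the internal `assert_series_equal` call in `warnings.catch_warnings()` because `assert_frame_equal` always forwards the deprecated keyword, which would otherwise warn on every column even when the caller never passed it. The pattern can be reproduced with the standard library alone. `deprecate_keyword`, `check_column`, `check_table`, and the `compat` keyword are hypothetical names for this sketch, and `FutureWarning` stands in for `Pandas4Warning`.

```python
import functools
import warnings


def deprecate_keyword(name):
    """Warn whenever `name` is passed by keyword (hypothetical decorator)."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if name in kwargs:
                warnings.warn(
                    f"the '{name}' keyword is deprecated",
                    FutureWarning,
                    stacklevel=2,
                )
            return func(*args, **kwargs)

        return wrapper

    return decorator


@deprecate_keyword("compat")
def check_column(left, right, compat=False):
    return left == right


@deprecate_keyword("compat")
def check_table(left, right, compat=False):
    # Forwarding `compat` would trigger check_column's own warning,
    # so silence only that message for the internal calls.
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="the 'compat' keyword",
            category=FutureWarning,
        )
        return all(check_column(a, b, compat=compat) for a, b in zip(left, right))


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    default_result = check_table([1, 2], [1, 2])
    explicit_result = check_table([1, 2], [1, 2], compat=True)
```

Only the explicit `compat=True` call records a warning, and it records exactly one: the outer decorator reports the user's choice, while the forwarded internal calls stay silent.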

pandas/core/arrays/arrow/array.py

Lines changed: 4 additions & 0 deletions
@@ -1679,6 +1679,10 @@ def map(self, mapper, na_action: Literal["ignore"] | None = None):
         if is_numeric_dtype(self.dtype):
             return map_array(self.to_numpy(), mapper, na_action=na_action)
         else:
+            # For "mM" cases, the super() method passes `self` without the
+            # to_numpy call, which inside map_array casts to ndarray[object].
+            # Without the to_numpy() call, NA is preserved instead of changed
+            # to None.
             return super().map(mapper, na_action)
 
     @doc(ExtensionArray.duplicated)
