API: mode.nan_is_na to consistently distinguish NaN-vs-NA #62040

jbrockmendel · 2025-08-04T15:42:47Z

closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

As discussed on the last dev call, this implements "mode.nan_is_na" (default True) to consider NaN as either always-equivalent or never-equivalent to NA.

This sits on top of:

TST: nan->NA in non-construction tests #62021, which trims the diff here by updating some tests to use NA instead of NaN.
API: consistent NaN treatment for pyarrow dtypes #61732, which implements the option but only for pyarrow dtypes.
API: improve dtype in df.where with EA other #62038, which addresses an issue in DataFrame.where.
BUG: read_csv with engine=pyarrow and numpy-nullable dtype #62053, which addresses a kludge in read_csv with engine="pyarrow".

Still need to:

Add docs for the new option, including a whatsnew section.
Deal with a kludge in algorithms.rank; fixed by API: rank with nullable dtypes preserve NA #62043.
Deal with a kludge in read_csv with engine="pyarrow"; fixed by BUG: read_csv with engine=pyarrow and numpy-nullable dtype #62053.
Add tests for the issues this addresses.

jbrockmendel · 2025-08-13T22:06:09Z

Discussed in the dev call before last, where @mroeschke, @Dr-Irv, and I were +1. Joris was unenthused but "not necessarily opposed". On Slack, @rhshadrach expressed a +1. All of those opinions were about the concept, not the execution.

jbrockmendel · 2025-08-25T22:53:26Z

gentle ping @mroeschke

pandas/core/arrays/arrow/array.py

mroeschke · 2025-08-26T17:05:00Z

pandas/core/internals/construction.py

-        arrays = [np.nan] * len(columns)
+        if dtype is not None and not isinstance(dtype, np.dtype):
+            # e.g. test_dataframe_from_dict_of_series
+            arrays = [NA] * len(columns)


Would we want the placeholder here to be nan for StringDtype(na_value=nan), i.e.

    if ... and isinstance(dtype, ExtensionDtype):
        arrays = [dtype.na_value] * len(columns)

that'd probably be benign. would we expect pd.NA to ever not-work?

Yeah not sure if something like

    df = pd.DataFrame({"a": ["b"]}, columns=["a", "b"], dtype=pd.StringDtype(na_value=np.nan))
    df.loc[0, "b"]

Would correctly return nan here

yes it does

Updated to use the suggested idiom
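
The placeholder choice discussed in this thread can be sketched without touching pandas internals. This is only an illustration under stated assumptions: `FakeExtensionDtype`, the local `NA` object, and `placeholder_arrays` are hypothetical stand-ins rather than pandas names, and the NaN fallback for non-extension dtypes is assumed from the removed line in the diff, not confirmed by the final code.

```python
import math
from dataclasses import dataclass
from typing import Any, Optional


class _NAType:
    """Hypothetical stand-in for pd.NA (not the real pandas singleton)."""

    def __repr__(self) -> str:
        return "<NA>"


NA = _NAType()


@dataclass(frozen=True)
class FakeExtensionDtype:
    """Hypothetical stand-in for an ExtensionDtype exposing na_value."""

    name: str
    na_value: Any


def placeholder_arrays(columns: list, dtype: Optional[object]) -> list:
    """Return one missing-value placeholder per column.

    Mirrors the suggested idiom: an extension dtype supplies its own
    na_value; anything else falls back to float NaN (an assumption).
    """
    if dtype is not None and isinstance(dtype, FakeExtensionDtype):
        return [dtype.na_value] * len(columns)
    return [float("nan")] * len(columns)


# A nullable dtype whose missing value is NA.
nullable = FakeExtensionDtype("Int64", NA)
# A string dtype configured with NaN as its missing value.
nan_string = FakeExtensionDtype("str", float("nan"))

masked_fill = placeholder_arrays(["a", "b"], nullable)
string_fill = placeholder_arrays(["a", "b"], nan_string)
default_fill = placeholder_arrays(["a"], None)
```

Deferring to `dtype.na_value` means a dtype such as `StringDtype(na_value=nan)` gets NaN placeholders while NA-based dtypes get NA, with no special-casing at the call site.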

pandas/tests/extension/test_arrow.py

mroeschke · 2025-09-10T19:02:36Z

pandas/core/config_init.py


+    cf.register_option(
+        "nan_is_na",
+        os.environ.get("PANDAS_NAN_IS_NA", "1") == "1",


Curious, I thought you were not fond of the environment variable pattern?

that does sound like the kind of opinion i would have, but ATM i don't find myself bothered by it
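
The default expression in the diff above can be read as a small standalone function. This is a minimal sketch: `default_nan_is_na` is a hypothetical helper name, not something pandas exposes, and it only mirrors the `os.environ.get(...) == "1"` comparison shown in the diff.

```python
import os
from typing import Mapping


def default_nan_is_na(environ: Mapping[str, str] = os.environ) -> bool:
    """Mirror the default from the diff: unset or "1" means True.

    Any other value, including "true" or "0", yields False because the
    comparison is against the literal string "1".
    """
    return environ.get("PANDAS_NAN_IS_NA", "1") == "1"


unset_default = default_nan_is_na({})
disabled = default_nan_is_na({"PANDAS_NAN_IS_NA": "0"})
not_literal_one = default_nan_is_na({"PANDAS_NAN_IS_NA": "true"})
```

Because the expression is passed as the default when the option is registered, the variable presumably has to be set before pandas is imported to have any effect; changing it afterwards would require `pd.set_option` instead.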

jbrockmendel · 2025-09-26T00:24:01Z

Gentle ping

mroeschke · 2025-09-26T02:53:19Z

Thanks @jbrockmendel

jbrockmendel mentioned this pull request Aug 4, 2025

POC: NA-only behavior for numpy-nullable dtypes #61708

Closed

jbrockmendel force-pushed the api-nan-vs-na branch 2 times, most recently from 1d85ad8 to 1ccaaa4, August 4, 2025 20:41

This was referenced Aug 5, 2025

API: NaN vs NA in mixed reduction #62024

Open

BUG: read_csv loses precision when engine='pyarrow' and dtype Int64 #56136

Closed

BUG: read_csv with engine=pyarrow and numpy-nullable dtype #62053

Merged

jbrockmendel force-pushed the api-nan-vs-na branch 3 times, most recently from f0e5e34 to 71d1c03, August 6, 2025 14:45

jbrockmendel added 21 commits August 12, 2025 09:07

BUG: read_csv with engine=pyarrow and numpy-nullable dtype

5e88fde

mypy fixup, error message compat for 32bit builds

eae6f64

minimum version compat

2861b16

not-infer-string compat

5369afa

mypy fixup

db35a9c

update usage

505bfb6

CLN: remove redundant check

febe83c

Use Matts idea

c81cbec

re-xfail

26a3049

API: rank with nullable dtypes preserve NA

a70b429

API: improve dtype in df.where with EA other

99a71b7

GH refs

c86747d

doc fixup

9d222d8

BUG: Decimal(NaN) incorrectly allowed in ArrowEA constructor with timestamp type

6f800b3

GH ref

514a56f

BUG: ArrowEA constructor with timestamp type

fca3c7c

POC: consistent NaN treatment for pyarrow dtypes

f20758a

comment

cc416fa

Down to 40 failing tests

7094d85

Fix rank, json tests

eeb0d32

CLN: remove outdated

814d001

jbrockmendel mentioned this pull request Aug 19, 2025

ENH: EA._cast_pointwise_result #62105

Merged

jbrockmendel added 5 commits August 20, 2025 09:25

Merge branch 'main' into api-nan-vs-na

a625190

update _cast_pointwise_result

d471aa8

update cast_pointwise_result

27cd097

Merge branch 'main' into api-nan-vs-na

1bb0a4e

Merge branch 'main' into api-nan-vs-na

7cc3b41

mroeschke reviewed Aug 26, 2025

pandas/core/arrays/arrow/array.py (resolved)

mroeschke reviewed Aug 26, 2025

pandas/core/arrays/arrow/array.py (resolved)

mroeschke reviewed Aug 26, 2025

pandas/tests/extension/test_arrow.py (outdated, resolved)

jbrockmendel added 7 commits August 26, 2025 10:20

Merge branch 'main' into api-nan-vs-na

5f76e19

remove unnecessary import

b2a64bb

Merge branch 'main' into api-nan-vs-na

1024ac5

Merge branch 'main' into api-nan-vs-na

9d4a112

Merge branch 'main' into api-nan-vs-na

54ae640

Merge branch 'main' into api-nan-vs-na

7dcf2eb

NA->dtype.na_value

32a2041

mroeschke reviewed Sep 10, 2025

mroeschke added the Missing-data and NA - MaskedArrays labels Sep 10, 2025

mroeschke approved these changes Sep 10, 2025

jbrockmendel added 3 commits September 15, 2025 14:47

Merge branch 'main' into api-nan-vs-na

3382678

Merge branch 'main' into api-nan-vs-na

d2473ab

Merge branch 'main' into api-nan-vs-na

b4dcfa6

mroeschke merged commit e4ca405 into pandas-dev:main Sep 26, 2025
42 checks passed

jbrockmendel deleted the api-nan-vs-na branch September 26, 2025 14:21
