
jbrockmendel
Member

@jbrockmendel commented Aug 4, 2025

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

As discussed on the last dev call, this implements the "mode.nan_is_na" option (default True), which controls whether NaN is treated as always equivalent to NA (True) or never equivalent to NA (False).
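To make the two settings concrete, here is a minimal pure-Python sketch (not pandas' implementation) of how a masked float array could be built from raw values under each setting. The helper `to_masked` is hypothetical; `None` stands in for `pd.NA`, and the boolean mask marks missing entries in the spirit of pandas' masked arrays.

```python
import math


def to_masked(values, nan_is_na=True):
    """Hypothetical helper: split raw floats into (data, mask).

    None plays the role of pd.NA and is always missing. NaN is missing
    only when nan_is_na is True; otherwise it is kept as a valid float.
    """
    data, mask = [], []
    for v in values:
        if v is None:
            missing = True
        elif isinstance(v, float) and math.isnan(v):
            missing = nan_is_na
        else:
            missing = False
        # This sketch stores a dummy 0.0 for missing slots; the mask is authoritative.
        data.append(0.0 if missing else float(v))
        mask.append(missing)
    return data, mask


raw = [1.0, float("nan"), None]
print(to_masked(raw, nan_is_na=True)[1])   # [False, True, True]
print(to_masked(raw, nan_is_na=False)[1])  # [False, False, True]
```

With the option on, NaN and NA collapse into the same "missing" state; with it off, NaN survives as an ordinary float value and only NA is masked.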

This sits on top of

Still need to

@jbrockmendel
Member Author

This was discussed on the dev call before last, where @mroeschke, @Dr-Irv, and I were +1. Joris was unenthused but "not necessarily opposed". On Slack, @rhshadrach also expressed a +1. All of those opinions concerned the concept, not the implementation.

@jbrockmendel
Member Author

gentle ping @mroeschke

```python
arrays = [np.nan] * len(columns)
if dtype is not None and not isinstance(dtype, np.dtype):
    # e.g. test_dataframe_from_dict_of_series
    arrays = [NA] * len(columns)
```
Member

Would we want the placeholder here to be nan for StringDtype(na_value=nan), i.e.

```python
if ... and isinstance(dtype, ExtensionDtype):
    arrays = [dtype.na_value] * len(columns)
```
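The suggested idiom takes the placeholder from the dtype itself, so a NaN-backed string dtype gets nan while other extension dtypes get their own sentinel. Below is a dependency-free sketch of that dispatch; `NumpyDtype`, `ExtDtype`, `NA`, and `placeholder_arrays` are hypothetical stand-ins for `np.dtype`, `ExtensionDtype`, `pd.NA`, and the construction code in the diff, not pandas objects.

```python
import math
from dataclasses import dataclass


class _NAType:
    """Hypothetical stand-in for pd.NA."""

    def __repr__(self):
        return "<NA>"


NA = _NAType()


@dataclass(frozen=True)
class NumpyDtype:
    """Hypothetical stand-in for np.dtype."""
    name: str


@dataclass(frozen=True)
class ExtDtype:
    """Hypothetical stand-in for an ExtensionDtype with its own na_value."""
    name: str
    na_value: object = NA


def placeholder_arrays(columns, dtype=None):
    """Return one missing-value placeholder per column that has no data."""
    if isinstance(dtype, ExtDtype):
        # Extension dtypes supply their own sentinel, e.g. nan for a
        # NaN-backed string dtype and NA for the nullable dtypes.
        return [dtype.na_value] * len(columns)
    # No dtype, or a numpy dtype: fall back to NaN.
    return [float("nan")] * len(columns)


print(placeholder_arrays(["b"], ExtDtype("string", na_value=float("nan"))))  # [nan]
print(placeholder_arrays(["b", "c"], ExtDtype("Int64")))  # [<NA>, <NA>]
print(placeholder_arrays(["b"], NumpyDtype("float64")))  # [nan]
```

Reading `na_value` off the dtype avoids special-casing individual extension dtypes in the constructor.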

Member Author

That'd probably be benign. Would we expect pd.NA to ever not work?

Member

@mroeschke commented Aug 26, 2025

Yeah, I'm not sure whether something like

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["b"]}, columns=["a", "b"], dtype=pd.StringDtype(na_value=np.nan))
df.loc[0, "b"]
```

would correctly return nan here.

Member Author

Yes, it does.

Member Author

Updated to use the suggested idiom


```python
cf.register_option(
    "nan_is_na",
    os.environ.get("PANDAS_NAN_IS_NA", "1") == "1",
    # ... (remaining arguments not shown in this excerpt)
```
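The default in this registration comes from the expression `os.environ.get("PANDAS_NAN_IS_NA", "1") == "1"`, so the option is on unless the variable is set to something other than the exact string "1". The sketch below evaluates that expression for a few settings; `nan_is_na_default` is a hypothetical wrapper, not a pandas function.

```python
import os


def nan_is_na_default(environ=os.environ):
    """Hypothetical wrapper around the expression used as the option default."""
    return environ.get("PANDAS_NAN_IS_NA", "1") == "1"


print(nan_is_na_default({}))                            # True (variable unset)
print(nan_is_na_default({"PANDAS_NAN_IS_NA": "1"}))     # True
print(nan_is_na_default({"PANDAS_NAN_IS_NA": "0"}))     # False
print(nan_is_na_default({"PANDAS_NAN_IS_NA": "true"}))  # False: only "1" counts
```

Because the comparison is against the literal "1", values such as "true" or "yes" turn the option off. The variable is also read only once, when the option is registered, so changing it afterwards does not change the default.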
Member

Curious, I thought you were not fond of the environment variable pattern?

Member Author

That does sound like the kind of opinion I would have, but at the moment I don't find myself bothered by it.

@mroeschke added the Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) and NA - MaskedArrays (related to pd.NA and nullable extension arrays) labels Sep 10, 2025
@jbrockmendel
Member Author

Gentle ping

@mroeschke merged commit e4ca405 into pandas-dev:main Sep 26, 2025
42 checks passed
@mroeschke
Member

Thanks @jbrockmendel

@jbrockmendel deleted the api-nan-vs-na branch September 26, 2025 14:21