-
-
Notifications
You must be signed in to change notification settings - Fork 19k
API: mode.nan_is_na to consistently distinguish NaN-vs-NA #62040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
1d85ad8
to
1ccaaa4
Compare
f0e5e34
to
71d1c03
Compare
Discussed in the dev call before last where I, @mroeschke, and @Dr-Irv were +1. Joris was unenthused but "not necessarily opposed". On slack @rhshadrach expressed a +1. All those opinions were to the concept, not the execution. |
gentle ping @mroeschke |
arrays = [np.nan] * len(columns) | ||
if dtype is not None and not isinstance(dtype, np.dtype): | ||
# e.g. test_dataframe_from_dict_of_series | ||
arrays = [NA] * len(columns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we want the placeholder here to be nan
for StringDtype(na_value=nan)
, i.e.
if ... and isinstance(dtype, ExtensionDtype):
arrays = [dtype.na_value] * len(columns)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that'd probably be benign. would we expect pd.NA to ever not-work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah not sure if something like
df = pd.DataFrame({"a": ["b"]}, columns=["a", "b"], dtype=pd.StringDtype(na_value=np.nan))
df.loc[0, "b"]
Would correctly return nan here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it does
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to use the suggested idiom
|
||
cf.register_option( | ||
"nan_is_na", | ||
os.environ.get("PANDAS_NAN_IS_NA", "1") == "1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious, I thought you were not fond of the environment variable pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that does sound like the kind of opinion i would have, but ATM i don't find myself bothered by it
Gentle ping |
Thanks @jbrockmendel |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.As discussed on the last dev call, this implements
"mode.nan_is_na"
(defaultTrue
) to consider NaN as either always-equivalent or never-equivalent to NA.This sits on top of
DataFrame.where
Still need to