Fix binary operations on attrs for Series and DataFrame #59636
fbourgey
commented
Aug 28, 2024
- closes BUG: binary operations don't propogate attrs depending on order with Series and/or DataFrame/Series #51607
- Test
- Test
Small change to prefer fixtures to writing out our own binop implementations, but generally lgtm. I don't think current CI failures are related.
@mroeschke any thoughts here?
pandas/tests/frame/test_api.py
Outdated
df_2 = DataFrame({"A": [-3, 9]})
attrs = {"info": "DataFrame"}
df_1.attrs = attrs
assert (df_1 + df_2).attrs == attrs
Rather than doing this you can just use the `all_binary_operators` fixture from conftest.py (I think)
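A minimal sketch of the idea behind that suggestion, not the pandas fixture itself: instead of hand-writing one assertion per operator, loop over a list of operator functions and check both operand orders. The `Tagged` class, the `BINARY_OPERATORS` list, and `check_attrs_propagate` are hypothetical stand-ins used only for illustration; in the pandas test suite the `all_binary_operators` fixture plays the role of the list.

```python
import operator


class Tagged:
    """Hypothetical stand-in for a Series/DataFrame: a value plus an attrs dict."""

    def __init__(self, value, attrs=None):
        self.value = value
        self.attrs = dict(attrs or {})

    def _binop(self, other, op):
        # Take attrs from the left operand if it has any, else from the right.
        attrs = self.attrs or other.attrs
        return Tagged(op(self.value, other.value), attrs)

    def __add__(self, other):
        return self._binop(other, operator.add)

    def __sub__(self, other):
        return self._binop(other, operator.sub)

    def __mul__(self, other):
        return self._binop(other, operator.mul)


# Hypothetical operator list; the real fixture covers more operators.
BINARY_OPERATORS = [operator.add, operator.sub, operator.mul]


def check_attrs_propagate(op):
    """Return True if attrs survive the operation in both operand orders."""
    attrs = {"info": "DataFrame"}
    with_attrs = Tagged(3, attrs)
    without_attrs = Tagged(-3)
    forward = op(with_attrs, without_attrs).attrs == attrs
    backward = op(without_attrs, with_attrs).attrs == attrs
    return forward and backward


results = {op.__name__: check_attrs_propagate(op) for op in BINARY_OPERATORS}
```

With pytest, the loop would typically become a parametrized test or a fixture argument, so each operator is reported as its own test case.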
I made the change.
I think attrs propagation logic should only be handled by `__finalize__`, so these binary operations should dispatch to that method
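As a rough illustration of that design, here is a toy model in which metadata is copied in exactly one place and the binary operation only dispatches to it. The `Frame` class below is a hypothetical stand-in written for this sketch, not the pandas implementation; the fallback to the right operand when the left has no attrs is an assumption modelled on the guard discussed in this PR.

```python
from copy import deepcopy


class Frame:
    """Hypothetical toy container, not the pandas implementation."""

    def __init__(self, values, attrs=None):
        self.values = list(values)
        self.attrs = dict(attrs or {})

    def __finalize__(self, other):
        # The single place where metadata is propagated: copy attrs only
        # if the other object actually has some.
        if getattr(other, "attrs", None):
            self.attrs = deepcopy(other.attrs)
        return self

    def _construct_result(self, values, other):
        # Prefer the left operand's attrs.
        out = Frame(values).__finalize__(self)
        if not out.attrs:
            # Assumed fallback: use the right operand when the left has none.
            out = out.__finalize__(other)
        return out

    def __add__(self, other):
        values = [a + b for a, b in zip(self.values, other.values)]
        return self._construct_result(values, other)


left = Frame([1, 2])
right = Frame([-3, 9], attrs={"info": "DataFrame"})
forward = left + right
backward = right + left
```

In this sketch both `forward.attrs` and `backward.attrs` carry the metadata, which is the order-independent behaviour the linked issue asks for.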
@mroeschke should everything be rewritten using |
Yes, or |
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.
@mroeschke, @WillAyd, I tried using |
I think it looks good but will defer to @mroeschke
pandas/core/frame.py
Outdated
@@ -7875,13 +7875,19 @@ class diet
def _cmp_method(self, other, op):
    axis: Literal[1] = 1 # only relevant for Series other case

    if not getattr(self, "attrs", None) and getattr(other, "attrs", None):
I don't think we should need these anymore here since this should be handled in _construct_result
It seems that sometimes `self, other = self._align_for_op(other, axis, flex=False, level=None)` resets `other.attrs` to `{}`.
This is why I kept it.
Is it because `other` is getting overridden here? Otherwise, `_align_for_op` should also preserve the `attrs` of `other`.
@@ -8212,6 +8208,9 @@ def to_series(right):
)
right = left._maybe_align_series_as_frame(right, axis)
I think this resets the `attrs` of `right`
I would consider that a bug. `attrs` should be preserved in this function.
Should I fix it in this PR or raise a different issue?
You can fix it in this PR
Suggested something below
pandas/core/frame.py
Outdated
@@ -8283,6 +8285,8 @@ def _construct_result(self, result) -> DataFrame:
-------
DataFrame
"""
if not getattr(self, "attrs", None) and getattr(other, "attrs", None):
    self.__finalize__(other)
Can we do `out = out.__finalize__(other)` instead?
pandas/core/frame.py
Outdated
- def _construct_result(self, result) -> DataFrame:
+ def _construct_result(self, result, other=None) -> DataFrame:
- def _construct_result(self, result, other=None) -> DataFrame:
+ def _construct_result(self, result, other) -> DataFrame:
Might as well make this required
pandas/core/frame.py
Outdated
if not getattr(self, "attrs", None) and getattr(other, "attrs", None):
    out = out.__finalize__(other)
- if not getattr(self, "attrs", None) and getattr(other, "attrs", None):
-     out = out.__finalize__(other)
+ out = out.__finalize__(other)
Appears `__finalize__` will correctly skip if `other` has a populated `attrs`
Doing this breaks the following tests:
FAILED pandas/tests/generic/test_duplicate_labels.py::TestPreserves::test_binops[other1-True-add] - AssertionError
FAILED pandas/tests/generic/test_duplicate_labels.py::TestPreserves::test_binops[other1-True-sub] - AssertionError
I think this line in `__finalize__` needs a fix:
self.flags.allows_duplicate_labels = other.flags.allows_duplicate_labels
It should prioritize `False` if either `self.flags.allows_duplicate_labels` or `other.flags.allows_duplicate_labels` is `False`.
What about doing this in `__finalize__`:
if isinstance(other, NDFrame):
    if other.attrs:
        # We want attrs propagation to have minimal performance
        # impact if attrs are not used; i.e. attrs is an empty dict.
        # One could make the deepcopy unconditionally, but a deepcopy
        # of an empty dict is 50x more expensive than the empty check.
        self.attrs = deepcopy(other.attrs)
    self.flags.allows_duplicate_labels = (
        self.flags.allows_duplicate_labels
        and other.flags.allows_duplicate_labels
    )
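The proposed rule amounts to a logical AND: the result allows duplicate labels only if both operands do. The sketch below compares it with the previous "copy from `other`" rule over all four input combinations. Both helper names are hypothetical and exist only for this illustration; they operate on plain booleans rather than on pandas `Flags` objects.

```python
from itertools import product


def old_rule(self_flag: bool, other_flag: bool) -> bool:
    # Previous behaviour as quoted above: simply copy the flag from `other`.
    return other_flag


def proposed_rule(self_flag: bool, other_flag: bool) -> bool:
    # Proposed behaviour: False wins if either operand disallows duplicates.
    return self_flag and other_flag


# Truth table keyed by (self_flag, other_flag) -> (old result, proposed result).
truth_table = {
    (s, o): (old_rule(s, o), proposed_rule(s, o))
    for s, o in product([True, False], repeat=2)
}

# The two rules differ only where self disallows duplicates but other allows them.
differences = [key for key, (old, new) in truth_table.items() if old != new]
```

The single differing case, `(False, True)`, is the one where unconditionally finalizing from `other` would drop a restriction that `self` had set.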
Yup that's the correct location to fix this
pandas/core/indexes/base.py
Outdated
@final
- def _construct_result(self, result, name):
+ def _construct_result(self, result, name, other=None):
- def _construct_result(self, result, name, other=None):
+ def _construct_result(self, result, name, other):
pandas/core/series.py
Outdated
self,
result: ArrayLike | tuple[ArrayLike, ArrayLike],
name: Hashable,
other: AnyArrayLike | DataFrame | None = None,
- other: AnyArrayLike | DataFrame | None = None,
+ other: AnyArrayLike | DataFrame,
pandas/core/series.py
Outdated
@@ -5943,6 +5947,7 @@ def _construct_result(
----------
result : ndarray or ExtensionArray
name : Label
other : Series, DataFrame or array-like, default None
- other : Series, DataFrame or array-like, default None
+ other : Series, DataFrame or array-like
pandas/core/series.py
Outdated
if getattr(other, "attrs", None):
    out.__finalize__(other)
- if getattr(other, "attrs", None):
-     out.__finalize__(other)
+ out = out.__finalize__(other)
Doing this breaks:
FAILED pandas/tests/generic/test_duplicate_labels.py::TestPreserves::test_binops[other1-False-add] - AssertionError
FAILED pandas/tests/generic/test_duplicate_labels.py::TestPreserves::test_binops[other1-False-sub] - AssertionError
It has something to do with `flags.allows_duplicate_labels`.
pandas/core/base.py
Outdated
- def _construct_result(self, result, name):
+ def _construct_result(self, result, name, other=None):
- def _construct_result(self, result, name, other=None):
+ def _construct_result(self, result, name, other):
pandas/core/frame.py
Outdated
@@ -8101,6 +8100,7 @@ def _align_for_op(
left : DataFrame
right : Any
"""
pandas/core/frame.py
Outdated
@@ -8200,15 +8200,13 @@ def to_series(right):
"`left, right = left.align(right, axis=1)` "
"before operating."
)

left, right = left.align(
- left, right = left.align(
+ left, right = left.align(
Thanks for sticking with this @fbourgey!
Thanks for the help @mroeschke!