-
Notifications
You must be signed in to change notification settings - Fork 3.9k
GH-47602: [Python] Make Schema hashable even when it has metadata #47601
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-47602: [Python] Make Schema hashable even when it has metadata #47601
Conversation
|
Thanks for opening a pull request! If this is not a minor PR. Could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project. Then could you also rename the pull request title in the following format? or See also: |
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for fixing the bug and adding the test for it!
The change looks good to me. I only have one minor nit otherwise the PR is ready to go.
Pinging @raulcd for one extra review.
|
@github-actions crossbow submit -g python |
Co-authored-by: Alenka Frim <[email protected]>
|
Revision: ad52c12 Submitted crossbow builds: ursacomputing/crossbow @ actions-6cf16e3659 |
|
The crossbow stuff for some reason coredumped for ubuntu-22.04 (https://github.com/ursacomputing/crossbow/actions/runs/17939291515/job/51011788800, https://github.com/ursacomputing/crossbow/actions/runs/17939291336/job/51011788225). But I wouldn't see how that is connected to this change? 🤔 |
|
These three failures are not connected. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Sorry for taking a little to review, I've been busy with other issues.
Thanks for the ping
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit ef718a7. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…ta (apache#47601) ### Rationale for this change In Python, `pyarrow.Schema` before was not hashable when it has `metadata` set. ``` >>> import pyarrow >>> schema = pyarrow.schema([], metadata={b"1": b"1"}) >>> hash(schema) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "pyarrow/types.pxi", line 2921, in pyarrow.lib.Schema.__hash__ TypeError: unhashable type: 'dict' ``` This is because the metadata (which is a dict) was tried to be hashed as-is, which doesn't work. ### What changes are included in this PR? Slightly change how hashes are computed for Schema, by converting the `dict[str, str]` to the frozenset of key- and value tuples. For reference, this is faster than computing the hash of a sorted tuple of key- and value tuples (https://stackoverflow.com/a/6014481/10070873). ### Are these changes tested? Yes. ### Are there any user-facing changes? Besides that `Schema` now correctly is hashable, no. * GitHub Issue: apache#47602 Lead-authored-by: Jonas Dedden <[email protected]> Co-authored-by: Alenka Frim <[email protected]> Signed-off-by: Raúl Cumplido <[email protected]>
Rationale for this change
In Python,
pyarrow.Schemabefore was not hashable when it hasmetadataset.This is because the metadata (which is a dict) was tried to be hashed as-is, which doesn't work.
What changes are included in this PR?
Slightly change how hashes are computed for Schema, by converting the
dict[str, str]to the frozenset of key- and value tuples.For reference, this is faster than computing the hash of a sorted tuple of key- and value tuples (https://stackoverflow.com/a/6014481/10070873).
Are these changes tested?
Yes.
Are there any user-facing changes?
Besides that
Schemanow correctly is hashable, no.