Commit ef718a7

jonded94 and AlenkaF authored
GH-47602: [Python] Make Schema hashable even when it has metadata (#47601)
### Rationale for this change

In Python, `pyarrow.Schema` was previously not hashable when it had `metadata` set.

```
>>> import pyarrow
>>> schema = pyarrow.schema([], metadata={b"1": b"1"})
>>> hash(schema)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/types.pxi", line 2921, in pyarrow.lib.Schema.__hash__
TypeError: unhashable type: 'dict'
```

This is because `__hash__` tried to hash the metadata (a dict) as-is, which does not work.

### What changes are included in this PR?

Slightly change how hashes are computed for `Schema` by converting the metadata `dict[str, str]` to a frozenset of key-value tuples. For reference, this is faster than computing the hash of a sorted tuple of key-value tuples (https://stackoverflow.com/a/6014481/10070873).

### Are these changes tested?

Yes.

### Are there any user-facing changes?

None, other than `Schema` now being hashable when it has metadata.

* GitHub Issue: #47602

Lead-authored-by: Jonas Dedden <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Signed-off-by: Raúl Cumplido <[email protected]>
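The conversion described in the commit message can be sketched in plain Python, without pyarrow. The helper name `metadata_hash_key` is hypothetical and exists only for this illustration; the expression inside it is the one the commit adds to `Schema.__hash__`.

```python
def metadata_hash_key(metadata):
    """Hypothetical helper: a hashable, order-independent view of a metadata dict."""
    # None and {} are both falsy, so both collapse to the empty frozenset.
    return frozenset(metadata.items() if metadata else {})


meta_a = {b"k1": b"v1", b"k2": b"v2"}
meta_b = {b"k2": b"v2", b"k1": b"v1"}  # same pairs, different insertion order

# A plain dict cannot be hashed; this is the TypeError from the traceback.
try:
    hash(meta_a)
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False

# The frozenset of (key, value) tuples is hashable and ignores insertion order.
key_a = metadata_hash_key(meta_a)
key_b = metadata_hash_key(meta_b)
same_hash = hash(key_a) == hash(key_b)
```

Because a frozenset has no ordering, no sort is needed before hashing, which is the difference from the sorted-tuple alternative the message mentions.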
1 parent cde3f6a commit ef718a7

2 files changed: +24 -1 lines changed

python/pyarrow/tests/test_schema.py

Lines changed: 22 additions & 0 deletions
@@ -482,6 +482,28 @@ def test_schema_set_field():
     assert s3.field(0).nullable is False
 
 
+def test_schema_hash_metadata():
+    fields = [
+        pa.field("foo", pa.int32()),
+    ]
+
+    schema1 = pa.schema(fields, metadata={b'foo': b'bar'})
+    schema2 = pa.schema(fields, metadata={b'foo': b'bar'})
+    schema3 = pa.schema(fields, metadata={b'foo_different': b'bar'})
+    schema4 = pa.schema(fields, metadata={b'foo': b'bar_different'})
+
+    assert hash(schema1) == hash(schema2)
+    assert hash(schema1) != hash(schema3)
+    assert hash(schema1) != hash(schema4)
+    assert hash(schema3) != hash(schema4)
+
+    schema_empty1 = pa.schema(fields, metadata={})
+    schema_empty2 = pa.schema(fields, metadata=None)
+
+    assert hash(schema_empty1) == hash(schema_empty2)
+    assert hash(schema_empty1) != hash(schema1)
+
+
 def test_schema_equals():
     fields = [
         pa.field('foo', pa.int32()),

python/pyarrow/types.pxi

Lines changed: 2 additions & 1 deletion
@@ -2918,7 +2918,8 @@ cdef class Schema(_Weakrefable):
         return schema, (list(self), self.metadata)
 
     def __hash__(self):
-        return hash((tuple(self), self.metadata))
+        metadata = frozenset(self.metadata.items() if self.metadata else {})
+        return hash((tuple(self), metadata))
 
     def __sizeof__(self):
         size = 0
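A rough illustration of what a working `__hash__` enables: objects can be placed in sets and used as dict keys. The class `MiniSchema` below is a hypothetical pure-Python stand-in, not part of pyarrow. Only its `__hash__` follows the patched code above. Its `__eq__` is an assumption made for this sketch and is not claimed to match how `pyarrow.Schema` compares metadata.

```python
class MiniSchema:
    """Hypothetical stand-in for pyarrow.Schema, for illustration only."""

    def __init__(self, fields, metadata=None):
        self.fields = tuple(fields)
        self.metadata = dict(metadata) if metadata else None

    def __eq__(self, other):
        # Assumption of this sketch: equality includes metadata and treats
        # None and {} alike, so that __eq__ stays consistent with __hash__.
        if not isinstance(other, MiniSchema):
            return NotImplemented
        return (self.fields == other.fields
                and (self.metadata or {}) == (other.metadata or {}))

    def __hash__(self):
        # Same expression as the patched Schema.__hash__ in the diff.
        metadata = frozenset(self.metadata.items() if self.metadata else {})
        return hash((self.fields, metadata))


a = MiniSchema([("foo", "int32")], metadata={b"foo": b"bar"})
b = MiniSchema([("foo", "int32")], metadata={b"foo": b"bar"})
c = MiniSchema([("foo", "int32")], metadata={b"foo": b"baz"})

unique = {a, b, c}             # a and b collapse into one set entry
cache = {a: "cached result"}   # b finds the entry stored under a
hit = cache[b]
```

With the pre-patch `__hash__`, which hashed the raw dict, building `unique` or `cache` would fail with the `TypeError` quoted in the commit message as soon as metadata was set.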
