Currently we export boolean data as `uint8` in our DataFrame Interchange Protocol implementation. The reason is that, at the time, other libraries did not support bit-packed boolean columns when consuming interchange objects. It was also unclear from the spec whether both bit- and byte-packed booleans are supported; see data-apis/dataframe-api#227.
For this reason, we could start exporting Arrow booleans as bit-packed in our implementation of the protocol. I don't think this should cause any breaking changes, but I could be wrong.
This would also let us test support for bit-packed booleans in `from_dataframe`; see #37975 (comment).
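For readers less familiar with the two layouts: a byte-packed (`uint8`) boolean buffer spends one byte per value, while Arrow's bit-packed layout stores eight values per byte, with value `i` at bit `i % 8` of byte `i // 8` (least-significant bit first). The following is a minimal pure-Python sketch of both encodings; `pack_bytes`, `pack_bits`, and `unpack_bits` are hypothetical helpers for illustration, not pyarrow or interchange-protocol APIs.

```python
def pack_bytes(values: list[bool]) -> bytes:
    """Byte-packed layout: one byte (0 or 1) per value, as in a uint8 buffer."""
    return bytes(1 if v else 0 for v in values)


def pack_bits(values: list[bool]) -> bytes:
    """Bit-packed layout as used by Arrow: eight values per byte, LSB first."""
    out = bytearray((len(values) + 7) // 8)  # round up to whole bytes
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 1 << (i % 8)  # set bit (i % 8) of byte (i // 8)
    return bytes(out)


def unpack_bits(buf: bytes, length: int, offset: int = 0) -> list[bool]:
    """Read `length` values starting at bit `offset` of a bit-packed buffer."""
    return [
        bool((buf[(offset + i) // 8] >> ((offset + i) % 8)) & 1)
        for i in range(length)
    ]
```

With ten values, e.g. `[True, False, True, True, False, False, False, False, True, True]`, the byte-packed buffer takes 10 bytes while the bit-packed one takes 2 (`b'\x0d\x03'`). Since bit-packing is Arrow's native boolean layout, exporting it directly should avoid the conversion that producing a `uint8` buffer requires.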
@AlenkaF I would support exporting booleans as bit-packed booleans. It is more in line with the protocol's goal of zero-copy exchange where possible. It is up to whoever implements `from_dataframe` to handle reading bit-packed booleans; that is how I have implemented it in Polars.
I currently have a special case for pyarrow's 8-bit boolean type, but I'd like to get rid of it if at all possible, as in my opinion it doesn't make much sense.
Alternatively, the dtype should use the format string `C` rather than `b`, to indicate that the data buffer is UInt8.
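On the consumer side, the dispatch described in this comment could look roughly like the sketch below. It assumes the buffer's dtype is a `(kind, bit_width, format_str, endianness)` tuple as in the interchange protocol, with Arrow C data interface format strings (`"b"` for boolean, `"C"` for uint8). `read_bool_buffer` is a hypothetical helper, not the Polars or pyarrow implementation, and the buffer is modeled as plain `bytes` rather than a raw pointer.

```python
from enum import IntEnum


class DtypeKind(IntEnum):
    # Subset of the DtypeKind enum defined by the interchange protocol spec.
    INT = 0
    UINT = 1
    FLOAT = 2
    BOOL = 20


def read_bool_buffer(buf: bytes, dtype: tuple, length: int, offset: int = 0) -> list[bool]:
    """Decode a boolean data buffer according to its dtype (hypothetical helper)."""
    kind, bit_width, fmt, _endianness = dtype
    if kind == DtypeKind.BOOL and bit_width == 1 and fmt == "b":
        # Bit-packed (Arrow native): value i lives at bit (offset + i), LSB first.
        return [
            bool((buf[(offset + i) // 8] >> ((offset + i) % 8)) & 1)
            for i in range(length)
        ]
    if (kind == DtypeKind.UINT and bit_width == 8 and fmt == "C") or (
        kind == DtypeKind.BOOL and bit_width == 8
    ):
        # Byte-packed: either labelled as uint8 ("C"), as proposed above, or the
        # special-cased 8-bit boolean. One byte per value; non-zero means True.
        return [buf[offset + i] != 0 for i in range(length)]
    raise NotImplementedError(f"unsupported boolean buffer dtype: {dtype!r}")
```

If the producer labels a byte buffer with `C`, the second branch needs no boolean-specific special case: the consumer reads it as ordinary `uint8` data and converts non-zero bytes to `True`.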
cc @jorisvandenbossche
Component(s)
Python