[Python][Interchange protocol] Export boolean columns as bit-packed values #37991

Open
AlenkaF opened this issue Oct 3, 2023 · 1 comment

@AlenkaF
Member

AlenkaF commented Oct 3, 2023

Describe the enhancement requested

Currently we export boolean data as uint8 in our DataFrame Interchange Protocol implementation. The reason is that, at the time, other libraries did not support bit-packed boolean columns when consuming interchange objects. It was also unclear from the spec whether both bit-packed and byte-packed booleans are supported. See data-apis/dataframe-api#227.

For this reason we could start exporting Arrow booleans as bit-packed in our implementation of the protocol. I think this should not cause any breaking changes, but I could be wrong.
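To make the difference between the two layouts concrete, here is a minimal pure-Python sketch. It is not the pyarrow implementation: `pack_bits` is a hypothetical helper, and the only assumption carried over from Arrow is that bit-packed booleans are stored least-significant bit first.

```python
def pack_bits(values):
    """Pack booleans into bytes, least-significant bit first.

    Hypothetical helper for illustration only; it mirrors the bit order
    Arrow uses for boolean buffers but is not pyarrow code.
    """
    out = bytearray((len(values) + 7) // 8)
    for i, value in enumerate(values):
        if value:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


values = [True, False, True, True, False, False, False, True, True, False]

# Byte-packed (what a uint8 export looks like): one byte per value.
byte_packed = bytes(int(v) for v in values)

# Bit-packed (the native Arrow boolean layout): one bit per value.
bit_packed = pack_bits(values)

print(len(byte_packed), len(bit_packed))
```

The byte-packed form needs a cast, and therefore a copy, on export, while the bit-packed form is the layout the Arrow boolean buffer already has.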

With this, we will also be able to test support for bit-packed booleans in from_dataframe; see #37975 (comment).

cc @jorisvandenbossche

Component(s)

Python

@stinodego

stinodego commented Jan 2, 2024

@AlenkaF I would support exporting booleans as bit-packed. It is more in line with the protocol's principle of zero copy where possible. It is then up to whoever implements from_dataframe to handle reading bit-packed booleans. That is how I have implemented it in Polars.
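As a rough illustration of what handling bit-packed booleans on the consumer side involves, the sketch below reads values out of a raw buffer. It is not the Polars implementation: `unpack_bits` and its `offset` parameter are hypothetical, and the least-significant-bit-first order is assumed from the Arrow boolean layout.

```python
def unpack_bits(buf, length, offset=0):
    """Read `length` booleans from a bit-packed buffer.

    Hypothetical helper for illustration only. `offset` is counted in
    bits (elements), so a sliced column can be read without copying.
    """
    result = []
    for i in range(offset, offset + length):
        byte = buf[i // 8]
        result.append(bool((byte >> (i % 8)) & 1))
    return result


buf = bytes([0x8D, 0x01])  # bits 0, 2, 3, 7 of byte 0 and bit 0 of byte 1

print(unpack_bits(buf, 10))
print(unpack_bits(buf, 3, offset=2))
```

A real consumer would typically wrap the buffer in its own bitmap type rather than materialise a Python list, but the index arithmetic is the same.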

I now have a special case for pyarrow's 8-bit boolean type, but I'd like to get rid of it if at all possible, as it doesn't make much sense, in my opinion.

Alternatively, the dtype should use the format string C (uint8) rather than b (boolean) to indicate that the data buffer is UInt8.
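The ambiguity can be sketched as follows. The dtype is the protocol's `(kind, bit_width, format_str, endianness)` tuple; the `DtypeKind` enum below is a local stand-in carrying only the two values needed here, and `bool_buffer_layout` is a hypothetical helper, not code from pyarrow or Polars.

```python
from enum import IntEnum


class DtypeKind(IntEnum):
    # Local stand-in for the protocol's enum; only the members used here.
    UINT = 1
    BOOL = 20


def bool_buffer_layout(dtype):
    """Decide how to read a data buffer from its dtype tuple.

    Hypothetical consumer-side dispatch for illustration only.
    """
    kind, bit_width, format_str, _endianness = dtype
    if kind == DtypeKind.BOOL and bit_width == 1:
        return "bit-packed"
    if kind == DtypeKind.BOOL and bit_width == 8:
        # The special case: format "b" claims boolean, but the buffer
        # actually holds one uint8 per value.
        return "byte-packed"
    if kind == DtypeKind.UINT and bit_width == 8 and format_str == "C":
        # Unambiguous: the buffer is plain uint8 data.
        return "uint8"
    raise ValueError(f"unsupported dtype for this sketch: {dtype!r}")


print(bool_buffer_layout((DtypeKind.BOOL, 8, "b", "=")))
print(bool_buffer_layout((DtypeKind.BOOL, 1, "b", "=")))
```

With bit-packed export, the middle branch would no longer be needed.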
