@d-v-b d-v-b commented Oct 28, 2025

This PR adds a Bytes dtype that is nearly identical to the existing VariableLengthBytes dtype, save for a few differences:

  • The V3 JSON form is "bytes" instead of "variable_length_bytes".
  • The fill value is represented as an array of ints instead of a base64-encoded string.
  • Bytes follows a published spec, whereas VariableLengthBytes is not described by any spec.

The last point is the most important.
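
To make the fill value difference concrete, here is a minimal sketch of the two JSON encodings for the same bytestring. The helper names `fill_value_as_ints` and `fill_value_as_base64` are hypothetical and only for illustration; they are not functions from zarr-python.

```python
import base64
import json


def fill_value_as_ints(value: bytes) -> list[int]:
    """Encode a bytestring as a list of ints in the range 0-255."""
    return list(value)


def fill_value_as_base64(value: bytes) -> str:
    """Encode a bytestring as a standard base64 ASCII string."""
    return base64.standard_b64encode(value).decode("ascii")


raw = b"\x00\x01\xff"
ints_form = fill_value_as_ints(raw)
b64_form = fill_value_as_base64(raw)

# Both forms are valid JSON values; only their JSON type differs.
print(json.dumps({"fill_value": ints_form}))  # {"fill_value": [0, 1, 255]}
print(json.dumps({"fill_value": b64_form}))   # {"fill_value": "AAH/"}
```

The two forms are distinguishable by JSON type alone (array versus string), which matters for the questions below.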

Because Bytes is nearly identical to VariableLengthBytes, it could be a drop-in replacement for VariableLengthBytes, save for the JSON fill value encoding differences between the two data types. This raises two questions:

  • is base64 encoding a bytestring a better or worse JSON serialization than representing the same bytestring as an array of integers?
  • could (or should) we amend the bytes data type spec to recommend reading (or reading and writing) both fill value encodings? If so, the bytes data type and the variable_length_bytes data types could be fused completely.
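
As a rough sketch of what "reading both fill value encodings" could look like, the function below accepts either form and dispatches on the JSON type. The name `parse_bytes_fill_value` is hypothetical, and this is an assumption about one possible reader, not the behavior of this PR or of the spec.

```python
import base64


def parse_bytes_fill_value(data: object) -> bytes:
    """Decode a JSON fill value given as a base64 string or a list of ints."""
    if isinstance(data, str):
        # Legacy form: base64-encoded string. validate=True rejects
        # characters outside the base64 alphabet instead of skipping them.
        return base64.b64decode(data, validate=True)
    if isinstance(data, list) and all(
        isinstance(x, int) and not isinstance(x, bool) and 0 <= x <= 255
        for x in data
    ):
        # Array-of-ints form: each element is one byte.
        return bytes(data)
    raise TypeError(f"Cannot interpret {data!r} as a bytes fill value")


# Both encodings of the same bytestring decode to the same value.
from_b64 = parse_bytes_fill_value("AAH/")
from_ints = parse_bytes_fill_value([0, 1, 255])
```

Because a JSON string and a JSON array can never be confused, a reader like this has no ambiguity to resolve; the open question is only which form writers should emit.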

Since zarr-python 2.x saved bytes fill values as base64-encoded strings, there's a bit of inertia there. It would also be good to hear thoughts from other implementers (@LDeakin, @jbms, @manzt).

@github-actions github-actions bot added the needs release notes Automatically applied to PRs which haven't added release notes label Oct 28, 2025

codecov bot commented Oct 28, 2025

Codecov Report

❌ Patch coverage is 51.85185% with 52 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.69%. Comparing base (182504c) to head (4b87634).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/zarr/core/dtype/npy/bytes.py | 57.14% | 30 Missing ⚠️ |
| src/zarr/core/dtype/__init__.py | 53.84% | 6 Missing ⚠️ |
| src/zarr/core/dtype/registry.py | 66.66% | 4 Missing ⚠️ |
| src/zarr/errors.py | 0.00% | 3 Missing ⚠️ |
| src/zarr/core/dtype/npy/bool.py | 0.00% | 1 Missing ⚠️ |
| src/zarr/core/dtype/npy/common.py | 50.00% | 1 Missing ⚠️ |
| src/zarr/core/dtype/npy/complex.py | 0.00% | 1 Missing ⚠️ |
| src/zarr/core/dtype/npy/float.py | 0.00% | 1 Missing ⚠️ |
| src/zarr/core/dtype/npy/int.py | 0.00% | 1 Missing ⚠️ |
| src/zarr/core/dtype/npy/string.py | 0.00% | 1 Missing ⚠️ |

... and 3 more

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3559      +/-   ##
==========================================
- Coverage   61.86%   61.69%   -0.18%     
==========================================
  Files          85       85              
  Lines       10137    10230      +93     
==========================================
+ Hits         6271     6311      +40     
- Misses       3866     3919      +53     

| Files with missing lines | Coverage Δ |
|---|---|
| src/zarr/core/dtype/common.py | 28.39% <ø> (+0.68%) ⬆️ |
| src/zarr/core/dtype/npy/bool.py | 45.61% <0.00%> (-0.82%) ⬇️ |
| src/zarr/core/dtype/npy/common.py | 61.53% <50.00%> (-0.19%) ⬇️ |
| src/zarr/core/dtype/npy/complex.py | 47.61% <0.00%> (-0.58%) ⬇️ |
| src/zarr/core/dtype/npy/float.py | 46.87% <0.00%> (-0.50%) ⬇️ |
| src/zarr/core/dtype/npy/int.py | 53.41% <0.00%> (-0.17%) ⬇️ |
| src/zarr/core/dtype/npy/string.py | 44.11% <0.00%> (-0.33%) ⬇️ |
| src/zarr/core/dtype/npy/structured.py | 55.78% <0.00%> (-0.60%) ⬇️ |
| src/zarr/core/dtype/npy/time.py | 52.84% <0.00%> (-0.31%) ⬇️ |
| src/zarr/dtype.py | 0.00% <0.00%> (ø) |

... and 4 more

@d-v-b d-v-b marked this pull request as ready for review November 16, 2025 15:00
…ions for bytes / variable-length bytes; set up alias logic for variable-length bytes; parse_dtype now takes the bytes type as an alias for the bytes dtype
@github-actions github-actions bot removed the needs release notes Automatically applied to PRs which haven't added release notes label Nov 16, 2025