feat: add to_/from_safetensors #3685

pfackeldey · 2025-10-17T12:06:53Z

This PR adds to and from safetensors conversions. They're extremely fast at the cost of file size, because they do not include any compression. The idea is that all buffers are saved as one long sequence of uncompressed bytes, along with metadata that records where each buffer starts and stops (similar to an awkward array). Loading mmaps the file, and accessing an individual buffer loads only the corresponding slice into memory. This is basically what zarr does with compression turned off, but with a dynamic chunk size instead of a static one (which is good for us, because we don't have rectangular arrays).
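The layout described above can be illustrated with a minimal, standard-library-only sketch. It is not the safetensors implementation and not the code in this PR: the helper names `write_buffers` and `read_buffer`, and the simplified header (an 8-byte length followed by a JSON index of start/stop offsets), are assumptions made purely for illustration.

```python
import json
import mmap
import struct


def write_buffers(path, buffers):
    """Write named byte buffers back to back, preceded by a JSON index."""
    index, offset = {}, 0
    for name, data in buffers.items():
        # Record where each buffer starts and stops inside the data region.
        index[name] = [offset, offset + len(data)]
        offset += len(data)
    header = json.dumps(index).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header)))  # 8-byte little-endian header length
        f.write(header)
        for data in buffers.values():
            f.write(data)  # uncompressed bytes, no padding


def read_buffer(path, name):
    """Memory-map the file and copy out only the requested buffer."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        (header_len,) = struct.unpack("<Q", mm[:8])
        index = json.loads(mm[8 : 8 + header_len].decode("utf-8"))
        start, stop = index[name]
        base = 8 + header_len  # the data region begins right after the header
        return bytes(mm[base + start : base + stop])
```

Because each buffer has its own start and stop offset, buffers of different lengths sit side by side without padding, which is what makes the scheme a natural fit for non-rectangular data.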

codecov · 2025-10-17T12:11:53Z

Codecov Report

❌ Patch coverage is 83.92857% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.69%. Comparing base (b749e49) to head (960c99c).
⚠️ Report is 445 commits behind head on main.

Files with missing lines	Patch %	Lines
src/awkward/operations/ak_to_safetensors.py	80.76%	5 Missing ⚠️
src/awkward/operations/ak_from_safetensors.py	85.71%	4 Missing ⚠️

Additional details and impacted files

Files with missing lines	Coverage Δ
src/awkward/operations/__init__.py	`100.00% <100.00%> (ø)`
src/awkward/operations/ak_from_safetensors.py	`85.71% <85.71%> (ø)`
src/awkward/operations/ak_to_safetensors.py	`80.76% <80.76%> (ø)`

... and 197 files with indirect coverage changes

github-actions · 2025-10-17T12:35:54Z

The documentation preview is ready to be viewed at http://preview.awkward-array.org.s3-website.us-east-1.amazonaws.com/PR3685

pfackeldey · 2025-10-17T13:53:00Z

Something is looking weird with the API docs of these two functions, but I don't see what I did wrong... Any ideas?

ianna

@pfackeldey - excellent work! I left a few minor comments; please check them. Also, the docstring correctly says that destination may be a str, a pathlib.Path, or a file-like object, but the implementation does not explicitly normalize Path objects. While safetensors.numpy.save_file accepts paths, an explicit cast like:

import os
from pathlib import Path

if isinstance(destination, Path):
    destination = os.fspath(destination)

can make behavior more predictable across platforms.
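A slightly more general variant of the same idea, as a sketch only: the helper name `normalize_destination` is hypothetical and not part of the PR. It relies on `os.fspath`, which returns a `str` unchanged and converts any `os.PathLike` object (including `pathlib.Path`), and it leaves file-like objects untouched.

```python
import os
from pathlib import Path


def normalize_destination(destination):
    """Return a plain str for path-like inputs; pass file-like objects through."""
    if isinstance(destination, (str, os.PathLike)):
        # os.fspath is a no-op for str and calls __fspath__ on path-like objects.
        return os.fspath(destination)
    return destination


# Example: a Path becomes a platform-native string path.
normalized = normalize_destination(Path("out") / "array.safetensors")
```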

ianna · 2025-10-17T14:54:16Z

> Something is looking weird with the API docs of these two functions, but I don't see what I did wrong... Any ideas?

Ah, this should come first:

    """
    Args:
...

and then the function description, I think.
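The ordering suggested here can be sketched on a placeholder function. The function name `to_example` and its parameter descriptions are hypothetical; only the docstring layout (the `Args:` block first, the description after it) reflects the suggestion.

```python
def to_example(array, destination):
    """
    Args:
        array: Data to write (hypothetical parameter).
        destination (str or path-like): Where to write (hypothetical parameter).

    Writes `array` to `destination`. The description comes after the
    `Args:` block, following the ordering suggested in this comment.
    """
    raise NotImplementedError("placeholder body; only the docstring layout matters")
```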

ikrommyd · 2025-10-17T16:11:00Z

@pfackeldey do you wanna add tests for every single layout type? You can just copy the layouts from tests/test_3608_to_packed_for_typetracer_backed_arrays.py. I remember adding all the layouts there recently at least. Or tell an LLM to do it actually :)

pfackeldey · 2025-10-20T08:41:49Z

> @pfackeldey do you wanna add tests for every single layout type? You can just copy the layouts from tests/test_3608_to_packed_for_typetracer_backed_arrays.py. I remember adding all the layouts there recently at least. Or tell an LLM to do it actually :)

No, this uses ak.to_buffers/ak.from_buffers under the hood, which are already well tested. I don't think it makes sense to add redundant test cases: this conversion works as long as ak.to_buffers and ak.from_buffers work.

ikrommyd · 2025-10-20T15:50:47Z

@pfackeldey maybe I missed something in the code, but shouldn't you materialize before writing to safetensors? to_buffers doesn't by itself. It spits out VirtualNDArray instances. Maybe to_packed is worth it too?

pfackeldey · 2025-10-21T08:32:41Z

> @pfackeldey maybe I missed something in the code, but shouldn't you materialize before writing to safetensors? to_buffers doesn't by itself. It spits out VirtualNDArray instances. Maybe to_packed is worth it too?

good point! I'll add that 👍

ikrommyd · 2025-10-21T08:48:16Z

And I had one more thing that I just thought of. Maybe there should be a check that the array is not typetracer-backed when writing? I'm not sure what other IO functions do; I didn't check before writing this. I am saying this because to_buffers and to_packed will work, but then you'd try to convert a typetracer to bytes, which will probably not give a very clean error.

pfackeldey · 2025-10-21T09:04:04Z

> And I had one more thing that I just thought of. Maybe there should be a check that the array is not typetracer-backed when writing? I'm not sure what other IO functions do; I didn't check before writing this. I am saying this because to_buffers and to_packed will work, but then you'd try to convert a typetracer to bytes, which will probably not give a very clean error.

It already fails with a clear, correct error:

... 
TypeError: cannot call 'to_buffers' on an array without concrete data

ikrommyd · 2025-10-21T14:02:09Z

Ah good. I was under the impression from buffers would be fine. I should have tried it before speaking I guess. Thanks for checking.

pfackeldey and others added 3 commits October 17, 2025 14:02

feat: add to_/from_safetensors

d5180f6

Merge branch 'main' into to_from_safetensors.py

257006f

style: pre-commit fixes

c4345c5

pfackeldey added 2 commits October 17, 2025 14:12

satisfy pre-commit

1c9e370

add test

ce4a86b

pfackeldey marked this pull request as ready for review October 17, 2025 12:20

satisfy pylint too

98fb3cc

pfackeldey requested a review from ianna October 17, 2025 13:47

ianna requested changes Oct 17, 2025

ianna added the pr-next-release Required for the next release label Oct 17, 2025

pfackeldey and others added 6 commits October 20, 2025 10:37

Update src/awkward/operations/ak_from_safetensors.py

aefccf9

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_from_safetensors.py

0edea75

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_from_safetensors.py

19871f4

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_to_safetensors.py

15338b2

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_from_safetensors.py

5737374

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_to_safetensors.py

818ddda

Co-authored-by: Ianna Osborne <[email protected]>

pfackeldey and others added 8 commits October 20, 2025 10:42

Update src/awkward/operations/ak_from_safetensors.py

c4a8aa5

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_to_safetensors.py

1c68716

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_from_safetensors.py

fecc00e

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_from_safetensors.py

1de11b9

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_from_safetensors.py

65829b4

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_from_safetensors.py

3f23cd5

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_to_safetensors.py

a3339b6

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_from_safetensors.py

a6fb568

Co-authored-by: Ianna Osborne <[email protected]>

pfackeldey and others added 6 commits October 20, 2025 10:46

Update src/awkward/operations/ak_to_safetensors.py

895888b

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_to_safetensors.py

b63b72b

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_to_safetensors.py

c7724d3

Co-authored-by: Ianna Osborne <[email protected]>

Update src/awkward/operations/ak_to_safetensors.py

72191b4

Co-authored-by: Ianna Osborne <[email protected]>

address remaining comments

76246fa

Merge branch 'main' into to_from_safetensors.py

42d8920

ikrommyd reviewed Oct 20, 2025

make sure arrays are packed before serializing to safetensors

960c99c
