support I/O with zarr #1766
base: main
Conversation
Commits (force-pushed from 50c1b4e to 04a30be):
- added kwargs for more options opening zarr
- …s and added flake8 ignore to unpack operator
- pre-commit.ci auto-fixes (for more information, see https://pre-commit.ci)
Thank you for the PR!
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@           Coverage Diff            @@
##             main    #1766      +/-  ##
=========================================
- Coverage   92.26%   91.91%   -0.36%
=========================================
  Files          84       84
  Lines       12447    12501      +54
=========================================
+ Hits        11484    11490       +6
- Misses        963     1011      +48

Flags with carried forward coverage won't be shown.
It looks like the zarr-specific tests have not been run on codebase. The tests are clearly there, but the lines are not covered according to Codecov; I guess that's related to the fact that zarr is an optional dependency and is not installed for the tests on codebase, so the tests are all skipped.
Yes, someone with access to the runner needs to add zarr to the pip installation.
I have done this together with PyTorch 2.6.0 support in #1775.
Thank you for the PR!
@Berkant03 The CI now takes zarr into account. There seems to be a small error somewhere (you can see the error message by clicking on "Details" next to the ci/codebase entry above). You can then access the full logs of the run by clicking on test-amd; test-cuda has been cancelled, as the error seems to introduce a deadlock.
)

# Wait for the file creation to finish
MPI_WORLD.handle.Barrier()
You should be able to also use MPI_WORLD.Barrier(); I guess there is some functionality that makes a Heat communicator fall back directly to the .handle attribute if it doesn't know some method.
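A minimal sketch of how such a fallback could work, assuming Heat's communicator wrapper delegates unknown attributes to the underlying mpi4py communicator via __getattr__ (the actual Heat implementation may differ):

from mpi4py import MPI

class Communication:
    """Simplified stand-in for a Heat communicator wrapping an mpi4py handle."""

    def __init__(self, handle=MPI.COMM_WORLD):
        self.handle = handle  # the underlying mpi4py communicator

    def __getattr__(self, name):
        # Only called when the attribute is not found on the wrapper itself,
        # so unknown methods such as Barrier() are forwarded to mpi4py.
        return getattr(self.handle, name)

comm = Communication()
comm.Barrier()         # forwarded to MPI.COMM_WORLD.Barrier()
comm.handle.Barrier()  # equivalent explicit call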
_, _, slices = MPI_WORLD.chunk(dndarray.gshape, dndarray.split)

zarr_array[slices] = (
    dndarray.larray.numpy()  # Numpy array needed as zarr can only understand numpy dtypes and infers it.
This seems to cause a problem in the tests if the larray is actually on a GPU. See https://pytorch.org/docs/stable/generated/torch.Tensor.numpy.html for the options.
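A possible workaround, sketched under the assumption that the local tensor may live on a GPU (torch.Tensor.numpy() only works on CPU tensors, so the data has to be moved first; the helper name below is hypothetical):

import torch

def to_numpy(local_tensor: torch.Tensor):
    # Detach from any autograd graph and move to CPU before converting;
    # .cpu() is a no-op if the tensor is already on the CPU.
    return local_tensor.detach().cpu().numpy()

# e.g. zarr_array[slices] = to_numpy(dndarray.larray)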
if os.path.exists(path) and not overwrite:
    raise RuntimeError("Given Path already exists.")

# Zarr functions by chunking the data, where a chunk is a file inside the store.
It might be a good idea to add this as
Notes
----------
...
to the docstring of the function.
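For illustration, a hedged sketch of what such a Notes section could look like in the docstring (the function name and exact wording are assumptions, not the PR's actual code):

def save_zarr(dndarray, path, overwrite=False):
    """
    Save a DNDarray to a zarr store.

    Notes
    -----
    Zarr works by chunking the data, where a chunk is a file inside the store.
    Each MPI process writes the slice of the global array it holds locally,
    so the chunk layout follows the process-local shapes.
    """
    ...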
Due Diligence
Description
Added functions to load a zarr array into memory and to save a DNDarray to a zarr store.
Issue/s resolved: #1632
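A hedged usage sketch, assuming the new functions follow Heat's existing load_hdf5/save_hdf5 naming pattern (the actual names and signatures added by this PR may differ):

import heat as ht

# Hypothetical API: create a distributed array, write it to a zarr store, read it back.
x = ht.arange(1000, split=0)
ht.save_zarr(x, "data.zarr", overwrite=True)  # assumed signature
y = ht.load_zarr("data.zarr", split=0)        # assumed signature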
Type of change
Memory requirements
For loading, the memory requirement is as large as the dataset being loaded.
Saving an array can require up to twice the memory of the array.
Performance
Save performance depends on the chunk sizes of the zarr array, which in turn depend on factors such as the number of processes and the shape of the array.
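A minimal sketch of how the per-process chunk shape could relate to these factors, assuming an even split along the distribution axis (an illustration only, not the PR's actual chunking logic):

import math

def local_chunk_shape(gshape, split, nprocs):
    # Each process writes its local slice, so the chunk extent along the split
    # axis shrinks as the number of processes grows.
    shape = list(gshape)
    if split is not None:
        shape[split] = math.ceil(shape[split] / nprocs)
    return tuple(shape)

print(local_chunk_shape((1000, 64), split=0, nprocs=4))  # (250, 64)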
Does this change modify the behaviour of other functions? If so, which?
No.