Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add file handle API #375

Open
wants to merge 11 commits into
base: main
Choose a base branch
from
Open

Conversation

ericpre
Copy link
Member

@ericpre ericpre commented Mar 7, 2025

Closes #331.

Progress of the PR

  • Add file_handle API,
  • implement file_handle API for the following format,
    • hspy
    • emd
    • nexus
    • usid
    • tiff
  • update docstring (if appropriate),
  • update user guide (if appropriate),
  • add a changelog entry in the upcoming_changes folder (see upcoming_changes/README.rst),
  • Check formatting of the changelog entry (and eventual user guide changes) in the docs/readthedocs.org:rosettasciio build of this PR (link in github checks)
  • add tests,
  • ready for review.

Copy link

codecov bot commented Mar 7, 2025

Codecov Report

Attention: Patch coverage is 93.75000% with 1 line in your changes missing coverage. Please review.

Project coverage is 87.78%. Comparing base (f94acf7) to head (decb0be).
Report is 49 commits behind head on main.

Files with missing lines Patch % Lines
rsciio/_hierarchical.py 66.66% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #375      +/-   ##
==========================================
- Coverage   87.84%   87.78%   -0.06%     
==========================================
  Files          85       85              
  Lines       11246    11257      +11     
  Branches     2084     2087       +3     
==========================================
+ Hits         9879     9882       +3     
- Misses        868      871       +3     
- Partials      499      504       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ericpre ericpre force-pushed the add_file_handle_API branch 2 times, most recently from 68ca85b to 18a0553 Compare March 7, 2025 17:26
@ericpre
Copy link
Member Author

ericpre commented Mar 14, 2025

@CSSFrancis, what do you think about this API?

@CSSFrancis
Copy link
Member

@ericpre I really like this change and think it was a long time coming. That being said I think we need to think a little bit about more how we are handling files and what the relationship between hyperspy --> dask --> underlying file. More and more it's seeming like dask not being "file aware" is a bit of a problem.

More than likely what we need is a abstract class which is something like https://docs.xarray.dev/en/latest/generated/xarray.backends.CachingFileManager.html which would allow File Objects to be passed from one process to the next without pickling. This is how xarray handles things like distributed hdf5 processing which would be wonderful :).

To maybe take it a step further, I would like to build a bit more context support for things like rechunking. I think this would involve something like the optimized slicing that b2h5py does:

https://github.com/Blosc/b2h5py/blob/317f4a0c31ac61623bf49861f3f5c6ebce46f633/b2h5py/blosc2.py#L248

Mostly it would just be optimized slicing for things like virtual images, plotting etc.

@ericpre
Copy link
Member Author

ericpre commented Mar 16, 2025

@CSSFrancis, thank you, this sounds good.

Regarding this PR, would it be better to put it on hold until a file manager is implemented to return the file manager object instead of the file handle? Even if they may have similar API, there are not the same and therefore can't be swapped without breaking the API?

@ericpre ericpre removed this from the v0.8 milestone Mar 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add API for file handle
3 participants