Description
What would you like to see added to HDMF?
Right now, for the `GenericDataChunkIterator`, it's possible to define either `chunk_mb` or `chunk_shape`. I would like to enable a hybrid approach, where a user could input `chunk_mb=10.0, chunk_shape=(None, 64)`, and the `GenericDataChunkIterator` would identify the remaining dimension that gets the chunk close to the target size.
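For illustration, usage under the proposed behavior might look like the sketch below. The `NumpyDataChunkIterator` subclass is just a minimal example wrapper, and passing `chunk_mb` together with a partially specified `chunk_shape` is the proposed feature, not something HDMF accepts today.

```python
import numpy as np
from hdmf.data_utils import GenericDataChunkIterator

class NumpyDataChunkIterator(GenericDataChunkIterator):
    """Minimal concrete iterator over an in-memory array, for illustration only."""

    def __init__(self, array, **kwargs):
        self.array = array
        super().__init__(**kwargs)

    def _get_data(self, selection):
        return self.array[selection]

    def _get_maxshape(self):
        return self.array.shape

    def _get_dtype(self):
        return self.array.dtype

data = np.random.rand(1_000_000, 384)  # e.g., (frames, channels)

# Proposed hybrid spec: fix the channel axis at 64 and let the iterator choose
# the frame axis so each chunk lands near 10 MB.
iterator = NumpyDataChunkIterator(array=data, chunk_mb=10.0, chunk_shape=(None, 64))
```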
Is your feature request related to a problem?
It is pretty common for users to have some insight into the likely read patterns of a dataset.
What solution would you like?
I would like `GenericDataChunkIterator` to find the maximum chunk size (product of dimensions) that is <= the target size. I would also like the chunk to be as cube-like as possible, so the sum of the chunk's dimensions should be minimized. Previously, we tried building chunks that were scaled-down versions of the data shape, similar to h5py, but experience with Jeremy has shown that this approach is poorly suited for common data-reading routines, and I think a better naive assumption would be that (hyper-)cube chunks are a good default.
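This is not HDMF's actual implementation, just a rough sketch of the behavior I have in mind: fill in the `None` entries so the free axes share a common, cube-like edge length while the total chunk stays at or below the byte target. The function name `resolve_chunk_shape` and all of its details are hypothetical.

```python
import math
import numpy as np

def resolve_chunk_shape(maxshape, dtype, chunk_mb=10.0, chunk_shape=()):
    """Fill in the ``None`` entries of ``chunk_shape`` so the chunk is roughly
    cube-like over the free axes and its total size stays <= chunk_mb."""
    target_bytes = chunk_mb * 1e6
    itemsize = np.dtype(dtype).itemsize

    free = [axis for axis, size in enumerate(chunk_shape) if size is None]
    if not free:
        return tuple(chunk_shape)  # fully specified already

    # Element budget left after accounting for the user-fixed axes.
    fixed_elements = math.prod(size for size in chunk_shape if size is not None)
    remaining_elements = target_bytes / (itemsize * fixed_elements)

    # Cube-like default: give every free axis the same edge length.
    edge = int(remaining_elements ** (1 / len(free)))

    resolved = list(chunk_shape)
    for axis in free:
        # Never exceed the full extent of the dataset along that axis.
        resolved[axis] = max(1, min(edge, maxshape[axis]))
    return tuple(resolved)

# Example: float64 data with shape (1_000_000, 384), channel axis fixed at 64.
print(resolve_chunk_shape((1_000_000, 384), "float64", chunk_mb=10.0, chunk_shape=(None, 64)))
# -> (19531, 64): 19531 * 64 * 8 bytes ~= 10.0 MB
```

A fuller version would redistribute any leftover budget when a free axis gets clamped by `maxshape`, but the core idea is the same.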
Do you have any interest in helping implement the feature?
Yes.
Code of Conduct
- I agree to follow this project's Code of Conduct
- Have you checked the Contributing document?
- Have you ensured this change was not already requested?