[Feature]: Define partial chunk shape for GenericDataChunkIterator #995

@bendichter

Description

What would you like to see added to HDMF?

Right now, for the GenericDataChunkIterator, it's possible to define chunk_mb or chunk_shape. I would like to enable a hybrid approach, where a user could input chunk_mb=10.0, chunk_shape=(None, 64), and the GenericDataChunkIterator would fill in the remaining dimension so that the chunk gets as close as possible to the target chunk size.
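A hypothetical sketch of what the proposed call could look like. ArrayChunkIterator is a toy subclass for illustration only, and passing chunk_mb together with chunk_shape (with None marking the free axis) is the requested behavior, not the current API:

```python
import numpy as np
from hdmf.data_utils import GenericDataChunkIterator

class ArrayChunkIterator(GenericDataChunkIterator):
    """Toy subclass wrapping an in-memory array, for illustration only."""

    def __init__(self, array, **kwargs):
        self._array = array
        super().__init__(**kwargs)

    def _get_data(self, selection):
        return self._array[selection]

    def _get_maxshape(self):
        return self._array.shape

    def _get_dtype(self):
        return self._array.dtype

data = np.zeros((1_000_000, 384), dtype="int16")

# Proposed hybrid call: fix the channel axis at 64 elements and let HDMF
# choose the frame axis so each chunk lands near the 10 MB target.
iterator = ArrayChunkIterator(array=data, chunk_mb=10.0, chunk_shape=(None, 64))
```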

Is your feature request related to a problem?

It is pretty common for users to have some insight into the likely read patterns of a dataset.

What solution would you like?

I would like GenericDataChunkIterator to find the maximum chunk size (product of the dimensions) that is <= the target size. I also would like the chunk to be as cube-like as possible, so I would like to minimize the sum of the chunk dimensions. Previously, we tried building chunks that were scaled-down versions of the data shape, similar to what h5py does, but experience with Jeremy has shown that this approach is poorly suited for common data-reading routines. I think a better naive assumption is that (hyper)cube chunks are a good default.
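A minimal sketch of how the free axes could be filled, assuming the target is converted to an element budget (chunk_mb * 1e6 / itemsize) and that splitting that budget evenly across the unspecified axes is an acceptable approximation of the cube-like objective; fill_chunk_shape is a hypothetical helper, not part of the current API:

```python
import math
from typing import Optional, Sequence, Tuple

def fill_chunk_shape(
    maxshape: Sequence[int],
    partial_shape: Sequence[Optional[int]],
    itemsize: int,
    chunk_mb: float = 10.0,
) -> Tuple[int, ...]:
    """Fill the None axes of partial_shape so the chunk stays near the
    chunk_mb target while being as cube-like as possible along the free axes."""
    target_elements = int(chunk_mb * 1e6 / itemsize)  # assumes 1 MB == 1e6 bytes
    fixed_product = math.prod(s for s in partial_shape if s is not None)
    free_axes = [i for i, s in enumerate(partial_shape) if s is None]
    if not free_axes:
        return tuple(partial_shape)

    # Split the remaining element budget evenly across the free axes so the
    # chunk is close to a hypercube along those axes.
    budget = max(target_elements // fixed_product, 1)
    edge = max(int(budget ** (1.0 / len(free_axes))), 1)

    return tuple(
        min(edge, maxshape[axis]) if size is None else size
        for axis, size in enumerate(partial_shape)
    )

# Example: an int16 (itemsize 2) array of shape (1_000_000, 384) with
# chunk_mb=10.0 and chunk_shape=(None, 64) yields a (78_125, 64) chunk,
# i.e. 78_125 * 64 * 2 bytes = 10 MB.
print(fill_chunk_shape((1_000_000, 384), (None, 64), itemsize=2))
```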

Do you have any interest in helping implement the feature?

Yes.

Labels: category: enhancement, priority: medium