Open
Labels: Datasets class, documentation, enhancement
Description
WebP Images in HDF5 Dataset Class
Overview:
Currently, the datasets module supports:
- Parquet datasets
- Image folder datasets
Add a new dataset class to handle WebP images stored in HDF5 format, which would provide:
- Efficient storage and retrieval of WebP compressed images
- Better I/O performance for large image collections
- Seamless integration with existing batch inference pipeline
Use Cases
- Processing large collections of WebP images stored in HDF5 archives
- Efficient batch inference on compressed image datasets
- Scientific computing workflows requiring high-performance image I/O
Requirements & Todos
- Create a WebPHDF5Dataset class in the datasets module
- Support for reading WebP images from HDF5 groups/datasets
- Integration with existing DataLoader infrastructure
- Proper error handling for corrupted or missing images
- Memory-efficient streaming for large datasets
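To make the requirements above concrete, here is a minimal sketch of what such a class could look like. It is not a proposed final API. The storage layout (a 1-D variable-length uint8 dataset, default name "images"), the constructor parameters `key`, `decode` and `skip_errors`, and the helper `decode_with_pillow` are all assumptions made for illustration; the actual layout of the HDF5 archives still needs to be agreed on.

```python
import io
from typing import Any, Callable, Optional

import h5py
import numpy as np


def decode_with_pillow(data: bytes) -> np.ndarray:
    """Decode WebP bytes to an RGB uint8 array of shape (H, W, 3)."""
    from PIL import Image  # imported lazily so a custom decoder can avoid Pillow

    with Image.open(io.BytesIO(data)) as img:
        return np.array(img.convert("RGB"))


class WebPHDF5Dataset:
    """Map-style dataset over WebP byte strings stored in one HDF5 dataset.

    Assumed (hypothetical) layout: a 1-D variable-length uint8 dataset named
    `key`, where element i holds the encoded bytes of image i.
    """

    def __init__(
        self,
        path: str,
        key: str = "images",
        decode: Callable[[bytes], Any] = decode_with_pillow,
        skip_errors: bool = False,
    ):
        self.path = path
        self.key = key
        self.decode = decode
        self.skip_errors = skip_errors
        self._file: Optional[h5py.File] = None
        # Read the length once and close again, so no file handle is held
        # until the first item is actually requested.
        with h5py.File(path, "r") as f:
            self._length = len(f[key])

    def __len__(self) -> int:
        return self._length

    def _dataset(self):
        # Open the file lazily on first access and reuse the handle afterwards.
        if self._file is None:
            self._file = h5py.File(self.path, "r")
        return self._file[self.key]

    def __getitem__(self, index: int):
        if not 0 <= index < self._length:
            raise IndexError(index)
        # Only the requested element is read from disk, not the whole dataset.
        raw = np.asarray(self._dataset()[index], dtype=np.uint8).tobytes()
        try:
            # A WebP file is a RIFF container: "RIFF", 4 size bytes, "WEBP".
            if raw[:4] != b"RIFF" or raw[8:12] != b"WEBP":
                raise ValueError("not a WebP byte stream")
            return self.decode(raw)
        except Exception as exc:
            if self.skip_errors:
                return None  # the caller is expected to filter out None items
            raise ValueError(
                f"image {index} in {self.path!r} could not be decoded"
            ) from exc

    def close(self) -> None:
        if self._file is not None:
            self._file.close()
            self._file = None
```

Because the class only implements `__len__` and `__getitem__`, it should plug into any loader that accepts map-style datasets. Whether that matches the existing DataLoader infrastructure in this repo is an assumption that needs checking. With `skip_errors=True`, corrupted or empty entries come back as `None`, so the batching code would have to drop them.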
Dependencies:
- h5py for HDF5 file operations
- pillow or opencv-python for WebP decoding
- Existing dataset infrastructure
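Since WebP decoding could come from either pillow or opencv-python, one option is to detect what is installed and let the caller pick. The sketch below is only an illustration of that idea: the function names `available_webp_backends` and `decode_webp` are hypothetical, and it assumes the installed Pillow or OpenCV build was compiled with WebP support.

```python
import importlib.util
import io


def available_webp_backends() -> list:
    """List the installed decoding packages, Pillow first."""
    candidates = [("pillow", "PIL"), ("opencv-python", "cv2")]
    return [
        package
        for package, module in candidates
        if importlib.util.find_spec(module) is not None
    ]


def decode_webp(data: bytes, backend: str):
    """Decode WebP bytes to an RGB uint8 array with the named backend."""
    if backend == "pillow":
        import numpy as np
        from PIL import Image

        with Image.open(io.BytesIO(data)) as img:
            return np.array(img.convert("RGB"))
    if backend == "opencv-python":
        import cv2
        import numpy as np

        buffer = np.frombuffer(data, dtype=np.uint8)
        bgr = cv2.imdecode(buffer, cv2.IMREAD_COLOR)
        if bgr is None:  # imdecode returns None when decoding fails
            raise ValueError("OpenCV could not decode the image bytes")
        # OpenCV yields BGR; convert so both backends return RGB.
        return cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    raise ValueError(f"unknown backend: {backend!r}")
```

A caller could then use something like `decode_webp(raw, available_webp_backends()[0])` after checking that the list is not empty. Keeping both backends optional means only h5py would be a hard dependency.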
Documentation:
- API documentation
- Usage examples
- Performance comparison guide