Add WebP Image Dataset Support for HDF5 Storage Format #8

@NetZissou

WebP Images in HDF5 Dataset Class

Overview:

Currently, the datasets module supports:

  • Parquet datasets
  • Image folder datasets

Add a new dataset class to handle WebP images stored in HDF5 format, which would provide:

  • Efficient storage and retrieval of WebP compressed images
  • Better I/O performance for large image collections
  • Seamless integration with existing batch inference pipeline

Use Cases

  • Processing large collections of WebP images stored in HDF5 archives
  • Efficient batch inference on compressed image datasets
  • Scientific computing workflows requiring high-performance image I/O

Requirements & Todos

  • Create WebPHDF5Dataset class in the datasets module
  • Support for reading WebP images from HDF5 groups/datasets
  • Integration with existing DataLoader infrastructure
  • Proper error handling for corrupted or missing images
  • Memory-efficient streaming for large datasets
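
To make the requirements above concrete, here is a minimal sketch of what a WebPHDF5Dataset could look like. It is not the project's implementation. It assumes a hypothetical layout in which each member of an HDF5 group (named "images" here) is a 1-D uint8 dataset holding the raw bytes of one WebP file, and it uses Pillow for decoding. The class only implements `__len__` and `__getitem__`, the map-style protocol that a PyTorch DataLoader accepts, so it avoids a hard dependency on torch. The `group` and `transform` parameters are illustrative names.

```python
import io
from typing import Callable, Optional

import h5py
from PIL import Image


class WebPHDF5Dataset:
    """Map-style dataset over WebP-encoded images stored in an HDF5 group.

    Assumed (hypothetical) layout: every member of `group` is a 1-D uint8
    dataset containing the raw bytes of one WebP file.
    """

    def __init__(self, path: str, group: str = "images",
                 transform: Optional[Callable] = None):
        self.path = path
        self.group = group
        self.transform = transform
        # The file is reopened lazily on first access so that each worker
        # process can end up with its own handle.
        self._file = None
        # Read only the key names up front; image bytes stay on disk.
        with h5py.File(path, "r") as f:
            self.keys = sorted(f[group].keys())

    def __len__(self) -> int:
        return len(self.keys)

    def _open_group(self):
        if self._file is None:
            self._file = h5py.File(self.path, "r")
        return self._file[self.group]

    def __getitem__(self, index: int):
        key = self.keys[index]
        try:
            raw = self._open_group()[key][()]  # NumPy uint8 array
            # convert() forces a full decode, so truncated data fails here.
            image = Image.open(io.BytesIO(raw.tobytes())).convert("RGB")
        except (KeyError, OSError, ValueError) as exc:
            # Pillow's UnidentifiedImageError is a subclass of OSError.
            raise RuntimeError(
                f"Could not load image {key!r} from {self.path}"
            ) from exc
        if self.transform is not None:
            image = self.transform(image)
        return key, image

    def close(self) -> None:
        if self._file is not None:
            self._file.close()
            self._file = None
```

Raising a RuntimeError that names the failing key is only one option. A batch inference pipeline might instead prefer to log and skip bad entries, which is a design decision this sketch leaves open.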

Dependencies:

  • h5py for HDF5 file operations
  • pillow or opencv-python for WebP decoding
  • Existing dataset infrastructure
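
The issue does not say how the WebP bytes are laid out inside the HDF5 file. As a sketch of how h5py and Pillow could fit together, the helpers below encode images to WebP, store each one as a 1-D uint8 dataset, and decode them again. The function names (`encode_webp`, `decode_webp`, `write_archive`), the "images" group name, and the default quality are all assumptions made for illustration.

```python
import io

import h5py
import numpy as np
from PIL import Image


def encode_webp(image: Image.Image, quality: int = 80) -> np.ndarray:
    """Encode a PIL image as WebP and return the bytes as a 1-D uint8 array."""
    buffer = io.BytesIO()
    image.save(buffer, format="WEBP", quality=quality)
    return np.frombuffer(buffer.getvalue(), dtype=np.uint8)


def decode_webp(encoded: np.ndarray) -> Image.Image:
    """Decode a 1-D uint8 array of WebP bytes into an RGB PIL image."""
    return Image.open(io.BytesIO(encoded.tobytes())).convert("RGB")


def write_archive(path: str, images: dict, group: str = "images") -> None:
    """Store each image under `group/<name>` as WebP bytes (assumed layout)."""
    with h5py.File(path, "w") as f:
        g = f.create_group(group)
        for name, image in images.items():
            g.create_dataset(name, data=encode_webp(image))
```

With opencv-python instead of Pillow, the decode step could use `cv2.imdecode` on the same uint8 array. Note that it returns channels in BGR order, so downstream code would need to account for that.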

Documentation:

  • API documentation
  • Usage examples
  • Performance comparison guide

Labels

  • Datasets class (Dataset class)
  • documentation (Improvements or additions to documentation)
  • enhancement (New feature or request)
