
Fix corrupt image handling in custom dataset#126

Open
sushantkhemalapure wants to merge 7 commits into humanai-foundation:main from sushantkhemalapure:fix/dataset-corrupt-image-handling

@sushantkhemalapure

Issue

In custom_dataset.py, when an image cannot be read (for example, because the file is corrupt or missing), the dataset returns an unexpected type (such as a string or tuple) instead of the expected sample dictionary. Because the DataLoader expects every sample in a batch to have the same structure, this can make it fail during training or evaluation.

Fix

Updated the dataset logic so that it only ever returns a valid sample dictionary. Unreadable images are either skipped or rejected with a clear error message.
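The PR description does not include the diff, so the following is only a sketch of the two strategies (skip vs. error) under stated assumptions. The names `CustomDataset`, `UnreadableImageError`, `skip_invalid_collate` and the injected `load_image` callable are hypothetical. The class follows the map-style `__len__`/`__getitem__` protocol used by `torch.utils.data.Dataset` but needs only the standard library. It treats both a raised `OSError`/`ValueError` and a `None` return as "unreadable", because loaders differ: OpenCV's `cv2.imread`, for instance, returns `None` instead of raising.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence


class UnreadableImageError(RuntimeError):
    """Hypothetical error type for images that cannot be found or decoded."""


class CustomDataset:
    """Sketch of a map-style dataset that never returns a malformed sample."""

    def __init__(
        self,
        paths: Sequence[str],
        labels: Sequence[Any],
        load_image: Callable[[str], Optional[Any]],
        on_error: str = "raise",
    ) -> None:
        if on_error not in ("raise", "skip"):
            raise ValueError("on_error must be 'raise' or 'skip'")
        if len(paths) != len(labels):
            raise ValueError("paths and labels must have the same length")
        self.paths = list(paths)
        self.labels = list(labels)
        # Hypothetical loader: returns an image, returns None, or raises.
        self.load_image = load_image
        self.on_error = on_error

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int) -> Optional[Dict[str, Any]]:
        path = self.paths[idx]
        cause: Optional[BaseException] = None
        try:
            image = self.load_image(path)
        except (OSError, ValueError) as exc:  # missing file or decode failure
            image, cause = None, exc
        if image is None:
            if self.on_error == "raise":
                raise UnreadableImageError(f"could not read image: {path}") from cause
            return None  # sentinel, removed later by skip_invalid_collate
        # Only a well-formed sample dict ever leaves this method.
        return {"image": image, "label": self.labels[idx], "path": path}


def skip_invalid_collate(
    batch: List[Optional[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Drop skipped (None) samples before they are batched."""
    return [sample for sample in batch if sample is not None]
```

With PyTorch, the skip strategy would pass a function like this as `DataLoader(..., collate_fn=...)` and hand the filtered list on to `torch.utils.data.default_collate`. A batch whose samples were all skipped then comes back empty and needs handling. The raise strategy surfaces bad files immediately but stops the run at the first one.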

Impact

  • Prevents DataLoader crashes caused by inconsistent return types
  • Improves robustness when encountering corrupt or missing images
  • Makes training and evaluation more stable

Testing

  • Tested with valid images → dataset works as expected
  • Simulated corrupt/missing images → handled without crashing

- Updated the TrOCR model path from ../../weights to ../../models so that the TrOCR model is loaded instead of the CRAFT weights
- Corrected unpacking of image.shape[:2] to (height, width) to prevent distortion in rotated images
- Added sys.path handling to locate the CRAFT module so that qapp.py runs consistently outside Docker
- Prevented returning invalid sample types by raising an error or safely skipping unreadable images
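The rotation code itself is not shown in the PR, so here is a minimal sketch of why the `image.shape[:2]` order matters, assuming a NumPy/OpenCV-style array indexed as (rows, cols[, channels]). The function name `rotation_geometry` is hypothetical. It returns the rotation center as (x, y), the order `cv2.getRotationMatrix2D` expects, and the output size as (width, height), the order `cv2.warpAffine` takes for `dsize`. Swapping height and width on a non-square image puts the center in the wrong place and the output canvas in the wrong shape, which shows up as a distorted or cropped result.

```python
import math
from typing import Sequence, Tuple


def rotation_geometry(
    shape: Sequence[int], angle_deg: float
) -> Tuple[Tuple[float, float], Tuple[int, int]]:
    """Return ((cx, cy), (new_width, new_height)) for rotating an image."""
    # Arrays are indexed (rows, cols), so shape[:2] is (height, width).
    # The buggy version would be: width, height = shape[:2]
    height, width = shape[:2]
    # Rotation centers are given in (x, y) order.
    center = (width / 2.0, height / 2.0)
    theta = math.radians(angle_deg)
    cos_a, sin_a = abs(math.cos(theta)), abs(math.sin(theta))
    # Bounding box of the rotated rectangle, so no content is cropped;
    # round() absorbs floating-point noise such as cos(90 deg) ~ 6e-17.
    new_width = round(height * sin_a + width * cos_a)
    new_height = round(height * cos_a + width * sin_a)
    return center, (new_width, new_height)
```

For a 100×200 (height × width) image, a 0° rotation keeps the canvas at (200, 100) in (width, height) order, and a 90° rotation swaps it to (100, 200).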
@sushantkhemalapure
Author

Handled invalid image cases to avoid breaking the DataLoader. Let me know if you'd prefer a different strategy (skipping vs. raising an error).

Labels

None yet

Projects

None yet

1 participant