Refactor/data.py #15

Open
agentksimha wants to merge 2 commits into humanai-foundation:main from agentksimha:refactor/data.py
@agentksimha
Summary

Three targeted improvements to ArtExtract_Soyoung/utils/data.py:

1. File path validation

  • Validates all directories (images, masks, sub-dirs) at init time with clear, actionable error messages
  • Detects unpaired images (no matching masks) before training starts
  • Runtime checks in __getitem__ catch deleted files or broken symlinks mid-run
  • Replaces brittle string concatenation with os.path.join
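The checks listed above could look roughly like the sketch below. It is not the code in this PR: the helper names `validate_dataset_dirs` and `checked_path` are hypothetical, and it assumes a mask shares its image's filename, which may not match the actual layout in ArtExtract_Soyoung.

```python
import os


def validate_dataset_dirs(image_dir, mask_dir):
    """Init-time check: both directories exist and every image has a mask.

    Assumption for illustration: a mask has the same filename as its image.
    """
    for label, path in (("image", image_dir), ("mask", mask_dir)):
        if not os.path.isdir(path):
            raise FileNotFoundError(f"{label} directory not found: {path!r}")

    images = sorted(os.listdir(image_dir))
    masks = set(os.listdir(mask_dir))

    # Report unpaired images up front instead of failing mid-training.
    unpaired = [name for name in images if name not in masks]
    if unpaired:
        raise ValueError(
            f"{len(unpaired)} image(s) have no matching mask, e.g. {unpaired[:3]}"
        )
    return images


def checked_path(directory, name):
    """Runtime check for __getitem__: build the path and confirm it still exists."""
    path = os.path.join(directory, name)  # portable, unlike directory + "/" + name
    # os.path.isfile follows symlinks, so a broken symlink is also rejected here.
    if not os.path.isfile(path):
        raise FileNotFoundError(f"file missing or broken symlink: {path!r}")
    return path
```

A dataset class would call `validate_dataset_dirs` once in `__init__` and `checked_path` on each access, so a misconfigured path fails immediately with a message naming the offending directory or file.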

2. Fix double-normalisation on masks

  • ToTensor() already scales uint8 PIL images to [0, 1]; the previous extra / 255.0 squeezed mask values into [0, 1/255] (about 0.004), effectively zeroing all masks
  • Removed the redundant division, so masks are now normalised exactly once
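The arithmetic behind this fix can be checked with a small sketch. It uses NumPy as a stand-in for the scaling that ToTensor() applies to uint8 input, so `to_unit_range` is a hypothetical helper rather than the torchvision call, and the sample mask and the 0.5 threshold are made up for illustration.

```python
import numpy as np


def to_unit_range(mask_u8):
    """Stand-in for ToTensor() scaling on uint8 input: divide by 255 once."""
    return mask_u8.astype(np.float32) / 255.0


# Hypothetical mask row with background (0), mid-grey (128) and foreground (255).
mask = np.array([[0, 128, 255]], dtype=np.uint8)

once = to_unit_range(mask)   # what ToTensor() already produces
twice = once / 255.0         # the redundant second division

print(once.max())            # 1.0
print(twice.max())           # roughly 0.0039, i.e. 1/255
print((twice > 0.5).any())   # False: no pixel survives a 0.5 threshold
```

After the second division the brightest foreground pixel sits at 1/255, so any downstream loss or threshold that expects values near 1 sees an almost empty mask.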

3. Reproducible DataLoader with num_workers

  • num_workers = min(4, os.cpu_count()) overlaps I/O with GPU compute
  • persistent_workers=True avoids re-spawning workers each epoch
  • Seeded torch.Generator makes shuffle order deterministic across runs
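A minimal sketch of a loader configured this way is shown below. The `make_loader` helper and its default arguments are hypothetical, not the code in this PR. Two guards are additions of this sketch: `os.cpu_count()` can return None, so it falls back to 1, and `persistent_workers=True` is rejected by DataLoader when `num_workers` is 0, so it is only enabled when workers exist.

```python
import os

import torch
from torch.utils.data import DataLoader, TensorDataset


def make_loader(dataset, batch_size=4, seed=42, max_workers=4):
    """Build a shuffling DataLoader whose sample order is repeatable across runs."""
    # os.cpu_count() may return None on some platforms; fall back to 1.
    num_workers = min(max_workers, os.cpu_count() or 1)

    # A dedicated, seeded generator drives the shuffle order.
    generator = torch.Generator()
    generator.manual_seed(seed)

    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        # DataLoader raises ValueError if this is True while num_workers == 0.
        persistent_workers=num_workers > 0,
        generator=generator,
    )


if __name__ == "__main__":
    dataset = TensorDataset(torch.arange(10))
    # max_workers=0 keeps this demo in the main process.
    first = [batch[0].tolist() for batch in make_loader(dataset, max_workers=0)]
    second = [batch[0].tolist() for batch in make_loader(dataset, max_workers=0)]
    print(first == second)  # True: same seed, same shuffle order
```

The seeded generator fixes only the shuffle order. Random augmentations inside worker processes would need their own seeding, for example through a `worker_init_fn`, which this sketch does not cover.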

No changes to model architecture, training loop, or public API.


2 participants