Hello, I am trying to re-train the model on a GPU and have run into some issues with batching and sampling.

In the training loop, each batch has an effective batch size of 1, so every iteration processes a single sample. Is this expected behaviour, or am I doing something wrong? Is it connected to the custom bucket sampler used in the code? I tried swapping it for a standard PyTorch sampler, but with any batch size larger than 1 I consistently hit out-of-memory errors, even with 48 GB of memory on an A40 GPU. Profiling showed hundreds of GB being moved back and forth over just 10 dev-run iterations on GigaSpeech samples.

So is this a dataset issue, a Lightning issue, or a sampler issue?
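For context, the kind of duration-capped bucketing I suspect is at play can be sketched in plain Python (this is an illustrative mock, not the repository's actual sampler): if batches are capped by total audio duration, a single long utterance fills a batch on its own, which would explain the effective batch size of 1.

```python
def bucket_batches(durations, max_batch_seconds):
    """Group sample indices into batches whose total duration stays
    under max_batch_seconds (illustrative sketch, hypothetical names)."""
    batches, current, current_total = [], [], 0.0
    for idx, dur in enumerate(durations):
        # Start a new batch if adding this sample would exceed the cap;
        # a sample longer than the cap still gets a batch of its own.
        if current and current_total + dur > max_batch_seconds:
            batches.append(current)
            current, current_total = [], 0.0
        current.append(idx)
        current_total += dur
    if current:
        batches.append(current)
    return batches

# Short utterances pack together; a long one ends up alone.
print(bucket_batches([5.0, 6.0, 29.0, 4.0], max_batch_seconds=12.0))
# → [[0, 1], [2], [3]]
```

If the sampler works roughly like this, long GigaSpeech clips would routinely land in single-sample batches, and replacing it with a plain fixed-size sampler (which pads every sample in a batch to the longest one) could explain the memory blow-up I saw instead.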