Question about batching #6

@adhaesitadimo

Description

Hello, I'm trying to re-train the model on a GPU and have run into some issues with batching and sampling.

In the training loop, each sample has an effective batch size of 1, so every iteration processes a single sample. Is this expected behaviour, or am I doing something wrong? Is it connected to the custom bucket sampling used in the code? I tried swapping it for the standard PyTorch sampler, but as soon as I set a batch size larger than 1, I got constant memory errors. I have 48 GB of memory on an A40 GPU. Profiling showed hundreds of GB being swapped back and forth over 10 dev-run iterations on GigaSpeech samples.
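For context on what I mean by bucket sampling: I understand it to work roughly like the sketch below (the repo's actual sampler isn't reproduced here; the function name and the token-budget heuristic are my assumptions). If batches are capped by a padded-token budget, long GigaSpeech utterances would naturally degenerate to one sample per batch, which might explain what I'm seeing:

```python
def bucket_batches(lengths, max_tokens):
    """Hypothetical sketch: group sample indices (sorted by length) into
    batches whose padded cost (batch size * longest sample in the batch)
    stays under a token budget."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches, current = [], []
    for i in order:
        candidate = current + [i]
        # Padded cost: every sample is padded to the longest in the batch.
        if len(candidate) * lengths[i] > max_tokens and current:
            batches.append(current)
            candidate = [i]
        current = candidate
    if current:
        batches.append(current)
    return batches

# Short samples get grouped; long samples end up alone in their batch.
print(bucket_batches([10, 12, 100, 110], max_tokens=30))
```

Under this assumption, an effective batch size of 1 would not be a bug but a consequence of the budget being small relative to the sample lengths.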

So, is this a dataset issue, a Lightning issue, or a sampler issue?
