Hi, great work!
I tried running the training code on 4090 GPUs, which have 24 GB of memory each. Even with the batch size set to 1, training exceeded the available memory. Can you confirm whether this is expected? Also, how much memory does the A100 GPU you used have?
Thank you very much!