as_strided() error #41
After some debugging, this happens when a batch isn't full: tmp.size(0) != res.size(0), which is what triggers the error in as_strided. This could maybe be fixed by padding each batch out to the full batch size?
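In case it helps anyone, here is a minimal sketch of what I mean by padding, assuming the problematic tensor has its batch dimension first. The function name, shapes, and batch size below are hypothetical and not taken from the LSR code, and the padded rows would also need to be masked out of the loss.

```python
import torch

def pad_batch(tensor: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Pad the first (batch) dimension with zeros so every batch has `batch_size` rows."""
    short = batch_size - tensor.size(0)
    if short <= 0:
        return tensor  # batch is already full
    pad_shape = (short, *tensor.shape[1:])
    padding = torch.zeros(pad_shape, dtype=tensor.dtype, device=tensor.device)
    return torch.cat([tensor, padding], dim=0)

# e.g. a final batch of 7 examples padded up to a batch size of 12
x = torch.randn(7, 128)
print(pad_batch(x, 12).shape)  # torch.Size([12, 128])
```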
Have you ever solved this problem? I hit it when running LSR (BERT version) with BATCH_SIZE set anywhere from 1 to 12, and I do not know how to address it.
Nope, I just moved on to a different relation extraction model. If you are curious, ATLOP currently has state-of-the-art results on DocRED (63% F1). It is a much simpler model as well, and the authors provide the trained model checkpoints from their paper.
You are right. I've been running this model (non-BERT version) with batch_size=12 for 33 hours and it still has not finished (currently at epoch 140). ATLOP is much faster and gives a nice result: I ran ATLOP-bert-base-cased and got re_f1: 61.31%.
Hi @ThinkNaive, thanks for your attention. For the BERT-based model, we empirically use a large batch size (> 16) for better convergence.
@nanguoshun Thank you for the advice. Maybe I should use a GPU with more memory for a large batch size; I am currently working with 12GB.
Yea, 12GB may not be enough. The curse of deep learning 😆
I'm encountering this error even on a machine with four 16GB GPUs. When it happened I checked GPU usage and it was very low, so the machine can't be running out of GPU memory. I even reduced the batch size to 8 and the hidden dim to 64, but that didn't fix it. Would it be possible for someone to look into this? Thank you.
Have you changed its parameters? I ran ATLOP-bert-base-cased and only got re_f1: 59%.
@IKeepMoving I used the weights they provide on their GitHub page (see the releases pane on the right side). No need to re-train unless you really want to :)
I've noticed a few people encountering issues like this:
It has been stated that changing the seed or batch size can help fix this, but the memory requirement for training is quite insane lol. I can only try with a batch size of 1 or 2, and every seed I've tried hits a similar error.
I'm pretty unfamiliar with what exactly is causing this error, but if you can suggest a fix for the code I can try it out! Otherwise, I cannot train.
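One thing that might be worth trying, assuming the training data goes through a standard torch.utils.data.DataLoader (I haven't checked whether LSR does, so this is only a sketch), is to drop the incomplete final batch so tmp.size(0) and res.size(0) always match. The dataset below is a made-up placeholder; the point is only the drop_last flag.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset standing in for the real one; drop_last=True makes the
# loader skip the final, smaller-than-batch_size batch instead of yielding it.
dataset = TensorDataset(torch.randn(100, 16))
loader = DataLoader(dataset, batch_size=12, shuffle=True, drop_last=True)

for (batch,) in loader:
    assert batch.size(0) == 12  # never a partial batch
```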