NMT models speedup abnormally related to batch size #106

Open
dearchill opened this issue Oct 29, 2021 · 2 comments
@dearchill

Hi, thanks for the great work. I just tested fairseq-generate on my test set (ZH-EN translation) with both FastSeq and Fairseq, and the speedup is quite abnormal compared with the example link.
My test set has 1,526 sentences with 5-150 Chinese characters each, and my experiments run on an NVIDIA Tesla T4. The translation model is the base transformer architecture in fairseq, with 30 encoder layers.
I tested with the following commands:
for fairseq: fairseq-generate ../data-bin --path model_avg.pt --remove-bpe --batch-size 128
for fastseq: fastseq-generate-for-fairseq ../data-bin --path model_avg.pt --remove-bpe --batch-size 128 --postprocess-workers 5
I didn't use --no-repeat-ngram-size in fastseq; the beam size is the default 5 and lenpen is 1.
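Roughly, the sweep over batch sizes looked like the following (this is only a sketch; the data path and checkpoint name are from my setup, and the throughput numbers below are taken from the sentences/s reported in the generate logs):

```bash
#!/bin/bash
# Sweep the same batch sizes as in the table below, for both baselines.
DATA=../data-bin       # path to the binarized test set (placeholder)
MODEL=model_avg.pt     # averaged checkpoint (placeholder)

for bs in 128 10 5 1; do
  echo "== fairseq, batch size $bs =="
  fairseq-generate $DATA --path $MODEL --remove-bpe --batch-size $bs

  echo "== fastseq, batch size $bs =="
  fastseq-generate-for-fairseq $DATA --path $MODEL --remove-bpe \
    --batch-size $bs --postprocess-workers 5
done
```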
My test results are as follows:

| Batch size | not assigned | 128 | 10 | 5 | 1 |
| --- | --- | --- | --- | --- | --- |
| fairseq-0.10.2 (sentences/s) | 65.79 | 63.18 | 19.06 | 11.79 | 3.06 |
| above + fastseq (sentences/s) | 75.55 | 74.28 | 17.38 | 11.47 | 2.92 |

I found that when the batch size is large (128 and above), fastseq shows an obvious speedup, though not as much as the 2x or more in the example. But when the batch size is small (which I tested because my deployment scenario requires it), fastseq shows no speedup at all and is even slightly slower. This phenomenon seems quite abnormal to me, and I'd appreciate your help. Looking forward to your reply.

@yuyan2do
Member

Hi dearchill, thanks for your question. If you are using the latest version, adding --required-seq-len-multiple 8 will make both the baseline and the treatment faster. When the batch size and input length are small, the gain is usually smaller. If you can provide a runnable case, we could look into it further.
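For example, the flag can be appended to the two commands you posted (a sketch only; all other options kept as in your original commands):

```bash
# Baseline, with sequence length padded to a multiple of 8
fairseq-generate ../data-bin --path model_avg.pt --remove-bpe --batch-size 128 \
  --required-seq-len-multiple 8

# FastSeq, same flag added
fastseq-generate-for-fairseq ../data-bin --path model_avg.pt --remove-bpe --batch-size 128 \
  --postprocess-workers 5 --required-seq-len-multiple 8
```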

@dearchill
Author


Hi, I added the --required-seq-len-multiple 8 arg and saw no gains; it's quite weird. I'll keep testing with some other models and test sets to see the effect, and I'll post here if I have any findings.
