Hi, thanks for the great work. I just tested fairseq-generate on my test set (ZH-EN translation) using both FastSeq and Fairseq, and the speedup looks quite abnormal compared with the example in the link.
My test set has 1526 sentences of 5~150 Chinese characters each, and the experiments run on an NVIDIA Tesla T4. The translation model is the base transformer arch in fairseq, with 30 encoder layers.
I tested with the following commands:
for fairseq: fairseq-generate ../data-bin --path model_avg.pt --remove-bpe --batch-size 128
for fastseq: fastseq-generate-for-fairseq ../data-bin --path model_avg.pt --remove-bpe --batch-size 128 --postprocess-workers 5
I didn't use --no-repeat-ngram-size in fastseq; the beam size is the default 5 and lenpen is 1.
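For reference, a minimal sketch of the batch-size sweep behind the table below (the loop itself is just for illustration; only the batch sizes shown in the table were tested, and "not assigned" means --batch-size was simply omitted):

```bash
# Illustrative sweep over the reported batch sizes; data path and model as above.
for bs in 128 10 5 1; do
  fairseq-generate ../data-bin --path model_avg.pt --remove-bpe --batch-size $bs
  fastseq-generate-for-fairseq ../data-bin --path model_avg.pt --remove-bpe \
      --batch-size $bs --postprocess-workers 5
done
# The "not assigned" row was run without the --batch-size flag at all.
```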
My test result is as follows:
| BatchSize | fairseq-0.10.2 | above + fastseq |
|---|---|---|
| not assigned | 65.79 sentences/s | 75.55 sentences/s |
| 128 | 63.18 sentences/s | 74.28 sentences/s |
| 10 | 19.06 sentences/s | 17.38 sentences/s |
| 5 | 11.79 sentences/s | 11.47 sentences/s |
| 1 | 3.06 sentences/s | 2.92 sentences/s |
I found that when the batch size is large (such as 128 and above), FastSeq gives an obvious speedup (though not 2x or more), but when the batch size is small (I tested this because my deployment scenario uses small batches), FastSeq shows no speedup at all and is even slower. This phenomenon seems quite abnormal to me, and I would appreciate your help. Looking forward to your reply.
Hi dearchill, thanks for your question. If you are using the latest version, adding --required-seq-len-multiple 8 will make both the baseline and the treatment faster. When the batch size and input length are small, the gain is usually smaller. If you provide a runnable case, we can look into it further.
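For concreteness, applying that suggestion to the commands above would look roughly like this (same data and model paths as in the original commands; the value 8 is the multiple suggested above):

```bash
# Baseline and FastSeq runs with the suggested sequence-length multiple.
fairseq-generate ../data-bin --path model_avg.pt --remove-bpe \
    --batch-size 128 --required-seq-len-multiple 8
fastseq-generate-for-fairseq ../data-bin --path model_avg.pt --remove-bpe \
    --batch-size 128 --postprocess-workers 5 --required-seq-len-multiple 8
```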
Hi, I added the --required-seq-len-multiple 8 arg and saw no gains, which is odd. I'll keep testing with some other models and test sets to see the effect, and will post here if I have any findings.