Encoder-decoder Multihead attention cpu optimization #43

NickNickGo · 2020-10-14T03:03:09Z

This PR reduces CPU time for encoder-decoder multihead attention by 25-30%. GPU time is reduced by 10%.

Unnecessary reshapes during the EINSUM op are eliminated.
The EINSUM logic is converted to a BMM op, which avoids the CPU overhead of EINSUM.
Overall generation time drops from 47.8 to 44.9.
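To illustrate the EINSUM-to-BMM idea, here is a minimal sketch rather than the code in this PR. The tensor shapes, the subscript string, and the variable names are assumptions chosen for the example. It only shows that an attention-score contraction written with `torch.einsum` can be expressed as a single `torch.bmm` call with the same result.

```python
import torch

# Hypothetical shapes, chosen only for illustration (not taken from the PR).
bsz_heads, tgt_len, src_len, head_dim = 6, 1, 5, 8

torch.manual_seed(0)
q = torch.randn(bsz_heads, tgt_len, head_dim)
k = torch.randn(bsz_heads, src_len, head_dim)

# Attention scores written as an einsum contraction over head_dim.
scores_einsum = torch.einsum("bqd,bkd->bqk", q, k)

# The same contraction as one batched matrix multiply:
# (b, q, d) x (b, d, k) -> (b, q, k).
scores_bmm = torch.bmm(q, k.transpose(1, 2))

# Both paths should agree up to floating-point tolerance.
same = torch.allclose(scores_einsum, scores_bmm, atol=1e-5)
```

Whether the BMM form is actually faster depends on the shapes and the PyTorch version, so it is worth profiling both paths, as the profile results attached to this PR do.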
Attaching before/after profile results:


JiushengChen · 2020-10-14T03:18:58Z

Good improvement!

Can we do the same for Huggingface and ProphetNet (its implementation is separate)?
Update benchmarks

yuyan2do

Also update the README and the numbers in the benchmark script.

yuyan2do · 2020-10-14T03:35:51Z

fastseq/optimizer/fairseq/beam_search_optimizer_v1.py

                        torch.bool), float("-inf"))
            else:
+                #Not supported


add "assert False, reason"
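A minimal sketch of what this suggestion could look like, using a hypothetical function and hypothetical branch names that are not from `beam_search_optimizer_v1.py`. The point is that the unsupported branch fails loudly with a reason instead of passing silently behind a comment.

```python
def pick_mask_path(mask_kind):
    """Hypothetical dispatcher with one unsupported branch (illustration only)."""
    if mask_kind == "bool":
        return "masked_fill"
    elif mask_kind == "float":
        return "add"
    else:
        # Not supported: stop here and say why, rather than falling through.
        assert False, f"unsupported mask kind: {mask_kind!r}"
```

Note that Python strips `assert` statements when run with `-O`, so `raise NotImplementedError(reason)` is an alternative if the check must survive optimized mode.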

JiushengChen · 2020-10-14T03:28:46Z

fastseq/optimizer/fairseq/beam_search_optimizer_v1.py

        else:
+            q = q.contiguous().view(tgt_len, bsz * self.num_heads,


Why is contiguous needed here?

NickNickGo · Oct 14, 2020

This was present in the earlier implementation. I didn't touch it since my changes are only meant for encoder-decoder attention. I agree it is redundant, so I'll remove it.

JiushengChen · Oct 14, 2020

There are other places using contiguous. Please check whether they can be removed as well.

NickNickGo · Oct 16, 2020

I just checked this. In all other places, it's present after permute/transpose operations, where it is essential.
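The distinction discussed in this thread can be sketched as follows. This is an illustrative example with made-up shapes, not code from the PR: `view` needs a compatible memory layout, so `contiguous()` matters after `transpose`/`permute`, while on a tensor that is already contiguous it makes no copy.

```python
import torch

x = torch.arange(6).view(2, 3)   # freshly created, already contiguous
t = x.transpose(0, 1)            # same storage, permuted strides

# After a transpose the tensor is no longer contiguous.
needs_copy = not t.is_contiguous()

# view() cannot flatten the transposed layout without a copy.
try:
    t.view(6)
    view_failed = False
except RuntimeError:
    view_failed = True

# contiguous() materializes the new layout, after which view() works.
flat = t.contiguous().view(6)

# On an already-contiguous tensor, contiguous() reuses the same storage.
no_copy = x.contiguous().data_ptr() == x.data_ptr()
```

So a `contiguous()` call placed directly before `view` on an untransposed tensor is redundant, whereas the ones that follow permute/transpose are required.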

Removing unnecessary reshapes and cpu logic from encoder-decoder attention

5c631a3

NickNickGo requested a review from a team October 14, 2020 03:03

yuyan2do approved these changes Oct 14, 2020


JiushengChen reviewed Oct 14, 2020


JiushengChen mentioned this pull request Nov 11, 2020

[WIP] Support TorchScript and graph rewrite #54

Open
