
Conversation

@AbhishekAshokDubey commented Jul 13, 2023

Updating the forward function in the Transformer Block class.

The change is simple, but I'll still do my best to explain it below:

As per the original paper, the 'Add & Norm' step of the Transformer applies Layer Norm on top of the sum of the input/residual and the output of self-attention (post-norm). In the current code, Layer Norm is applied to the input first, and the self-attention output is then added back to the input/residual (pre-norm). A sketch of the two formulations follows.
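For concreteness, here is a minimal sketch of the two variants, assuming a Block along the lines of gpt.py with sub-modules named self.sa (self-attention), self.ffwd (feed-forward), and self.ln1 / self.ln2 (LayerNorms); the nn.Identity stubs stand in for the real sub-layers and exist only to make the sketch self-contained:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Transformer block sketch: contrasts pre-norm (current code) with
    the paper's post-norm 'Add & Norm'. Sub-layer stubs are placeholders."""

    def __init__(self, n_embd):
        super().__init__()
        self.sa = nn.Identity()    # placeholder for the self-attention sub-layer
        self.ffwd = nn.Identity()  # placeholder for the feed-forward sub-layer
        self.ln1 = nn.LayerNorm(n_embd)
        self.ln2 = nn.LayerNorm(n_embd)

    def forward(self, x):
        # Current code (pre-norm): normalize first, then add the residual.
        x = x + self.sa(self.ln1(x))
        x = x + self.ffwd(self.ln2(x))
        return x

    def forward_post_norm(self, x):
        # 'Add & Norm' as described in the original paper (post-norm):
        # add the residual first, then apply LayerNorm to the sum.
        x = self.ln1(x + self.sa(x))
        x = self.ln2(x + self.ffwd(x))
        return x
```

Pre-norm (normalizing before each sub-layer) is the arrangement used in GPT-2-style models and generally trains more stably than the paper's post-norm, which is presumably why the code is written this way.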

@AbhishekAshokDubey changed the title from "Update gpt.py" to "Minor correction in 'Add & Norm' logic in Block Class in gpt.py" on Jul 13, 2023
@reallyigor

See 1:35:33
