Any change needs to be discussed before proceeding. Failure to do so may result in the rejection of the pull request.
What kind of change does this PR introduce?
This PR aims to improve how the attention module behaves when embeddings are used.
As the embedding size grows, the attention module becomes less relevant, because it does not know which columns come from the same embedding.
To solve this problem, this PR adds a post-processing step on the mask that, for now, takes the max of the attention given to the columns of each embedding (mean could be tried as well).
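To make the idea concrete, here is a minimal sketch of the kind of mask post-processing described above. It is not the code from this PR: the function name `pool_mask_by_embedding`, the `groups` layout (one list of column indices per embedding), and the use of NumPy instead of the project's tensor library are all assumptions made for illustration.

```python
import numpy as np


def pool_mask_by_embedding(mask, groups, reduce="max"):
    """Share one attention value across all columns of the same embedding.

    mask:   array of shape (batch_size, n_columns), attention per column.
    groups: list of lists of column indices; each inner list holds the
            columns produced by one embedding (hypothetical layout).
    reduce: "max" (as described in the PR) or "mean" (the suggested variant).
    """
    mask = np.asarray(mask, dtype=float)
    pooled = mask.copy()  # columns that belong to no group keep their value
    for cols in groups:
        if reduce == "max":
            value = mask[:, cols].max(axis=1, keepdims=True)
        elif reduce == "mean":
            value = mask[:, cols].mean(axis=1, keepdims=True)
        else:
            raise ValueError(f"unknown reduce: {reduce!r}")
        # Broadcast the per-row pooled value to every column of the group.
        pooled[:, cols] = value
    return pooled


# Columns 0-2 come from one embedding; column 3 is a plain numeric feature.
mask = np.array([[0.1, 0.6, 0.0, 0.3],
                 [0.5, 0.0, 0.2, 0.3]])
groups = [[0, 1, 2]]
pooled = pool_mask_by_embedding(mask, groups)
```

In this sketch every column of an embedding ends up with the same attention value, so the mask treats the embedding as a single feature. Note that the pooled rows no longer sum to 1, so a real implementation may need to renormalize; whether this PR does so is not stated here.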
Does this PR introduce a breaking change?
Everything is internal and invisible to the end users.
What needs to be documented once your changes are merged?
Closing issues
Put "closes #XXXX" in your comment to auto-close the issue that your PR fixes (if any).