Softmax in Retention? #3

Open
Shreyas-Dongre opened this issue Nov 2, 2023 · 2 comments

Comments

@Shreyas-Dongre

Shreyas-Dongre commented Nov 2, 2023

Hey,
I am working on a similar project and wanted to understand the RMT paper more clearly, and your work helped a lot. Thanks! It is very well written. I have a few questions.

  1. You have used a swish gate and GroupNorm in places, which is correct according to the original retention paper. But since you say this is an implementation of RMT, I am confused about why you implemented GroupNorm and not softmax, as the RMT authors say they use softmax.
  2. Your D mask / gamma implementation seems to make the retention matrix 2D, just as in RMT, but I could not figure out where exactly you make the matrix bidirectional. RMT claims to make the retention matrix both bidirectional and 2D, so I am confused about that.
  3. You have used XPos encoding, exactly as in pure retention, but RMT does not mention using XPos AFAIK.

Thanks for publishing this code. This has been very useful.
Thank you!
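
For reference, questions 1 and 2 can be made concrete with a minimal sketch. It assumes one reading of the two papers: the original retention mask is causal, `gamma^(n-m)` for `n >= m` and 0 otherwise, while RMT's mask is symmetric and decays with the Manhattan distance between token positions on the image grid, and is multiplied elementwise with a row-wise softmax of the scores. The function names below are hypothetical and are not taken from this repository; plain Python lists are used instead of tensors to keep the example self-contained.

```python
import math


def causal_decay_mask(length, gamma):
    """1D one-directional mask: gamma^(n-m) if n >= m, else 0 (assumed RetNet form)."""
    return [[gamma ** (n - m) if n >= m else 0.0 for m in range(length)]
            for n in range(length)]


def bidirectional_decay_mask(length, gamma):
    """1D bidirectional mask: gamma^|n-m|, symmetric, no zeroed upper triangle."""
    return [[gamma ** abs(n - m) for m in range(length)] for n in range(length)]


def decay_mask_2d(height, width, gamma):
    """2D bidirectional mask over a height x width grid flattened row by row.

    Entry [n][m] is gamma^(|y_n - y_m| + |x_n - x_m|), i.e. the decay follows
    the Manhattan distance between the two tokens (assumed RMT form).
    """
    coords = [(y, x) for y in range(height) for x in range(width)]
    return [[gamma ** (abs(yn - ym) + abs(xn - xm)) for (ym, xm) in coords]
            for (yn, xn) in coords]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_softmax_scores(scores, mask):
    """Row-wise softmax of the scores, then elementwise product with the mask."""
    return [[p * d for p, d in zip(softmax(s_row), d_row)]
            for s_row, d_row in zip(scores, mask)]


if __name__ == "__main__":
    for row in decay_mask_2d(2, 2, 0.5):
        print(row)
```

In this sketch the bidirectional property shows up as symmetry of the mask (`mask[n][m] == mask[m][n]`), whereas the causal mask has a zeroed upper triangle. A quick way to check an implementation against question 2 is therefore to test whether its D matrix is symmetric.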
@jiaowoguanren0615
Owner

Hello, I'm sorry for replying so late. I didn't fully understand the paper when I wrote this implementation. As a result, once I finished writing it and ran it, training took a lot of time. I have also been thinking about ways to revise it recently. Thank you for your questions and suggestions.

@prsbsvrn

Hello Shreyas Dongre and jiaowoguanren0615,

I’ve been following your discussion on the RMT implementation with great interest. Shreyas Dongre, your insights on the swish gate, GroupNorm, and XPos are incredibly valuable. jiaowoguanren0615, I admire your openness to revising the project.

I’m working on a similar project and agree with the points raised by Shreyas Dongre. I would greatly appreciate it if Shreyas Dongre could share any revised code or consider making a pull request to the repository. This collaboration could be highly beneficial for all of us involved.

Thank you both for your impressive work and contributions to this field.

Best regards
