Hey,
I am working on a similar project and wanted to understand the RMT paper more clearly, and your work helped a lot. Thanks! It is very well written. I had a few questions.
You have used a swish gate and GroupNorm in places, which is correct according to the original retention paper. But since you describe this as an implementation of RMT, I am confused about why you implemented GroupNorm rather than softmax, because the RMT authors state that they use softmax.
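To make the difference concrete, here is a minimal single-head sketch of the two variants, in pure Python so it runs without PyTorch. It assumes RetNet-style retention is (QK^T ⊙ D)V followed by GroupNorm and a swish gate, and RMT-style Manhattan Self-Attention is (Softmax(QK^T) ⊙ D^2d)V as written in the RMT paper. The 1/sqrt(d) scaling, projections and multi-head split are omitted, and every function name (including the gate input `g`, which stands in for X·W_G) is hypothetical, not taken from the repository.

```python
import math

# Small pure-Python matrix helpers (lists of lists) so the sketch needs no PyTorch.
def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def hadamard(a, b):
    # Element-wise product, used to apply the decay mask D.
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def softmax_rows(s):
    out = []
    for row in s:
        m = max(row)  # subtract the row max for numerical stability
        e = [math.exp(v - m) for v in row]
        z = sum(e)
        out.append([v / z for v in e])
    return out

def swish(x):
    return x / (1.0 + math.exp(-x))  # x * sigmoid(x)

def norm_per_token(x, eps=1e-6):
    # With a single head, GroupNorm over that head's channels reduces to
    # normalizing each token's output vector to zero mean and (approximately)
    # unit variance; learnable affine parameters are left out of this sketch.
    out = []
    for row in x:
        mu = sum(row) / len(row)
        var = sum((v - mu) ** 2 for v in row) / len(row)
        out.append([(v - mu) / math.sqrt(var + eps) for v in row])
    return out

def retention_retnet_style(q, k, v, d, g):
    # RetNet-style: (Q K^T ⊙ D) V with no softmax, then GroupNorm, then swish gate.
    y = norm_per_token(matmul(hadamard(matmul(q, transpose(k)), d), v))
    return [[swish(gi) * yi for gi, yi in zip(grow, yrow)] for grow, yrow in zip(g, y)]

def masa_rmt_style(q, k, v, d):
    # RMT-style: softmax over the scores first, then the decay mask, then values.
    a = hadamard(softmax_rows(matmul(q, transpose(k))), d)
    return matmul(a, v)

# Usage: with an all-ones mask, the RMT-style version is plain softmax attention.
q = k = v = [[1.0, 0.0], [0.0, 1.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]
out = masa_rmt_style(q, k, v, ones)
```

The sketch only illustrates where the normalization sits: after the value aggregation (GroupNorm) in the RetNet-style version, and on the attention scores before the decay mask (softmax) in the RMT-style one.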
Your D mask / gamma implementation seems to make the retention matrix 2D, just as in RMT, but I could not figure out where exactly you make the matrix bidirectional. RMT claims to make the retention matrix both bidirectional and 2D, so I am somewhat confused about that.
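As described in the RMT paper, the decay is first made bidirectional in 1D (D_nm = γ^|n−m|) and then extended to 2D with a Manhattan distance, D_nm = γ^(|x_n − x_m| + |y_n − y_m|). If a mask follows that definition, bidirectionality comes from using absolute distances with no causal cut-off rather than from a separate step. The pure-Python sketch below (function names are hypothetical, not from the repository) contrasts a RetNet-style causal 1D mask with such a 2D mask on a small grid.

```python
def causal_decay_1d(n, gamma):
    # RetNet-style causal decay: D[i][j] = gamma**(i - j) for i >= j, else 0,
    # so token i receives nothing from later tokens j > i.
    return [[gamma ** (i - j) if i >= j else 0.0 for j in range(n)] for i in range(n)]

def manhattan_decay_2d(h, w, gamma):
    # RMT-style D^2d on an h x w grid, tokens flattened in row-major order:
    # D[n][m] = gamma ** (|x_n - x_m| + |y_n - y_m|).
    coords = [(r, c) for r in range(h) for c in range(w)]
    return [[gamma ** (abs(r1 - r2) + abs(c1 - c2)) for (r2, c2) in coords]
            for (r1, c1) in coords]

def is_symmetric(m):
    return all(m[i][j] == m[j][i] for i in range(len(m)) for j in range(len(m)))

d1 = causal_decay_1d(4, 0.9)
d2 = manhattan_decay_2d(2, 3, 0.9)
```

The 2D mask is symmetric and has no zero entries, so each token receives contributions from tokens on every side of it. Under this reading, checking whether an implementation is bidirectional comes down to checking that its decay mask has no zeroed-out upper triangle.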
You have used XPos encoding. Yes, it matches pure retention exactly, but AFAIK RMT does not mention using XPos.
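For reference, here is a minimal pure-Python sketch of what an xPos-style encoding does to queries and keys. Pairs of channels are rotated by an angle proportional to the position, queries are scaled by ζ_i^(n/512) and keys by ζ_i^(−m/512), so the query-key dot product depends only on the offset n − m. The function names, θ_i = 10000^(−2i/d), the 0.4 offset in ζ_i and the 512 scale base are assumptions based on common xPos implementations, not taken from this repository.

```python
import math

def xpos(x, pos, downscale, scale_base=512.0, zeta_offset=0.4):
    # x: one query or key vector with an even number of channels.
    # Queries use downscale=False, keys use downscale=True (assumed convention).
    d = len(x)
    assert d % 2 == 0
    out = []
    for i in range(d // 2):
        theta = 10000.0 ** (-2.0 * i / d)  # rotary frequency for channel pair i
        zeta = (2.0 * i + zeta_offset * d) / ((1.0 + zeta_offset) * d)
        s = zeta ** (pos / scale_base)
        if downscale:
            s = 1.0 / s  # keys get the inverse scale, so only n - m survives
        a, b = x[2 * i], x[2 * i + 1]
        c, sn = math.cos(pos * theta), math.sin(pos * theta)
        # Rotate the pair (a, b) by pos * theta, then apply the scale.
        out.append(s * (a * c - b * sn))
        out.append(s * (a * sn + b * c))
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Usage: two query/key position pairs with the same offset (2) give the same score.
q = [0.3, -1.2, 0.5, 0.8]
k = [1.0, 0.4, -0.7, 0.2]
s1 = dot(xpos(q, 5, downscale=False), xpos(k, 3, downscale=True))
s2 = dot(xpos(q, 12, downscale=False), xpos(k, 10, downscale=True))
```

s1 and s2 agree up to floating-point error. Whether RMT needs this relative encoding on top of its distance-based decay mask is exactly the open question above; the sketch only shows what the encoding itself contributes.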
Thanks for publishing this code. This has been very useful.
Thank you!
Hello, I'm sorry for replying to you so late. I didn't fully understand the paper when I wrote this code, and as a result, once I finished writing and running it, training took a long time. I have also been thinking about ways to revise it recently. Thank you for your questions and suggestions.
I’ve been following your discussion of the RMT implementation with great interest. Shreyas Dongre, your insights on the swish gate, GroupNorm and XPos are incredibly valuable. jiaowoguanren0615, I admire your openness to revising the project.
I’m working on a similar project and find myself agreeing with the points raised by Shreyas Dongre. I would greatly appreciate it if Shreyas Dongre could share any revised code or consider opening a pull request to the repository. This collaboration could be highly beneficial for all of us involved.
Thank you both for your impressive work and contributions to this field.