From 177c6c158bd1634ab91b8bafda4b77eb07fa92cd Mon Sep 17 00:00:00 2001
From: LastWhisper
Date: Thu, 30 May 2024 00:32:58 +0800
Subject: [PATCH] Update README.md, add a header figure.

---
 README.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 6773dbb..958346b 100644
--- a/README.md
+++ b/README.md
@@ -2,6 +2,10 @@
 > Google DeepMind: Mixture-of-Depths Unofficial Implementation.
+
+MoD
+
 ## TODO List
 - [ ] Enable the **batching forward** operation.
@@ -25,4 +29,4 @@ This section informs us how to solve the non-causal problem of the top-k operati
 > Section 5. "If a token does not participate in self-attention at a certain block, then later tokens will also not be able to attend to it."
-This section tells us that for any Transformer block, whether a token participates in computation is determined after the first routing and will not change afterward. (✅ implement)
\ No newline at end of file
+This section tells us that for any Transformer block, whether a token participates in computation is determined after the first routing and will not change afterward. (✅ implement)