The original blank file can be downloaded from ai-by-hand.
This file provides a hands-on approach to the following concepts used in DeepSeek:
- Multi-head Latent Attention
- RoPE (Rotary Position Embedding)
- Mixture of Experts
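
As a taste of what working one of these by hand looks like, here is a minimal sketch of RoPE in plain Python. It is not taken from the file or from DeepSeek's code. The function names `rope` and `dot`, the adjacent-pair layout, and the base of 10000 are assumptions chosen for illustration; real implementations differ in how they pair dimensions and where they apply the rotation.

```python
import math

def rope(x, position, base=10000.0):
    """Rotate each adjacent pair (x[2i], x[2i+1]) by the angle
    position * base**(-2i/d). Pair layout and base are assumptions."""
    d = len(x)
    if d % 2 != 0:
        raise ValueError("vector dimension must be even")
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2 * i / d)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        # Standard 2D rotation of the pair (a, b) by theta.
        out.extend([a * cos_t - b * sin_t, a * sin_t + b * cos_t])
    return out

def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical query and key vectors, chosen only for the demonstration.
q = [1.0, 2.0, 3.0, 4.0]
k = [0.5, -1.0, 2.0, 0.0]

# Both pairs of positions are 2 apart, so the two scores should agree.
score_a = dot(rope(q, 3), rope(k, 1))
score_b = dot(rope(q, 7), rope(k, 5))
print(abs(score_a - score_b) < 1e-9)
```

The check at the end illustrates the property that motivates RoPE: the score between a rotated query and a rotated key depends on the distance between their positions, not on the positions themselves.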