This repo collects resources for DeepSeek R1.
More resources will be added over time.
- Base model: DeepSeek-V3 Technical Report
- GRPO algorithm: DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
- R1 paper: DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- huggingface/open-r1: Fully open reproduction of DeepSeek-R1
- Unakar/Logic-RL: Reproduce R1-Zero on logic puzzles
- hkust-nlp/simpleRL-reason: A replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data
- agentica-org/DeepScaleR-1.5B-Preview · Hugging Face
- Jiayi-Pan/TinyZero: Clean, minimal, accessible reproduction of DeepSeek R1-Zero
- dhcode-cpp/X-R1: Minimal-cost training of a 0.5B R1-Zero
  - X-R1 aims to build an easy-to-use, low-cost training framework based on reinforcement learning to accelerate the development of scaling post-training
- lsdefine/simple_GRPO: A very simple GRPO implementation for reproducing R1-like LLM thinking
  - A simple open-source implementation whose core loss calculation follows Hugging Face's TRL; a minimal sketch of that loss appears below.
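Under simplifying assumptions, the loss such implementations compute looks roughly like this: the rewards of the G sampled completions for each prompt are normalized within their group to form advantages, which then enter a PPO-style clipped surrogate. The function name, tensor shapes, and the omission of the KL penalty against a reference model are assumptions for illustration, not TRL's or simple_GRPO's exact code.

```python
# Minimal GRPO loss sketch (assumptions: log-probs already summed per
# completion; the KL penalty against a reference model is omitted).
import torch

def grpo_loss(logp_new, logp_old, rewards, eps=0.2):
    """logp_new/logp_old: (n_prompts * G,) log-probs per completion;
    rewards: (n_prompts, G) with G sampled completions per prompt."""
    # Group-relative advantage: normalize each reward within its own group.
    adv = (rewards - rewards.mean(dim=1, keepdim=True)) / (
        rewards.std(dim=1, keepdim=True) + 1e-8)
    adv = adv.flatten()                      # one scalar advantage per completion
    ratio = torch.exp(logp_new - logp_old)   # policy likelihood ratio
    clipped = torch.clamp(ratio, 1 - eps, 1 + eps)
    # PPO-style pessimistic (clipped) surrogate, averaged over completions.
    return -torch.min(ratio * adv, clipped * adv).mean()
```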
- The Illustrated DeepSeek-R1 - by Jay Alammar
- DeepSeek Debates: Chinese Leadership On Cost, True Training Cost, Closed Model Margin Impacts – SemiAnalysis
- DeepSeek Night Talks: Technical Principles and Future Directions - Thoughts and Inspirations on DeepSeek R1
- DeepSeek Night Talks: Technical Principles and Future Directions - Technical Principles of Large-Scale Reinforcement Learning and an Assessment of LLM Technology Trends
- DeepSeek Night Talks: Technical Principles and Future Directions - A Summary of DeepSeek's System Software Optimizations
- DeepSeek Night Talks: Technical Principles and Future Directions - LLM Software and Hardware Optimization Through the Lens of DeepSeek
- Insights from the Emergence of DeepSeek
- Tips for Using DeepSeek
- Understanding Reasoning LLMs - by Sebastian Raschka, PhD
- R1-Zero and R1 Results and Analysis - ARC Prize
- 86 Key Reflections on DeepSeek
- DeepSeek, China, OpenAI, NVIDIA, xAI, TSMC, Stargate, and AI Megaclusters | Lex Fridman Podcast
- DeepSeek FAQ
- PKU-Alignment/Align-DS-V · Hugging Face
- Deep-Agent/R1-V: Witness the aha moment of VLM with less than $3.
- om-ai-lab/VLM-R1: Solve Visual Understanding with Reinforced VLMs
  - Since the introduction of DeepSeek-R1, numerous works have emerged that reproduce and improve upon it. This project proposes VLM-R1, a stable and generalizable R1-style large vision-language model.
- Deploying and invoking DeepSeek-R1-Distill-Qwen-7B with vLLM (see the sketch below)
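As a minimal usage sketch (not the tutorial's exact commands): the model id is the public Hugging Face checkpoint, the sampling settings follow the values commonly recommended for R1-distill models, and the prompt is illustrative.

```python
# Offline inference with vLLM for the distilled 7B model.
from vllm import LLM, SamplingParams

llm = LLM(model="deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=4096)
outputs = llm.generate(["Prove that the square root of 2 is irrational."], params)
print(outputs[0].outputs[0].text)  # the reasoning trace precedes the answer
```

An OpenAI-compatible server can likewise be started with `vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-7B`.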
- ktransformers/doc/en/DeepseekR1_V3_tutorial.md at main · kvcache-ai/ktransformers
  - Supports DeepSeek-R1 and V3 on a single GPU (24 GB VRAM) or multiple GPUs plus 382 GB of DRAM, with up to a 3-28x speedup.
- A Detailed Introduction to Multi-Head Latent Attention (MLA); a concrete sketch follows
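To make the idea concrete, here is an illustrative sketch of MLA's central trick: keys and values are jointly compressed into a small latent vector, which is all the inference-time cache needs. Dimensions and class/attribute names are placeholders, and the decoupled RoPE path and causal mask of the real design are omitted for brevity.

```python
# Sketch of low-rank KV compression in Multi-Head Latent Attention.
import torch
import torch.nn as nn

class MLASketch(nn.Module):
    def __init__(self, d_model=4096, d_latent=512, n_heads=32):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.w_dkv = nn.Linear(d_model, d_latent, bias=False)  # down-project to shared KV latent
        self.w_uk = nn.Linear(d_latent, d_model, bias=False)   # up-project latent to keys
        self.w_uv = nn.Linear(d_latent, d_model, bias=False)   # up-project latent to values
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x):                     # x: (batch, seq, d_model)
        b, t, _ = x.shape
        c_kv = self.w_dkv(x)                  # only this small latent is cached at inference
        split = lambda z: z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        q, k, v = split(self.w_q(x)), split(self.w_uk(c_kv)), split(self.w_uv(c_kv))
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head**0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.w_o(out)
```

Caching only `c_kv` (512 dimensions here) instead of full per-head keys and values is what shrinks the KV cache.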
- Understanding DeepSeek's Technical Evolution in One Article: Large Language Models, Vision-Language Understanding, and Unified Multimodal Models - Zhihu
- getAsterisk/deepclaude: A high-performance LLM inference API and chat UI that integrates DeepSeek R1's CoT reasoning traces with Anthropic Claude models
  - DeepClaude is a high-performance LLM inference API that combines DeepSeek R1's chain-of-thought (CoT) reasoning with Anthropic Claude's creative and code-generation prowess. It provides a unified interface for leveraging the strengths of both models while keeping complete control over your API keys and data; the pipeline is sketched below.
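A hedged sketch of that two-stage pipeline, assuming DeepSeek's OpenAI-compatible API (where `deepseek-reasoner` exposes the trace via `reasoning_content`) and the official Anthropic SDK; the Claude model name and the prompt format are placeholders, not DeepClaude's actual implementation.

```python
# Two-stage pipeline: R1 produces the reasoning trace, Claude writes the answer.
from openai import OpenAI
import anthropic

question = "Why is the sky blue?"

deepseek = OpenAI(api_key="...", base_url="https://api.deepseek.com")
r1 = deepseek.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": question}],
)
cot = r1.choices[0].message.reasoning_content  # R1's chain-of-thought trace

claude = anthropic.Anthropic(api_key="...")
final = claude.messages.create(
    model="claude-3-5-sonnet-latest",          # placeholder model name
    max_tokens=1024,
    messages=[{"role": "user", "content":
               f"Question: {question}\n\nReasoning trace:\n{cot}\n\nWrite the final answer."}],
)
print(final.content[0].text)
```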