This app forces DeepSeek R1 models to think more deeply by extending their reasoning process. It uses Unsloth-optimized models for better performance and unlimited context length (limited only by available VRAM).
The app works by detecting when the model tries to conclude its thoughts too early and replacing those early stops with prompts that encourage additional reasoning, continuing until the minimum amount of thinking you set has been reached.
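In pseudocode, the core loop might look like the sketch below. This is a minimal illustration, not the app's exact implementation: `generate_step` is a hypothetical callback standing in for real token-by-token decoding, and the continuation cues are examples.

```python
import random

END_THINK = "</think>"  # DeepSeek R1 closes its reasoning with this tag
CONTINUATION_CUES = [   # illustrative cues that nudge the model to keep going
    "Wait, let me reconsider.",
    "Hmm, I should double-check that.",
]

def extend_thinking(generate_step, prompt, min_thinking_tokens=512):
    """generate_step(text) -> next token as a string (hypothetical stand-in
    for real decoding; the actual app operates on token ids)."""
    text = prompt
    thinking_tokens = 0
    while True:
        token = generate_step(text)
        if END_THINK in token:
            if thinking_tokens >= min_thinking_tokens:
                return text + token  # threshold met: let the model conclude
            # Too early: swap the closing tag for a continuation cue,
            # then keep generating.
            token = random.choice(CONTINUATION_CUES)
        text += token
        thinking_tokens += 1
```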
App by anzorq. If you like it, please consider supporting the project.
- 🤔 Force models to think longer and more thoroughly
- 🔄 Customizable reasoning extensions and thinking thresholds
- 🎯 Fine-grained control over model parameters (temperature, top-p, etc.)
- 💭 Visible thinking process with token count tracking
- 📝 LaTeX support for mathematical expressions
- 🖥️ Optimized for various VRAM configurations
- ♾️ Unlimited context length (VRAM-dependent)
- 🔄 Choose from multiple model sizes (1.5B to 70B parameters)
You can choose from any of the Unsloth-optimized distilled DeepSeek R1 models:
- 1.5B parameters (Qwen): unsloth/DeepSeek-R1-Distill-Qwen-1.5B
- 7B parameters (Qwen): unsloth/DeepSeek-R1-Distill-Qwen-7B
- 14B parameters (Qwen): unsloth/DeepSeek-R1-Distill-Qwen-14B
- 32B parameters (Qwen): unsloth/DeepSeek-R1-Distill-Qwen-32B
- 8B parameters (Llama): unsloth/DeepSeek-R1-Distill-Llama-8B
- 70B parameters (Llama): unsloth/DeepSeek-R1-Distill-Llama-70B
Choose a model size based on your available VRAM and performance requirements. Larger models generally produce better responses but require more VRAM, and the Qwen- and Llama-based distills may perform differently across tasks.
Note: You can run models up to 14B parameters on a free Google Colab T4 GPU.
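As a rough illustration (not the app's exact code), loading one of the models above with Unsloth and generating with explicit sampling parameters might look like this. The model choice, sequence length, prompt, and sampling values are all illustrative:

```python
from unsloth import FastLanguageModel

# Load a distilled R1 model; 4-bit quantization is what lets the larger
# models fit on modest GPUs such as a Colab T4.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Qwen-14B",
    max_seq_length=8192,   # raise or lower to fit your VRAM
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference path

inputs = tokenizer(
    "Prove that the square root of 2 is irrational.",
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.6,  # the sampling controls the app exposes in its UI
    top_p=0.95,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```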
- Original idea and implementation - vgel's gist
- DeepSeek LLM - https://github.com/deepseek-ai/DeepSeek-LLM
- Unsloth - https://github.com/unslothai/unsloth
- Gradio - https://github.com/gradio-app/gradio