DeepSeek AI V3 - The Sigma Savage. v1.0

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
โ–ˆโ–ˆโ•—     โ–ˆโ–ˆโ•—  โ–ˆโ–ˆโ•—โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•— โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•— โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•— โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—      โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•— โ–ˆโ–ˆโ•—    
โ–ˆโ–ˆโ•‘     โ•šโ–ˆโ–ˆโ•—โ–ˆโ–ˆโ•”โ•โ–ˆโ–ˆโ•”โ•โ•โ–ˆโ–ˆโ•—โ–ˆโ–ˆโ•”โ•โ•โ•โ•โ•โ–ˆโ–ˆโ•”โ•โ•โ•โ–ˆโ–ˆโ•—โ–ˆโ–ˆโ•”โ•โ•โ–ˆโ–ˆโ•—โ–ˆโ–ˆโ•”โ•โ•โ•โ•โ•     โ–ˆโ–ˆโ•”โ•โ•โ–ˆโ–ˆโ•—โ–ˆโ–ˆโ•‘    
โ–ˆโ–ˆโ•‘      โ•šโ–ˆโ–ˆโ–ˆโ•”โ• โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•”โ•โ–ˆโ–ˆโ•‘     โ–ˆโ–ˆโ•‘   โ–ˆโ–ˆโ•‘โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•”โ•โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•— โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•‘โ–ˆโ–ˆโ•‘    
โ–ˆโ–ˆโ•‘      โ–ˆโ–ˆโ•”โ–ˆโ–ˆโ•— โ–ˆโ–ˆโ•”โ•โ•โ–ˆโ–ˆโ•—โ–ˆโ–ˆโ•‘     โ–ˆโ–ˆโ•‘   โ–ˆโ–ˆโ•‘โ–ˆโ–ˆโ•”โ•โ•โ–ˆโ–ˆโ•—โ–ˆโ–ˆโ•”โ•โ•โ• โ•šโ•โ•โ•โ•โ•โ–ˆโ–ˆโ•”โ•โ•โ–ˆโ–ˆโ•‘โ–ˆโ–ˆโ•‘    
โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—โ–ˆโ–ˆโ•”โ• โ–ˆโ–ˆโ•—โ–ˆโ–ˆโ•‘  โ–ˆโ–ˆโ•‘โ•šโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—โ•šโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•”โ•โ–ˆโ–ˆโ•‘  โ–ˆโ–ˆโ•‘โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—     โ–ˆโ–ˆโ•‘  โ–ˆโ–ˆโ•‘โ–ˆโ–ˆโ•‘    
โ•šโ•โ•โ•โ•โ•โ•โ•โ•šโ•โ•  โ•šโ•โ•โ•šโ•โ•  โ•šโ•โ• โ•šโ•โ•โ•โ•โ•โ• โ•šโ•โ•โ•โ•โ•โ• โ•šโ•โ•  โ•šโ•โ•โ•šโ•โ•โ•โ•โ•โ•โ•     โ•šโ•โ•  โ•šโ•โ•โ•šโ•โ•    
                                                                                
โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—โ–ˆโ–ˆโ•—  โ–ˆโ–ˆโ•—                                              
โ–ˆโ–ˆโ•”โ•โ•โ•โ•โ•โ–ˆโ–ˆโ•”โ•โ•โ•โ•โ•โ–ˆโ–ˆโ•”โ•โ•โ•โ•โ•โ–ˆโ–ˆโ•‘ โ–ˆโ–ˆโ•”โ•                                              
โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•”โ•                                               
โ•šโ•โ•โ•โ•โ–ˆโ–ˆโ•‘โ–ˆโ–ˆโ•”โ•โ•โ•  โ–ˆโ–ˆโ•”โ•โ•โ•  โ–ˆโ–ˆโ•”โ•โ–ˆโ–ˆโ•—                                               
โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•‘โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ•—โ–ˆโ–ˆโ•‘  โ–ˆโ–ˆโ•—                                              
โ•šโ•โ•โ•โ•โ•โ•โ•โ•šโ•โ•โ•โ•โ•โ•โ•โ•šโ•โ•โ•โ•โ•โ•โ•โ•šโ•โ•  โ•šโ•โ•                                              
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
๐Ÿบ LXRCore-AI-Seek - Advanced AI Language Model System
   Powered by The Land of Wolves ๐Ÿบ | แƒ›แƒ’แƒšแƒ”แƒ‘แƒ˜แƒก แƒ›แƒ˜แƒฌแƒ
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

๐Ÿบ The Land of Wolves - Georgian RP ๐Ÿ‡ฌ๐Ÿ‡ช

แƒ›แƒ’แƒšแƒ”แƒ‘แƒ˜แƒก แƒ›แƒ˜แƒฌแƒ - แƒ แƒฉแƒ”แƒฃแƒšแƒ—แƒ แƒแƒ“แƒ’แƒ˜แƒšแƒ˜!

แƒ˜แƒกแƒขแƒแƒ แƒ˜แƒ แƒชแƒแƒชแƒฎแƒšแƒ“แƒ”แƒ‘แƒ แƒแƒฅ! (History Lives Here!)


Homepage | Discord | GitHub | Store | Code License | Model License

🎯 Serious Hardcore Roleplay | 🔒 Discord & Whitelisted | 🌍 RedM Georgian Server

📊 Server Listing


📚 Table of Contents

  1. Introduction
  2. Model Summary
  3. Model Downloads
  4. Evaluation Results
  5. Platform Information
  6. How to Run Locally
  7. License
  8. Citation
  9. Contact

1. 🚀 Introduction

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ LXRCORE-AI-SEEK OVERVIEW
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

LXRCore-AI-Seek is a powerful, rebranded implementation of advanced AI language model technology, optimized and branded for The Land of Wolves 🐺 ecosystem. The project is built around a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.

🎯 Key Features

  • Multi-head Latent Attention (MLA) architecture for efficient processing
  • Mixture-of-Experts (MoE) design for optimal resource utilization (see the routing sketch after this list)
  • Auxiliary-loss-free strategy for superior load balancing
  • Multi-token prediction training objective for enhanced performance
  • 14.8 trillion tokens of pre-training data
  • Supervised Fine-Tuning and Reinforcement Learning stages
  • Exceptional stability throughout training (no loss spikes or rollbacks)
  • Cost-effective training: Only 2.788M H800 GPU hours for full training
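
The gap between 671B total and 37B activated parameters comes from sparse expert routing: a learned gate sends each token to only a few experts, so most parameters sit idle on any given forward pass. Below is a minimal, illustrative top-k routing layer in PyTorch; the sizes, softmax gate, and k=2 routing are simplifying assumptions, not the actual LXRCore-AI-Seek implementation (which also adds the auxiliary-loss-free balancing noted above).

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Illustrative top-k routed MoE layer: only k of n_experts run per token."""
    def __init__(self, dim: int = 64, n_experts: int = 16, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts, bias=False)
        self.experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_experts))

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: [tokens, dim]
        weights = self.gate(x).softmax(dim=-1)             # routing probabilities
        top_w, top_i = weights.topk(self.k, dim=-1)        # pick k experts per token
        out = torch.zeros_like(x)
        for t in range(x.size(0)):
            for w, i in zip(top_w[t], top_i[t]):
                # Only the chosen experts' parameters are "activated" for token t.
                out[t] += w * self.experts[int(i)](x[t])
        return out

moe = TinyMoE()
print(moe(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```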

๐Ÿบ Land of Wolves Integration

This model is specifically adapted for integration with:

  • LXR-Core (Primary Framework)
  • RSG-Core (Primary Framework)
  • VORP Core (Supported/Legacy)

The model achieves performance comparable to leading closed-source models while maintaining the open-source ethos of The Land of Wolves community.


2. 📊 Model Summary

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ ARCHITECTURE & TRAINING INNOVATIONS
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

๐Ÿ—๏ธ Architecture: Innovative Load Balancing Strategy and Training Objective

  • Built on the efficient MoE architecture with auxiliary-loss-free load balancing strategy
  • Minimizes performance degradation that arises from load balancing requirements
  • Multi-Token Prediction (MTP) objective for enhanced model performance
  • MTP can be leveraged for speculative decoding to accelerate inference (see the sketch after this list)
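
To make the speculative-decoding bullet concrete, here is a toy accept/verify loop: a cheap draft proposes a few tokens, the full model checks them, and every agreement yields a token without a full sequential generation step. The two greedy stand-in "models" below are hypothetical; in the real system the MTP module plays the draft role and verification happens in one batched forward pass.

```python
from typing import Callable, List

Model = Callable[[List[int]], int]  # greedy next-token function

def speculative_step(draft: Model, target: Model,
                     ctx: List[int], k: int = 4) -> List[int]:
    """Propose k draft tokens, then keep the prefix the target agrees with."""
    proposed: List[int] = []
    for _ in range(k):
        proposed.append(draft(ctx + proposed))
    accepted: List[int] = []
    for tok in proposed:
        expected = target(ctx + accepted)  # one batched verify pass in practice
        if expected == tok:
            accepted.append(tok)           # draft token accepted "for free"
        else:
            accepted.append(expected)      # fix the first mismatch and stop
            break
    return accepted

# Hypothetical toy models: the draft agrees with the target most of the time.
target_model: Model = lambda seq: (sum(seq) * 7 + 3) % 11
draft_model: Model = lambda seq: target_model(seq) if len(seq) % 3 else 0
print(speculative_step(draft_model, target_model, [1, 2, 3]))
```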

⚡ Pre-Training: Ultimate Training Efficiency

  • FP8 mixed precision training framework validated at extreme scale (illustrated after this list)
  • Novel co-design of algorithms, frameworks, and hardware
  • Overcomes cross-node MoE communication bottlenecks
  • Near-complete computation-communication overlap
  • Cost-effective: Only 2.664M H800 GPU hours for 14.8T token pre-training
  • Post-training stages require minimal 0.1M GPU hours
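
For intuition on what FP8 buys, the snippet below round-trips a weight matrix through torch.float8_e4m3fn (available in recent PyTorch) and reports the storage and rounding cost. Real FP8 training additionally uses fine-grained scaling to keep values in range, which this sketch omits.

```python
import torch

w = torch.randn(1024, 1024)
w_fp8 = w.to(torch.float8_e4m3fn)   # 1 byte per element vs. 4 for fp32
w_rt = w_fp8.to(torch.float32)      # dequantize to compare against the original
print(f"bytes/elem: {w_fp8.element_size()} vs {w.element_size()}")
print(f"mean abs round-trip error: {(w - w_rt).abs().mean().item():.4f}")
```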

🎓 Post-Training: Advanced Knowledge Distillation

  • Innovative methodology for distilling reasoning capabilities from long-Chain-of-Thought (CoT) models (see the schematic loss after this list)
  • Incorporates verification and reflection patterns for improved reasoning
  • Maintains control over output style and length
  • Enhanced performance without sacrificing usability
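
Mechanically, the distillation step can be thought of as supervised fine-tuning on verified teacher traces: the student minimizes next-token cross-entropy on text generated by the long-CoT teacher. The schematic below uses dummy tensors and a hypothetical vocabulary size purely to show the shape of that loss; it is not the project's actual training code.

```python
import torch
import torch.nn.functional as F

vocab_size, seq_len = 32000, 16  # hypothetical sizes for illustration
student_logits = torch.randn(seq_len, vocab_size, requires_grad=True)
teacher_tokens = torch.randint(0, vocab_size, (seq_len,))  # verified CoT trace

# Standard next-token cross-entropy against the teacher's tokens.
loss = F.cross_entropy(student_logits, teacher_tokens)
loss.backward()
print(float(loss))
```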

3. 📥 Model Downloads

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ MODEL WEIGHTS & DOWNLOADS
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
| Model | #Total Params | #Activated Params | Context Length | Original Source |
|---|---|---|---|---|
| LXRCore-AI-Seek-Base | 671B | 37B | 128K | 🤗 Hugging Face |
| LXRCore-AI-Seek | 671B | 37B | 128K | 🤗 Hugging Face |

Note

The total size of LXRCore-AI-Seek models on Hugging Face is 685B, which includes 671B of the Main Model weights and 14B of the Multi-Token Prediction (MTP) Module weights.

Original Model Attribution: This model is based on DeepSeek-V3, rebranded and optimized for The Land of Wolves ecosystem. We acknowledge and respect the original DeepSeek-AI team's work.

To ensure optimal performance and flexibility, we have partnered with open-source communities and hardware vendors to provide multiple ways to run the model locally. For step-by-step guidance, check out Section 6: How to Run Locally.

For developers looking to dive deeper, we recommend exploring README_WEIGHTS.md for details on the Main Model weights and the Multi-Token Prediction (MTP) Modules. Please note that MTP support is currently under active development within the community, and we welcome your contributions and feedback.


4. 📈 Evaluation Results

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ BENCHMARK PERFORMANCE METRICS
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

Base Model

Standard Benchmarks

| Category | Benchmark (Metric) | # Shots | DeepSeek-V2 | Qwen2.5 72B | LLaMA3.1 405B | LXRCore-AI-Seek |
|---|---|---|---|---|---|---|
| | Architecture | - | MoE | Dense | Dense | MoE |
| | # Activated Params | - | 21B | 72B | 405B | 37B |
| | # Total Params | - | 236B | 72B | 405B | 671B |
| English | Pile-test (BPB) | - | 0.606 | 0.638 | 0.542 | 0.548 |
| | BBH (EM) | 3-shot | 78.8 | 79.8 | 82.9 | 87.5 |
| | MMLU (Acc.) | 5-shot | 78.4 | 85.0 | 84.4 | 87.1 |
| | MMLU-Redux (Acc.) | 5-shot | 75.6 | 83.2 | 81.3 | 86.2 |
| | MMLU-Pro (Acc.) | 5-shot | 51.4 | 58.3 | 52.8 | 64.4 |
| | DROP (F1) | 3-shot | 80.4 | 80.6 | 86.0 | 89.0 |
| | ARC-Easy (Acc.) | 25-shot | 97.6 | 98.4 | 98.4 | 98.9 |
| | ARC-Challenge (Acc.) | 25-shot | 92.2 | 94.5 | 95.3 | 95.3 |
| | HellaSwag (Acc.) | 10-shot | 87.1 | 84.8 | 89.2 | 88.9 |
| | PIQA (Acc.) | 0-shot | 83.9 | 82.6 | 85.9 | 84.7 |
| | WinoGrande (Acc.) | 5-shot | 86.3 | 82.3 | 85.2 | 84.9 |
| | RACE-Middle (Acc.) | 5-shot | 73.1 | 68.1 | 74.2 | 67.1 |
| | RACE-High (Acc.) | 5-shot | 52.6 | 50.3 | 56.8 | 51.3 |
| | TriviaQA (EM) | 5-shot | 80.0 | 71.9 | 82.7 | 82.9 |
| | NaturalQuestions (EM) | 5-shot | 38.6 | 33.2 | 41.5 | 40.0 |
| | AGIEval (Acc.) | 0-shot | 57.5 | 75.8 | 60.6 | 79.6 |
| Code | HumanEval (Pass@1) | 0-shot | 43.3 | 53.0 | 54.9 | 65.2 |
| | MBPP (Pass@1) | 3-shot | 65.0 | 72.6 | 68.4 | 75.4 |
| | LiveCodeBench-Base (Pass@1) | 3-shot | 11.6 | 12.9 | 15.5 | 19.4 |
| | CRUXEval-I (Acc.) | 2-shot | 52.5 | 59.1 | 58.5 | 67.3 |
| | CRUXEval-O (Acc.) | 2-shot | 49.8 | 59.9 | 59.9 | 69.8 |
| Math | GSM8K (EM) | 8-shot | 81.6 | 88.3 | 83.5 | 89.3 |
| | MATH (EM) | 4-shot | 43.4 | 54.4 | 49.0 | 61.6 |
| | MGSM (EM) | 8-shot | 63.6 | 76.2 | 69.9 | 79.8 |
| | CMath (EM) | 3-shot | 78.7 | 84.5 | 77.3 | 90.7 |
| Chinese | CLUEWSC (EM) | 5-shot | 82.0 | 82.5 | 83.0 | 82.7 |
| | C-Eval (Acc.) | 5-shot | 81.4 | 89.2 | 72.5 | 90.1 |
| | CMMLU (Acc.) | 5-shot | 84.0 | 89.5 | 73.7 | 88.8 |
| | CMRC (EM) | 1-shot | 77.4 | 75.8 | 76.0 | 76.3 |
| | C3 (Acc.) | 0-shot | 77.4 | 76.7 | 79.7 | 78.6 |
| | CCPM (Acc.) | 0-shot | 93.0 | 88.5 | 78.6 | 92.0 |
| Multilingual | MMMLU-non-English (Acc.) | 5-shot | 64.0 | 74.8 | 73.8 | 79.4 |

Note

Scores with a gap not exceeding 0.3 are considered to be at the same level. LXRCore-AI-Seek achieves the best performance on most benchmarks, especially on math and code tasks.

Original Model: Based on DeepSeek-V3 architecture and training methodology.

Context Window

Evaluation results on the Needle In A Haystack (NIAH) tests. LXRCore-AI-Seek performs well across all context window lengths up to 128K.

Chat Model

Standard Benchmarks (Models larger than 67B)

| Category | Benchmark (Metric) | DeepSeek V2-0506 | DeepSeek V2.5-0905 | Qwen2.5 72B-Inst. | Llama3.1 405B-Inst. | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | LXRCore-AI-Seek |
|---|---|---|---|---|---|---|---|---|
| | Architecture | MoE | MoE | Dense | Dense | - | - | MoE |
| | # Activated Params | 21B | 21B | 72B | 405B | - | - | 37B |
| | # Total Params | 236B | 236B | 72B | 405B | - | - | 671B |
| English | MMLU (EM) | 78.2 | 80.6 | 85.3 | 88.6 | 88.3 | 87.2 | 88.5 |
| | MMLU-Redux (EM) | 77.9 | 80.3 | 85.6 | 86.2 | 88.9 | 88.0 | 89.1 |
| | MMLU-Pro (EM) | 58.5 | 66.2 | 71.6 | 73.3 | 78.0 | 72.6 | 75.9 |
| | DROP (3-shot F1) | 83.0 | 87.8 | 76.7 | 88.7 | 88.3 | 83.7 | 91.6 |
| | IF-Eval (Prompt Strict) | 57.7 | 80.6 | 84.1 | 86.0 | 86.5 | 84.3 | 86.1 |
| | GPQA-Diamond (Pass@1) | 35.3 | 41.3 | 49.0 | 51.1 | 65.0 | 49.9 | 59.1 |
| | SimpleQA (Correct) | 9.0 | 10.2 | 9.1 | 17.1 | 28.4 | 38.2 | 24.9 |
| | FRAMES (Acc.) | 66.9 | 65.4 | 69.8 | 70.0 | 72.5 | 80.5 | 73.3 |
| | LongBench v2 (Acc.) | 31.6 | 35.4 | 39.4 | 36.1 | 41.0 | 48.1 | 48.7 |
| Code | HumanEval-Mul (Pass@1) | 69.3 | 77.4 | 77.3 | 77.2 | 81.7 | 80.5 | 82.6 |
| | LiveCodeBench (Pass@1-COT) | 18.8 | 29.2 | 31.1 | 28.4 | 36.3 | 33.4 | 40.5 |
| | LiveCodeBench (Pass@1) | 20.3 | 28.4 | 28.7 | 30.1 | 32.8 | 34.2 | 37.6 |
| | Codeforces (Percentile) | 17.5 | 35.6 | 24.8 | 25.3 | 20.3 | 23.6 | 51.6 |
| | SWE Verified (Resolved) | - | 22.6 | 23.8 | 24.5 | 50.8 | 38.8 | 42.0 |
| | Aider-Edit (Acc.) | 60.3 | 71.6 | 65.4 | 63.9 | 84.2 | 72.9 | 79.7 |
| | Aider-Polyglot (Acc.) | - | 18.2 | 7.6 | 5.8 | 45.3 | 16.0 | 49.6 |
| Math | AIME 2024 (Pass@1) | 4.6 | 16.7 | 23.3 | 23.3 | 16.0 | 9.3 | 39.2 |
| | MATH-500 (EM) | 56.3 | 74.7 | 80.0 | 73.8 | 78.3 | 74.6 | 90.2 |
| | CNMO 2024 (Pass@1) | 2.8 | 10.8 | 15.9 | 6.8 | 13.1 | 10.8 | 43.2 |
| Chinese | CLUEWSC (EM) | 89.9 | 90.4 | 91.4 | 84.7 | 85.4 | 87.9 | 90.9 |
| | C-Eval (EM) | 78.6 | 79.5 | 86.1 | 61.5 | 76.7 | 76.0 | 86.5 |
| | C-SimpleQA (Correct) | 48.5 | 54.1 | 48.4 | 50.4 | 51.3 | 59.3 | 64.8 |

Note

All models are evaluated in a configuration that limits the output length to 8K. Benchmarks containing fewer than 1000 samples are tested multiple times with varying temperature settings to derive robust final results. LXRCore-AI-Seek stands as the best-performing open-source implementation and exhibits competitive performance against frontier closed-source models.

Open Ended Generation Evaluation

| Model | Arena-Hard | AlpacaEval 2.0 |
|---|---|---|
| DeepSeek-V2.5-0905 | 76.2 | 50.5 |
| Qwen2.5-72B-Instruct | 81.2 | 49.1 |
| LLaMA-3.1 405B | 69.3 | 40.5 |
| GPT-4o-0513 | 80.4 | 51.1 |
| Claude-Sonnet-3.5-1022 | 85.2 | 52.0 |
| LXRCore-AI-Seek | 85.5 | 70.0 |

Note

English open-ended conversation evaluations. For AlpacaEval 2.0, we use the length-controlled win rate as the metric.


5. ๐Ÿบ Platform Information

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ THE LAND OF WOLVES COMMUNITY
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

๐ŸŒ Server Information

The Land of Wolves 🐺 - Georgian RP 🇬🇪
მგლების მიწა - რჩეულთა ადგილი! (The Land of Wolves - A Place for the Chosen!)
ისტორია ცოცხლდება აქ! (History Lives Here!)

🎯 Framework Support

  • LXR-Core (Primary Framework)
  • RSG-Core (Primary Framework)
  • VORP Core (Supported/Legacy)
  • Additional framework support available upon request

Note

Original Model Attribution: LXRCore-AI-Seek is based on DeepSeek-V3, which provides chat functionality and API services at chat.deepseek.com and platform.deepseek.com. This project is a rebranded implementation for The Land of Wolves ecosystem.


6. 🚀 How to Run Locally

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ LOCAL DEPLOYMENT OPTIONS
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

LXRCore-AI-Seek can be deployed locally using the following hardware and open-source community software:

  1. LXRCore-AI-Seek Infer Demo: Simple and lightweight demo for FP8 and BF16 inference.
  2. SGLang: Full support for the model in both BF16 and FP8 inference modes, with Multi-Token Prediction coming soon.
  3. LMDeploy: Enables efficient FP8 and BF16 inference for local and cloud deployment.
  4. TensorRT-LLM: Currently supports BF16 inference and INT4/8 quantization, with FP8 support coming soon.
  5. vLLM: Supports the model with FP8 and BF16 modes for tensor parallelism and pipeline parallelism.
  6. LightLLM: Supports efficient single-node or multi-node deployment for FP8 and BF16.
  7. AMD GPU: Enables running the model on AMD GPUs via SGLang in both BF16 and FP8 modes.
  8. Huawei Ascend NPU: Supports running the model on Huawei Ascend devices in both INT8 and BF16.

Since FP8 training is natively adopted in our framework, we only provide FP8 weights. If you require BF16 weights for experimentation, you can use the provided conversion script to perform the transformation.

Here is an example of converting FP8 weights to BF16:

```bash
cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights
```
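
For readers curious what the cast involves, below is a minimal, hypothetical sketch of dequantizing FP8 safetensors to BF16. It assumes a per-tensor scale stored as `<name>_scale_inv`; the real fp8_cast_bf16.py works shard-by-shard with block-wise scales, so treat this purely as illustration.

```python
import torch
from safetensors.torch import load_file, save_file

def cast_fp8_to_bf16(in_path: str, out_path: str) -> None:
    tensors = load_file(in_path)
    out = {}
    for name, t in tensors.items():
        if t.dtype == torch.float8_e4m3fn:
            # Hypothetical per-tensor scale; the actual checkpoints store
            # block-wise scale_inv tensors that must be broadcast per block.
            scale = tensors.get(f"{name}_scale_inv", torch.ones(()))
            out[name] = (t.to(torch.float32) * scale).to(torch.bfloat16)
        elif not name.endswith("_scale_inv"):
            out[name] = t  # non-FP8 tensors pass through unchanged
    save_file(out, out_path)
```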

Note

Hugging Face Transformers is not directly supported yet.

6.1 Inference with LXRCore-AI-Seek Infer Demo (example only)

System Requirements

Note

Linux with Python 3.10 only. Mac and Windows are not supported.

Dependencies:

```
torch==2.4.1
triton==3.0.0
transformers==4.46.3
safetensors==0.4.5
```

Model Weights & Demo Code Preparation

First, clone the LXRCore-AI-Seek GitHub repository:

```bash
git clone https://github.com/iboss21/TheSigma.git
cd TheSigma
```

Navigate to the inference folder and install the dependencies listed in requirements.txt. The easiest way is to use an environment manager such as conda or uv to create a fresh virtual environment first, as sketched below.
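
For example, with conda (the environment name lxrcore-seek is arbitrary; uv works similarly):

```bash
conda create -n lxrcore-seek python=3.10 -y
conda activate lxrcore-seek
```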

```bash
cd inference
pip install -r requirements.txt
```

Download the model weights from Hugging Face (using the original DeepSeek-V3 weights), and put them into the /path/to/LXRCore-AI-Seek folder.

Model Weights Conversion

Convert the Hugging Face model weights into the demo's own format, sharded for the model-parallel launch below:

```bash
python convert.py --hf-ckpt-path /path/to/LXRCore-AI-Seek --save-path /path/to/LXRCore-AI-Seek-Demo --n-experts 256 --model-parallel 16
```

Run

Then you can chat with LXRCore-AI-Seek interactively. The example below assumes 2 nodes with 8 GPUs each, for a total of 16 ranks matching the --model-parallel 16 value used during conversion:

```bash
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/LXRCore-AI-Seek-Demo --config configs/config_671B.json --interactive --temperature 0.7 --max-new-tokens 200
```

Or run batch inference on a given file:

```bash
torchrun --nnodes 2 --nproc-per-node 8 --node-rank $RANK --master-addr $ADDR generate.py --ckpt-path /path/to/LXRCore-AI-Seek-Demo --config configs/config_671B.json --input-file $FILE
```

6.2 Inference with SGLang (recommended)

SGLang currently supports MLA optimizations, DP Attention, FP8 (W8A8), FP8 KV Cache, and Torch Compile, delivering state-of-the-art latency and throughput performance among open-source frameworks.

Notably, SGLang v0.4.1 fully supports running the underlying model architecture on both NVIDIA and AMD GPUs, making it a highly versatile and robust solution.

SGLang also supports multi-node tensor parallelism, enabling you to run this model on multiple network-connected machines.

Multi-Token Prediction (MTP) is in development, and progress can be tracked in the optimization plan.

Here are the launch instructions from the SGLang team: https://github.com/sgl-project/sglang/tree/main/benchmark/deepseek_v3
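
As a quick, hedged illustration (defer to the linked instructions for current flags), a single-node FP8 launch with SGLang generally looks like:

```bash
# Illustrative only: serve the weights on one 8-GPU node.
python3 -m sglang.launch_server --model-path /path/to/LXRCore-AI-Seek \
  --tp 8 --trust-remote-code --port 30000
```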

6.3 Inference with LMDeploy (recommended)

LMDeploy, a flexible and high-performance inference and serving framework tailored for large language models, supports the underlying architecture. It offers both offline pipeline processing and online deployment capabilities, seamlessly integrating with PyTorch-based workflows.

For comprehensive step-by-step instructions, please refer to: InternLM/lmdeploy#2960

6.4 Inference with TRT-LLM (recommended)

TensorRT-LLM supports the model architecture, offering precision options such as BF16 and INT4/INT8 weight-only. Support for FP8 is currently in progress and will be released soon. You can access the custom branch of TRTLLM specifically for DeepSeek-V3 support through the following link: https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/deepseek_v3.

6.5 Inference with vLLM (recommended)

vLLM v0.6.6 supports the model architecture in FP8 and BF16 modes on both NVIDIA and AMD GPUs. Aside from standard techniques, vLLM offers pipeline parallelism, allowing you to run this model on multiple machines connected by a network. For detailed guidance, please refer to the vLLM instructions, and feel free to follow the enhancement plan as well.
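
As a hedged illustration (see the vLLM instructions for authoritative flags), a single-machine tensor-parallel launch could look like:

```bash
# Illustrative only: OpenAI-compatible server with tensor parallelism over 8 GPUs.
vllm serve /path/to/LXRCore-AI-Seek --tensor-parallel-size 8 --trust-remote-code
```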

6.6 Inference with LightLLM (recommended)

LightLLM v1.0.1 supports single-machine and multi-machine tensor parallel deployment for the model architecture (FP8/BF16) and provides mixed-precision deployment, with more quantization modes continuously integrated. For more details, please refer to LightLLM instructions.

6.7 Recommended Inference Functionality with AMD GPUs

The model architecture has Day-One support for AMD GPUs using SGLang, with full compatibility for both FP8 and BF16 precision. For detailed guidance, please refer to the SGLang instructions.

6.8 Recommended Inference Functionality with Huawei Ascend NPUs

The MindIE framework from the Huawei Ascend community has successfully adapted the BF16 version of the underlying architecture. For step-by-step guidance on Ascend NPUs, please follow the instructions here.


7. 📄 License

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ LICENSE INFORMATION
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

This code repository is licensed under the MIT License.

The use of LXRCore-AI-Seek Base/Chat models is subject to the Model License.

LXRCore-AI-Seek series (including Base and Chat) supports commercial use within The Land of Wolves ecosystem and compatible frameworks.

Original Model Attribution: This project is based on DeepSeek-V3 architecture. We acknowledge and respect the original DeepSeek-AI team's contributions to the open-source AI community.


8. ๐Ÿ“ Citation

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ CITATION & ATTRIBUTION
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

If you use LXRCore-AI-Seek in your research or projects, please cite both this project and the original DeepSeek-V3:

LXRCore-AI-Seek Citation

```bibtex
@software{lxrcore_ai_seek_2025,
  title={LXRCore-AI-Seek: Advanced AI Language Model for The Land of Wolves},
  author={iBoss21 and The Lux Empire},
  year={2025},
  url={https://github.com/iboss21/TheSigma},
  note={Based on DeepSeek-V3 architecture}
}
```

Original DeepSeek-V3 Citation

```bibtex
@misc{deepseekai2024deepseekv3technicalreport,
  title={DeepSeek-V3 Technical Report},
  author={DeepSeek-AI},
  year={2024},
  eprint={2412.19437},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2412.19437},
}
```

9. 📧 Contact

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ GET IN TOUCH
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

The Land of Wolves 🐺 Community

For questions, support, or collaboration opportunities:

Server Information

Join The Land of Wolves - Georgian RP Server:


๐Ÿบ แƒ›แƒ’แƒšแƒ”แƒ‘แƒ˜แƒก แƒ›แƒ˜แƒฌแƒ - แƒ แƒฉแƒ”แƒฃแƒšแƒ—แƒ แƒแƒ“แƒ’แƒ˜แƒšแƒ˜! ๐Ÿบ

History Lives Here - The Land of Wolves


Made with ❤️ by iBoss21 & The Lux Empire

Powered by The Land of Wolves Community
