
Add MoE routing community environment #410

Open
signabuilder wants to merge 2 commits into NousResearch:main from signabuilder:community/moe-routing

Conversation

@signabuilder

MoE Routing Environment

Trains a language model to act as a gating network for a heterogeneous Mixture-of-Experts inference mesh.

What it does

  • 7 frozen experts at different scales (0.8B → 35B) — the model learns which one handles which query type
  • 120 routing scenarios (8 query templates × 15 topics) covering triage, synthesis, validation, execution, simulation, classification, and research
  • 3-component reward: ideal match (Jaccard similarity, 50%), capability alignment (30%), cost efficiency (10%)
  • Expert selection via JSON output — model responds with ["a0", "v0"]
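
The scoring logic can be sketched from the description above. This is a minimal, hypothetical reconstruction, not the PR's actual code: the function names and signatures are illustrative, and the three listed weights (0.5 + 0.3 + 0.1) sum to 0.9, so any remaining reward mass is left unspecified here.

```python
import json


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_routing(response: str, ideal: set, capable: set, costs: dict) -> float:
    """Hypothetical 3-component reward for a JSON expert selection.

    response: model output such as '["a0", "v0"]'
    ideal:    the scenario's ideal expert set
    capable:  experts whose capabilities cover the query
    costs:    per-expert footprint (e.g. GB), used for cost efficiency
    """
    try:
        chosen = set(json.loads(response))
    except (json.JSONDecodeError, TypeError):
        return 0.0  # unparseable output earns no reward

    ideal_match = jaccard(chosen, ideal)                                   # 50%
    capability = len(chosen & capable) / len(chosen) if chosen else 0.0    # 30%
    avg_cost = sum(costs.get(e, 0.0) for e in chosen) / max(len(chosen), 1)
    cost_eff = 1.0 - avg_cost / max(costs.values())                        # 10%

    return 0.5 * ideal_match + 0.3 * capability + 0.1 * cost_eff
```

A perfect, fully capable selection of cheap experts scores near 0.9 under these weights; an unparseable response scores 0.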

Why it matters

Standard MoE trains experts and gates jointly. This environment trains only the routing policy over frozen heterogeneous experts — making MoE practical on consumer hardware where
joint training is impossible. The same routing pattern applies to any domain with multiple specialized models and a controller that must learn which to invoke.

Architecture

| Expert | Model | Scale | Role |
|--------|-------|-------|------|
| g0 | Hermes 0.8B | 0.5 GB | Triage |
| g1 | Hermes 2B | 1.2 GB | Classification |
| a0 | Hermes 9B | 5.5 GB | Synthesis |
| a1 | Hermes 9B | 5.5 GB | Adversarial challenge |
| v0 | Hermes 27B | 15 GB | Validation |
| b0 | Hermes 35B | 20 GB | Execution |
| q0 | Hermes 9B | 5.5 GB | Quorum simulation |
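
The table above can be mirrored as a small registry. This is an illustrative sketch, not the environment's actual data structure; the `Expert` dataclass and helper are assumptions, while the names, footprints, and roles come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    model: str
    vram_gb: float
    role: str


# Registry keyed by the short expert IDs the router emits in its JSON output.
EXPERTS = {
    "g0": Expert("Hermes 0.8B", 0.5, "Triage"),
    "g1": Expert("Hermes 2B", 1.2, "Classification"),
    "a0": Expert("Hermes 9B", 5.5, "Synthesis"),
    "a1": Expert("Hermes 9B", 5.5, "Adversarial challenge"),
    "v0": Expert("Hermes 27B", 15.0, "Validation"),
    "b0": Expert("Hermes 35B", 20.0, "Execution"),
    "q0": Expert("Hermes 9B", 5.5, "Quorum simulation"),
}


def cheapest_with_role(role: str) -> str:
    """Return the ID of the lowest-footprint expert tagged with `role`."""
    candidates = [(e.vram_gb, name) for name, e in EXPERTS.items() if e.role == role]
    return min(candidates)[1]
```

A registry like this is what makes cost-aware routing possible: the reward's cost term only needs each expert's footprint, not the expert weights themselves.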

Quick start

```bash
python moe_routing_env.py serve --port 8332
python moe_routing_env.py process --num_trajectories 100
```

Research applications

- Heterogeneous MoE with post-hoc routing
- LM-as-router via RL
- Cost-aware expert selection
- Federated routing policy optimization via Psyche/DisTrO

---


Trains a language model to act as a gating network for heterogeneous Mixture-of-Experts inference. 7 frozen experts (0.8B-35B), 120 query scenarios, 3-component reward (ideal match + capability + cost).

<!--
╭───────────────────────────────────────────────────────────╮
│  ✨  ATROPOS PULL REQUEST TEMPLATE  ✨                    │
│  Select PR type below and fill applicable sections.       │
│  Delete non-applicable sections for your PR type.         │
╰───────────────────────────────────────────────────────────╯
-->

## PR Type
<!-- Please check ONE of the following options -->
- [ ] RL Environment PR - Complete Environment Snapshot & Zero-Training sections
- [ ] Non-Environment PR - Complete Description, Related Issues & Type of Change sections

---

## 📝 General Information
### Description
<!-- Briefly describe the changes or additions introduced by this pull request. -->

<!-- For non-environment PRs -->
### Related Issues
<!-- Link any relevant issues here. Use "Closes #issue_number" to automatically close issues. -->

### Type of Change
<!-- For non-environment PRs - delete options that are not relevant. -->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
- [ ] Code refactor (no functional changes)
- [ ] Build/CI/CD related changes
- [ ] Other (please describe):

---

## 🔖 Environment Snapshot
<!-- For RL Environment PRs only -->
| Field | Your Entry |
|-------|------------|
| **Environment Name** | <!-- e.g. "SudokuVerifier-v0" --> |
| **Short Description** | <!-- One-sentence purpose/goal. --> |
| **Category** | <!-- Select: Verifiable-Reasoning / RLAIF / RLHF / Other  --> |
| **Dataset Needed?** | <!-- No / Yes (link & license) --> |
| **External Deps** | <!-- Extra pip packages, system libs, etc. --> |
| **Environmental Variables** | <!-- variable name(s) --> |
| **Compute Footprint Estimate** | <!-- "<1 GB RAM, <1 min CPU verification" or similar --> |

## 🧪 Zero-Training Test Results
<!-- For RL Environment PRs only -->
<details>

**W&B Link:**

**Examples of the Environment scoring a good example and a bad example:**

</details>

---

## ✅ Developer & Reviewer Checklist
<!-- Common checklist for all PR types - adapt as needed for your PR type -->
- [ ] Code follows project style (black, isort, flake8 pass with pre-commit)
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] New and existing unit tests pass locally with my changes
- [ ] Docstrings added for all new public classes / functions
- [ ] If .env vars are required, did you add them to the .env.example in the repo root?

Thomas Perry added 2 commits March 11, 2026 10:21
Trains a language model to act as a gating network for heterogeneous
Mixture-of-Experts inference. 7 frozen experts (0.8B-35B), 120 query
scenarios, 3-component reward (ideal match + capability + cost).
- Remove unused `Any` import (flake8 F401)
- Add `pragma: allowlist secret` for detect-secrets false positive on api_key="local"
- Remove dead Chameleon concept paper link from README
@signabuilder force-pushed the community/moe-routing branch from fdbf99d to af7b2a9 on March 12, 2026 17:24
