
Add MoE routing community environment #410

Open
signabuilder wants to merge 2 commits into NousResearch:main from signabuilder:community/moe-routing

Conversation

@signabuilder

MoE Routing Environment

Trains a language model to act as a gating network for a heterogeneous Mixture-of-Experts inference mesh.

What it does

  • 7 frozen experts at different scales (0.8B → 35B) — the model learns which one handles which query type
  • 120 routing scenarios (8 query templates × 15 topics) covering triage, synthesis, validation, execution, simulation, classification, and research
  • 3-component reward: ideal match (Jaccard similarity, 50%), capability alignment (30%), cost efficiency (10%)
  • Expert selection via JSON output — model responds with ["a0", "v0"]
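
The scoring logic can be sketched from the description above. This is a minimal, hypothetical reconstruction, not the PR's actual code: the function names and signatures are illustrative, and the three listed weights (0.5 + 0.3 + 0.1) sum to 0.9, so any remaining reward mass is left unspecified here.

```python
import json


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_routing(response: str, ideal: set, capable: set, costs: dict) -> float:
    """Hypothetical 3-component reward for a JSON expert selection.

    response: model output such as '["a0", "v0"]'
    ideal:    the scenario's ideal expert set
    capable:  experts whose capabilities cover the query
    costs:    per-expert footprint (e.g. GB), used for cost efficiency
    """
    try:
        chosen = set(json.loads(response))
    except (json.JSONDecodeError, TypeError):
        return 0.0  # unparseable output earns no reward

    ideal_match = jaccard(chosen, ideal)                                   # 50%
    capability = len(chosen & capable) / len(chosen) if chosen else 0.0    # 30%
    avg_cost = sum(costs.get(e, 0.0) for e in chosen) / max(len(chosen), 1)
    cost_eff = 1.0 - avg_cost / max(costs.values())                        # 10%

    return 0.5 * ideal_match + 0.3 * capability + 0.1 * cost_eff
```

A perfect, fully capable selection of cheap experts scores near 0.9 under these weights; an unparseable response scores 0.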

Why it matters

Standard MoE trains experts and gates jointly. This environment trains only the routing policy over frozen heterogeneous experts — making MoE practical on consumer hardware where
joint training is impossible. The same routing pattern applies to any domain with multiple specialized models and a controller that must learn which to invoke.

Architecture

| Expert | Model | Scale | Role |
|--------|-------|-------|------|
| g0 | Hermes 0.8B | 0.5 GB | Triage |
| g1 | Hermes 2B | 1.2 GB | Classification |
| a0 | Hermes 9B | 5.5 GB | Synthesis |
| a1 | Hermes 9B | 5.5 GB | Adversarial challenge |
| v0 | Hermes 27B | 15 GB | Validation |
| b0 | Hermes 35B | 20 GB | Execution |
| q0 | Hermes 9B | 5.5 GB | Quorum simulation |
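
The table above can be mirrored as a small registry. This is an illustrative sketch, not the environment's actual data structure; the `Expert` dataclass and helper are assumptions, while the names, footprints, and roles come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expert:
    model: str
    vram_gb: float
    role: str


# Registry keyed by the short expert IDs the router emits in its JSON output.
EXPERTS = {
    "g0": Expert("Hermes 0.8B", 0.5, "Triage"),
    "g1": Expert("Hermes 2B", 1.2, "Classification"),
    "a0": Expert("Hermes 9B", 5.5, "Synthesis"),
    "a1": Expert("Hermes 9B", 5.5, "Adversarial challenge"),
    "v0": Expert("Hermes 27B", 15.0, "Validation"),
    "b0": Expert("Hermes 35B", 20.0, "Execution"),
    "q0": Expert("Hermes 9B", 5.5, "Quorum simulation"),
}


def cheapest_with_role(role: str) -> str:
    """Return the ID of the lowest-footprint expert tagged with `role`."""
    candidates = [(e.vram_gb, name) for name, e in EXPERTS.items() if e.role == role]
    return min(candidates)[1]
```

A registry like this is what makes cost-aware routing possible: the reward's cost term only needs each expert's footprint, not the expert weights themselves.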

Quick start

```bash
python moe_routing_env.py serve --port 8332
python moe_routing_env.py process --num_trajectories 100
```

Research applications

- Heterogeneous MoE with post-hoc routing
- LM-as-router via RL
- Cost-aware expert selection
- Federated routing policy optimization via Psyche/DisTrO

---


Trains a language model to act as a gating network for heterogeneous Mixture-of-Experts inference. 7 frozen experts (0.8B-35B), 120 query scenarios, 3-component reward (ideal match + capability + cost).

<!--
╭───────────────────────────────────────────────────────────╮
│  ✨  ATROPOS PULL REQUEST TEMPLATE  ✨                    │
│  Select PR type below and fill applicable sections.       │
│  Delete non-applicable sections for your PR type.         │
╰───────────────────────────────────────────────────────────╯
-->

## PR Type
<!-- Please check ONE of the following options -->
- [ ] RL Environment PR - Complete Environment Snapshot & Zero-Training sections
- [ ] Non-Environment PR - Complete Description, Related Issues & Type of Change sections

---

## 📝 General Information
### Description
<!-- Briefly describe the changes or additions introduced by this pull request. -->

<!-- For non-environment PRs -->
### Related Issues
<!-- Link any relevant issues here. Use "Closes #issue_number" to automatically close issues. -->

### Type of Change
<!-- For non-environment PRs - delete options that are not relevant. -->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] This change requires a documentation update
- [ ] Code refactor (no functional changes)
- [ ] Build/CI/CD related changes
- [ ] Other (please describe):

---

## 🔖 Environment Snapshot
<!-- For RL Environment PRs only -->
| Field | Your Entry |
|-------|------------|
| **Environment Name** | <!-- e.g. "SudokuVerifier-v0" --> |
| **Short Description** | <!-- One-sentence purpose/goal. --> |
| **Category** | <!-- Select: Verifiable-Reasoning / RLAIF / RLHF / Other  --> |
| **Dataset Needed?** | <!-- No / Yes (link & license) --> |
| **External Deps** | <!-- Extra pip packages, system libs, etc. --> |
| **Environmental Variables** | <!-- variable name(s) --> |
| **Compute Footprint Estimate** | <!-- "<1 GB RAM, <1 min CPU verification" or similar --> |

## 🧪 Zero-Training Test Results
<!-- For RL Environment PRs only -->
<details>

**W&B Link:**

**Examples of the Environment scoring a good example and a bad example:**

</details>

---

## ✅ Developer & Reviewer Checklist
<!-- Common checklist for all PR types - adapt as needed for your PR type -->
- [ ] Code follows project style (black, isort, flake8 pass with pre-commit)
- [ ] I have performed a self-review of my own code
- [ ] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [ ] My changes generate no new warnings
- [ ] New and existing unit tests pass locally with my changes
- [ ] Docstrings added for all new public classes / functions
- [ ] If .env vars are required, did you add them to the .env.example in the repo root?

Thomas Perry added 2 commits March 11, 2026 10:21
Trains a language model to act as a gating network for heterogeneous
Mixture-of-Experts inference. 7 frozen experts (0.8B-35B), 120 query
scenarios, 3-component reward (ideal match + capability + cost).
- Remove unused `Any` import (flake8 F401)
- Add `pragma: allowlist secret` for detect-secrets false positive on api_key="local"
- Remove dead Chameleon concept paper link from README
@signabuilder force-pushed the community/moe-routing branch from fdbf99d to af7b2a9 on March 12, 2026 17:24
