
Non-record: Nemotron-H Mamba-3 Hybrid + First SSM Depth Recurrence (1.4765 BPB)#1607

Open
inin-zou wants to merge 1 commit into openai:main from inin-zou:submission/nemotron-h-mamba3-depth-recurrence

Conversation

inin-zou commented Apr 14, 2026

Summary

  • First Mamba depth recurrence in the competition (checks off "State-space models" from Requests for PRs)
  • Nemotron-H inspired hybrid: 7 Mamba-3 SISO + 1 Attention (8 physical layers → 12 virtual via hinge-point recurrence)
  • Novel hinge-point multi-recurrence: layers 3,4 repeated 2x at U-Net hinge, outperforms spread recurrence
  • val_bpb: 1.4765 post-quant (1000 steps, 1xH100, GPTQ int6+LZMA, 8.2MB artifact)
  • Systematic ablation of 6 recurrence configs, 3 quantization strategies, and 3 architectural variants
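The GPTQ int6 + LZMA artifact pipeline mentioned above can be sketched roughly as follows. This is a hedged stand-in: it uses naive round-to-nearest symmetric quantization in place of Hessian-aware GPTQ, and the function names (`quantize_int6`, `packed_size_bytes`) are illustrative, not the PR's actual code. It only shows the quantize-then-compress shape behind the 8.2MB artifact-size check.

```python
# Sketch of the quantize-then-compress artifact pipeline (assumption:
# round-to-nearest stands in for the real Hessian-aware GPTQ step).
import lzma

def quantize_int6(weights):
    """Symmetric round-to-nearest quantization to the 6-bit range [-31, 31]."""
    scale = max(abs(w) for w in weights) / 31.0 or 1.0  # guard all-zero tensors
    q = [max(-31, min(31, round(w / scale))) for w in weights]
    return q, scale

def packed_size_bytes(weights):
    """LZMA-compress the quantized weights (stored one int per byte here,
    whereas a real int6 packer would bit-pack before compressing)."""
    q, _ = quantize_int6(weights)
    payload = bytes(v & 0xFF for v in q)
    return len(lzma.compress(payload, preset=9))
```

In the submission itself, the artifact-size budget (< 16MB) is checked against the compressed payload rather than the raw checkpoint.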

Key Findings

| Finding | Detail |
| --- | --- |
| Mamba depth recurrence works | −0.0092 BPB vs. no recurrence (first SSM recurrence result in the competition) |
| Focused > spread recurrence | Hinge ×2 (1.2824) beats 4-layer ×1 (1.2864) at the same virtual depth |
| Ternary Mamba not viable at 26M | +0.397 BPB worse (literature suggests ≈1.3B params minimum) |
| Q-Mamba DSQ not needed | Standard full-Hessian GPTQ already handles SSM outliers (0.082 vs. 0.148 quant loss) |
| RoPE removal hurts at small scale | +0.072 BPB worse (unlike Jamba 1.3B, where it is neutral) |
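For readers comparing the deltas above: bits-per-byte is the validation cross-entropy expressed in bits per UTF-8 byte. A minimal conversion sketch, assuming the loss is reported in nats (the helper names and the tokens/bytes rescaling are illustrative, not from this PR's code):

```python
# Sketch: converting mean cross-entropy (in nats) to bits-per-byte (BPB).
import math

def bpb(loss_nats_per_byte: float) -> float:
    """Byte-level loss in nats -> bits per byte (divide by ln 2)."""
    return loss_nats_per_byte / math.log(2)

def bpb_from_token_loss(loss_nats_per_token: float, tokens: int, n_bytes: int) -> float:
    """Token-level loss rescaled by the dataset's tokens-per-byte ratio."""
    return loss_nats_per_token * tokens / (n_bytes * math.log(2))
```

Under this definition, a −0.0092 BPB delta is a direct reduction in compressed bits per byte of validation text.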

Architecture

```
Physical: [Mamba3_0, Mamba3_1, Mamba3_2, Mamba3_3, Attn_4, Mamba3_5, Mamba3_6, Mamba3_7]
Virtual:  [M0, M1, M2, M3, A4, M3, A4, M3, A4, M5, M6, M7]  (12 layers, 0 extra params)
```
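The virtual schedule above can be sketched as follows. This is an illustrative expansion of the hinge-point multi-recurrence, not the PR's actual forward pass; the layer labels match the diagram, and `virtual_schedule` is a hypothetical helper name.

```python
# Sketch: expand 8 physical layers into 12 virtual layers by repeating the
# (M3, A4) hinge pair twice after its first pass. No extra parameters: the
# schedule only reuses existing layer indices.
PHYSICAL = ["M0", "M1", "M2", "M3", "A4", "M5", "M6", "M7"]

def virtual_schedule(physical, hinge=("M3", "A4"), repeats=2):
    """Build the virtual layer order, looping the hinge pair `repeats` times."""
    sched = []
    for layer in physical:
        sched.append(layer)
        if layer == hinge[1]:  # after the attention hinge, loop back
            for _ in range(repeats):
                sched.extend(hinge)
    return sched
```

At forward time the same module weights are invoked at each repeated position, which is how 12 virtual layers come from 8 physical ones with zero added parameters.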

Credits

Built on PR #1355 (best SSM) pipeline. Inspired by NVIDIA Nemotron-H (arXiv 2504.03624), Mamba-3 (ICLR 2026), and PR #1204 (depth recurrence concept).

Test plan

  • Verify the script runs: `torchrun --standalone --nproc_per_node=1 train_nemotron_hybrid.py` with env vars from the README
  • Check artifact < 16MB (currently 8.2MB)
  • Pending: 8xH100 10-min run (awaiting OpenAI compute grant)


First Mamba depth recurrence in Parameter Golf.
7 Mamba-3 + 1 Attention hybrid with hinge-point multi-recurrence
(12 virtual layers from 8 physical, zero extra params).

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
