
Non-record submission: HELIX and HELIX MoR K7R2 U-Net (architecture report + finalized metadata)#1600

Open
sayujshah wants to merge 25 commits into openai:main from sayujshah:main

Conversation

@sayujshah

This PR adds two non-record entries under records/track_non_record_16mb, each with full documentation, finalized metadata, and reproducible artifacts/logs.

Submission type

  • Track: Non-record submissions (16MB artifact cap)
  • Scope: Adds/updates only record folders and metadata for submission review

Included submissions

  1. 2026-03-25_HELIX

    • HELIX architecture with D-TPA attention + recurrence-style virtual depth
    • Full README with architecture breakdown, rationale, run config, results table, and improvement path
    • Finalized submission.json with concrete metrics/bytes fields
  2. 2026-03-31_HELIX_MoR_K7R2_UNet

    • MoR K7R2 U-Net variant on top of a strong SOTA-style base stack
    • 3-seed result summary with canonical best-seed metadata
    • Finalized submission.json with concrete metrics/bytes fields

Key results (as submitted)

  • HELIX (2026-03-25):

    • post-EMA val_bpb: 1.2781
    • total submission size: 9,973,239 bytes
  • HELIX MoR K7R2 U-Net (2026-03-31):

    • best-seed final int6 sliding val_bpb: 1.3663
    • best legal TTT val_bpb: 1.3105
    • total submission size (best seed): 7,274,404 bytes

Why these submissions are valuable

These runs are submitted as non-record research contributions with strong architectural signal:

  • novel/high-potential architecture choices under strict artifact constraints,
  • complete documentation of what was tried and why,
  • explicit discussion of bottlenecks (runtime/throughput + wallclock cutoff) and concrete next optimization targets for future SOTA attempts.

Compliance and required artifacts

Each submission folder includes the standard required assets:

  • README.md (detailed approach + results),
  • submission.json (author/score/bytes metadata),
  • train_gpt.py,
  • run artifacts/logs for reproducibility.

Both submissions remain under the 16,000,000-byte artifact cap as documented in README and submission.json.
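The cap check above is simple enough to automate. The sketch below is a hypothetical helper (not part of this PR's code) that sums a submission folder's on-disk bytes and compares against the 16,000,000-byte track cap; the folder layout and function names are illustrative assumptions.

```python
# Hypothetical helper: sum the sizes of a submission folder's files and
# check them against the 16 MB (decimal) non-record track cap.
# Names and layout are illustrative, not taken from the actual repo.
import os

ARTIFACT_CAP_BYTES = 16_000_000  # non-record track artifact cap


def total_submission_bytes(folder: str) -> int:
    """Sum the on-disk size of every file under a submission folder."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def within_cap(size_bytes: int, cap: int = ARTIFACT_CAP_BYTES) -> bool:
    """True if a submission's total byte count fits under the cap."""
    return size_bytes <= cap
```

Both reported totals (9,973,239 and 7,274,404 bytes) sit comfortably under the cap by this check.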

sayujshah and others added 25 commits March 25, 2026 10:24
Novel architecture combining D-TPA (Differential Tensor Product Attention),
Mixture of Recursions (MoR), and Peri-LN (sandwich norm) for parameter-golf.
Targets BPB 1.097-1.107 vs SOTA 1.1194.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Fix D-TPA param count: 399K (consistent across sections), d=768
- Fix RoPE: applied to reconstructed Q/K per-forward-pass, not static basis
- Fix XSA: uses clean Q1/K1 path, not raw differential map
- Specify int6+zstd-22 source from SOTA submission
- Add mor_gate to CONTROL_TENSOR_NAME_PATTERNS env var
- Set fullgraph=False for torch.compile (MoR loop incompatible)
- Fix _init_weights to use named_parameters() for 3D TPA basis
- Switch to d=768 with SwiGLU(hidden=1536), isoparametric to relu²(3x)
- Total: ~20.91M params, ~15.18MB artifact

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
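The "isoparametric" claim in this commit can be sanity-checked with arithmetic: a bias-free SwiGLU FFN carries three weight matrices (gate, up, down) while a squared-ReLU FFN carries two (up, down), so SwiGLU with hidden=1536 at d=768 matches relu² with hidden=3d. This is a minimal sketch assuming that weight layout; the real parameter accounting lives in train_gpt.py.

```python
# Sketch of the commit's isoparametric claim, assuming bias-free FFNs:
# SwiGLU uses three d x hidden matrices (gate, up, down projections),
# a squared-ReLU FFN uses two (up, down).
def swiglu_params(d: int, hidden: int) -> int:
    # gate, up, and down projections
    return 3 * d * hidden


def relu2_params(d: int, hidden: int) -> int:
    # up and down projections only
    return 2 * d * hidden


d = 768
# SwiGLU(hidden=1536) vs relu^2(hidden=3*d=2304): same weight count.
assert swiglu_params(d, 1536) == relu2_params(d, 3 * d) == 3_538_944
```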
- Fuse 2 FA3 calls → 1 per block (concat 16Q/8KV heads, halves attn compute)
- Replace SwiGLU (3 matmuls) → LeakyReLU(0.5)² (2 matmuls), isoparametric at FFN_HIDDEN=2304
- Remove Peri-LN post-norms (4 RMSNorm/block → 2)
- Replace 5D unsqueeze einsum with torch.einsum for TPA reconstruction
- NUM_ITERATIONS 3→2 (10 virtual layers instead of 15, -33% compute/step)
- FFN_HIDDEN 1536→2304 (isoparametric relu²)
- WARMDOWN_ITERS 3500→2000 (was causing LR decay from step 1 at 261ms/step)

Expected: ~100-130ms/step → 4600-6000 steps in 600s vs 2298 before.
LR warmdown bug fix alone should substantially improve convergence.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
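The SwiGLU → LeakyReLU(0.5)² swap in this commit trades three matmuls for two. The dependency-free sketch below shows the shape of that FFN under assumed weight layouts; the actual implementation (and its FFN_HIDDEN=2304 sizing) is in train_gpt.py.

```python
# Minimal sketch of a two-matmul FFN with a squared leaky-ReLU
# (negative slope 0.5), replacing SwiGLU's three matmuls.
# Shapes and weight layout here are illustrative assumptions.
def leaky_relu_sq(x: float, slope: float = 0.5) -> float:
    """Squared leaky-ReLU: leaky_relu(x, slope) ** 2."""
    y = x if x >= 0.0 else slope * x
    return y * y


def ffn(x, w_up, w_down):
    """Two matmuls around the activation.

    x: [d], w_up: [d][hidden], w_down: [hidden][d].
    """
    d, hidden = len(w_up), len(w_down)
    h = [
        leaky_relu_sq(sum(x[i] * w_up[i][j] for i in range(d)))
        for j in range(hidden)
    ]
    return [sum(h[j] * w_down[j][k] for j in range(hidden)) for k in range(d)]
```

With d=768 and FFN_HIDDEN=2304 this two-matrix layout carries 2·768·2304 weights, matching the three-matrix SwiGLU(hidden=1536) it replaced.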