
Non-record submission: HELIX and HELIX MoR K7R2 U-Net (architecture report + finalized metadata)#1600

Open
sayujshah wants to merge 25 commits into openai:main from sayujshah:main

Conversation

@sayujshah

This PR adds two non-record entries under records/track_non_record_16mb, each with full documentation, finalized metadata, and reproducible artifacts/logs.

Submission type

  • Track: Non-record submissions (16MB artifact cap)
  • Scope: Adds/updates only record folders and metadata for submission review

Included submissions

  1. 2026-03-25_HELIX

    • HELIX architecture with D-TPA attention + recurrence-style virtual depth
    • Full README with architecture breakdown, rationale, run config, results table, and improvement path
    • Finalized submission.json with concrete metrics/bytes fields
  2. 2026-03-31_HELIX_MoR_K7R2_UNet

    • MoR K7R2 U-Net variant on top of a strong SOTA-style base stack
    • 3-seed result summary with canonical best-seed metadata
    • Finalized submission.json with concrete metrics/bytes fields

Key results (as submitted)

  • HELIX (2026-03-25):

    • post-EMA val_bpb: 1.2781
    • total submission size: 9,973,239 bytes
  • HELIX MoR K7R2 U-Net (2026-03-31):

    • best-seed final int6 sliding val_bpb: 1.3663
    • best legal TTT val_bpb: 1.3105
    • total submission size (best seed): 7,274,404 bytes

Why these submissions are valuable

These runs are submitted as non-record research contributions with strong architectural signal:

  • novel/high-potential architecture choices under strict artifact constraints,
  • complete documentation of what was tried and why,
  • explicit discussion of bottlenecks (runtime/throughput + wallclock cutoff) and concrete next optimization targets for future SOTA attempts.

Compliance and required artifacts

Each submission folder includes the standard required assets:

  • README.md (detailed approach + results),
  • submission.json (author/score/bytes metadata),
  • train_gpt.py,
  • run artifacts/logs for reproducibility.

Both submissions remain under the 16,000,000-byte artifact cap as documented in README and submission.json.
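The cap check above is simple enough to automate. The sketch below is a hypothetical helper (not part of this PR's code) that sums a submission folder's on-disk bytes and compares against the 16,000,000-byte track cap; the folder layout and function names are illustrative assumptions.

```python
# Hypothetical helper: sum the sizes of a submission folder's files and
# check them against the 16 MB (decimal) non-record track cap.
# Names and layout are illustrative, not taken from the actual repo.
import os

ARTIFACT_CAP_BYTES = 16_000_000  # non-record track artifact cap


def total_submission_bytes(folder: str) -> int:
    """Sum the on-disk size of every file under a submission folder."""
    total = 0
    for root, _dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def within_cap(size_bytes: int, cap: int = ARTIFACT_CAP_BYTES) -> bool:
    """True if a submission's total byte count fits under the cap."""
    return size_bytes <= cap
```

Both reported totals (9,973,239 and 7,274,404 bytes) sit comfortably under the cap by this check.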

sayujshah and others added 25 commits March 25, 2026 10:24
Novel architecture combining D-TPA (Differential Tensor Product Attention),
Mixture of Recursions (MoR), and Peri-LN (sandwich norm) for parameter-golf.
Targets BPB 1.097-1.107 vs SOTA 1.1194.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Fix D-TPA param count: 399K (consistent across sections), d=768
- Fix RoPE: applied to reconstructed Q/K per-forward-pass, not static basis
- Fix XSA: uses clean Q1/K1 path, not raw differential map
- Specify int6+zstd-22 source from SOTA submission
- Add mor_gate to CONTROL_TENSOR_NAME_PATTERNS env var
- Set fullgraph=False for torch.compile (MoR loop incompatible)
- Fix _init_weights to use named_parameters() for 3D TPA basis
- Switch to d=768 with SwiGLU(hidden=1536), isoparametric to relu²(3x)
- Total: ~20.91M params, ~15.18MB artifact

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
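The "isoparametric" claim in this commit can be sanity-checked with arithmetic: a bias-free SwiGLU FFN carries three weight matrices (gate, up, down) while a squared-ReLU FFN carries two (up, down), so SwiGLU with hidden=1536 at d=768 matches relu² with hidden=3d. This is a minimal sketch assuming that weight layout; the real parameter accounting lives in train_gpt.py.

```python
# Sketch of the commit's isoparametric claim, assuming bias-free FFNs:
# SwiGLU uses three d x hidden matrices (gate, up, down projections),
# a squared-ReLU FFN uses two (up, down).
def swiglu_params(d: int, hidden: int) -> int:
    # gate, up, and down projections
    return 3 * d * hidden


def relu2_params(d: int, hidden: int) -> int:
    # up and down projections only
    return 2 * d * hidden


d = 768
# SwiGLU(hidden=1536) vs relu^2(hidden=3*d=2304): same weight count.
assert swiglu_params(d, 1536) == relu2_params(d, 3 * d) == 3_538_944
```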
- Fuse 2 FA3 calls → 1 per block (concat 16Q/8KV heads, halves attn compute)
- Replace SwiGLU (3 matmuls) → LeakyReLU(0.5)² (2 matmuls), isoparametric at FFN_HIDDEN=2304
- Remove Peri-LN post-norms (4 RMSNorm/block → 2)
- Replace 5D unsqueeze einsum with torch.einsum for TPA reconstruction
- NUM_ITERATIONS 3→2 (10 virtual layers instead of 15, -33% compute/step)
- FFN_HIDDEN 1536→2304 (isoparametric relu²)
- WARMDOWN_ITERS 3500→2000 (was causing LR decay from step 1 at 261ms/step)

Expected: ~100-130ms/step → 4600-6000 steps in 600s vs 2298 before.
LR warmdown bug fix alone should substantially improve convergence.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
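The SwiGLU → LeakyReLU(0.5)² swap in this commit trades three matmuls for two. The dependency-free sketch below shows the shape of that FFN under assumed weight layouts; the actual implementation (and its FFN_HIDDEN=2304 sizing) is in train_gpt.py.

```python
# Minimal sketch of a two-matmul FFN with a squared leaky-ReLU
# (negative slope 0.5), replacing SwiGLU's three matmuls.
# Shapes and weight layout here are illustrative assumptions.
def leaky_relu_sq(x: float, slope: float = 0.5) -> float:
    """Squared leaky-ReLU: leaky_relu(x, slope) ** 2."""
    y = x if x >= 0.0 else slope * x
    return y * y


def ffn(x, w_up, w_down):
    """Two matmuls around the activation.

    x: [d], w_up: [d][hidden], w_down: [hidden][d].
    """
    d, hidden = len(w_up), len(w_down)
    h = [
        leaky_relu_sq(sum(x[i] * w_up[i][j] for i in range(d)))
        for j in range(hidden)
    ]
    return [sum(h[j] * w_down[j][k] for j in range(hidden)) for k in range(d)]
```

With d=768 and FFN_HIDDEN=2304 this two-matrix layout carries 2·768·2304 weights, matching the three-matrix SwiGLU(hidden=1536) it replaced.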