Non-record submission: HELIX and HELIX MoR K7R2 U-Net (architecture report + finalized metadata) #1600
Open
sayujshah wants to merge 25 commits into openai:main from
Conversation
Novel architecture combining D-TPA (Differential Tensor Product Attention), Mixture of Recursions (MoR), and Peri-LN (sandwich norm) for parameter-golf. Targets BPB 1.097-1.107 vs SOTA 1.1194. Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
- Fix D-TPA param count: 399K (consistent across sections), d=768
- Fix RoPE: applied to reconstructed Q/K per forward pass, not a static basis
- Fix XSA: uses the clean Q1/K1 path, not the raw differential map
- Specify int6+zstd-22 source from SOTA submission
- Add mor_gate to CONTROL_TENSOR_NAME_PATTERNS env var
- Set fullgraph=False for torch.compile (MoR loop incompatible)
- Fix _init_weights to use named_parameters() for 3D TPA basis
- Switch to d=768 with SwiGLU(hidden=1536), isoparametric to relu²(3x)
- Total: ~20.91M params, ~15.18MB artifact

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
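The "isoparametric" claim above can be sanity-checked with back-of-the-envelope arithmetic: a SwiGLU FFN uses three d×h weight matrices (gate, up, down), while a squared-activation FFN uses only two (up, down). A minimal sketch, assuming no biases and taking "relu²(3x)" to mean a hidden width of 3·d (both assumptions, not confirmed in the repo):

```python
D_MODEL = 768

# SwiGLU: gate (d -> h), up (d -> h), down (h -> d) -- 3 matmuls total.
swiglu_hidden = 1536
swiglu_params = 3 * D_MODEL * swiglu_hidden

# relu^2-style FFN: up (d -> 3d), down (3d -> d) -- 2 matmuls total.
relu2_hidden = 3 * D_MODEL  # 2304
relu2_params = 2 * D_MODEL * relu2_hidden

print(swiglu_params, relu2_params)  # 3538944 3538944 -> isoparametric
```

The two layouts land on exactly the same parameter count (3,538,944), which is why the commit can swap activations without moving the ~20.91M total.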
…ing + MoR lb_weight decay
- Fuse 2 FA3 calls → 1 per block (concat 16Q/8KV heads, halves attention compute)
- Replace SwiGLU (3 matmuls) → LeakyReLU(0.5)² (2 matmuls), isoparametric at FFN_HIDDEN=2304
- Remove Peri-LN post-norms (4 RMSNorm/block → 2)
- Replace 5D unsqueeze einsum with torch.einsum for TPA reconstruction
- NUM_ITERATIONS 3→2 (10 virtual layers instead of 15, -33% compute/step)
- FFN_HIDDEN 1536→2304 (isoparametric relu²)
- WARMDOWN_ITERS 3500→2000 (was causing LR decay from step 1 at 261ms/step)

Expected: ~100-130ms/step → 4600-6000 steps in 600s vs 2298 before. The LR warmdown bug fix alone should substantially improve convergence.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
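The einsum swap for TPA reconstruction can be illustrated with a small NumPy sketch. The shapes and names here are assumptions (TPA reconstructs each head's Q/K/V as a rank-r contraction of factor coefficients against shared basis vectors); the point is that the 5D unsqueeze-and-broadcast form and the single fused einsum compute the same tensor:

```python
import numpy as np

b, s, h, r, dh = 2, 4, 8, 2, 64   # batch, seq, heads, rank, head dim (assumed)
A = np.random.randn(b, s, h, r)    # per-head factor coefficients
B = np.random.randn(b, s, r, dh)   # shared basis vectors

# 5D unsqueeze formulation: broadcast to (b, s, h, r, dh), reduce over rank.
q_unsqueeze = (A[..., None] * B[:, :, None, :, :]).sum(axis=3)

# Fused formulation: one contraction over the rank axis.
q_einsum = np.einsum('bshr,bsrd->bshd', A, B)

assert np.allclose(q_unsqueeze, q_einsum)
print(q_einsum.shape)  # (2, 4, 8, 64)
```

The fused form avoids materializing the intermediate 5D broadcast product, which is presumably why the commit prefers it under torch.compile.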
backend error on non-H100
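The "virtual layers" accounting in the commits above (NUM_ITERATIONS 3→2 giving 10 virtual layers instead of 15) is consistent with a weight-shared recursion over 5 physical blocks. A minimal sketch with hypothetical names; the actual MoR gate (`mor_gate`) and any per-token routing are omitted:

```python
def mor_forward(x, blocks, num_iterations):
    """Weight-shared recursion: reuse the same physical blocks each iteration,
    so effective depth = len(blocks) * num_iterations "virtual layers"."""
    for _ in range(num_iterations):
        for block in blocks:
            x = block(x)
    return x

# Instrumented toy blocks: record each application, pass a counter through.
calls = []
blocks = [lambda x, i=i: (calls.append(i), x + 1)[1] for i in range(5)]

out = mor_forward(0, blocks, num_iterations=2)
print(len(calls), out)  # 10 10 -> 10 block applications, matching 10 virtual layers
```

This also makes the torch.compile note in the commits plausible: a data-dependent recursion loop like this is the kind of control flow that forces `fullgraph=False`.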
This PR submits two non-record entries in records/track_non_record_16mb with full documentation, finalized metadata, and reproducible artifacts/logs.

Submission type

- Non-record (16MB artifact cap)

Included submissions

- 2026-03-25_HELIX: submission.json with concrete metrics/bytes fields
- 2026-03-31_HELIX_MoR_K7R2_UNet: submission.json with concrete metrics/bytes fields

Key results (as submitted)
HELIX (2026-03-25):

- val_bpb: 1.2781

HELIX MoR K7R2 U-Net (2026-03-31):

- val_bpb: 1.3663
- val_bpb: 1.3105

Why these submissions are valuable
These runs are submitted as non-record research contributions with strong architectural signal:
Compliance and required artifacts
Each submission folder includes the standard required assets:
- README.md (detailed approach + results)
- submission.json (author/score/bytes metadata)
- train_gpt.py

Both submissions remain under the 16,000,000-byte artifact cap as documented in README and submission.json.