Record: VarLen Attention + Fused MLP + Multi-Phase Global SGD TTT — val_bpb 1.07193 (3-seed mean) by dexhunter · Pull Request #1626 · openai/parameter-golf

dexhunter · 2026-04-14T23:52:27Z

Summary

val_bpb: 1.07193 (3-seed mean, std 0.00063) | 2.76890 nats | ~15.93 MB
Novel multi-phase global SGD during phased TTT evaluation: prefix docs are split into 3 phases, with scoring interleaved with distributed SGD adaptation
Builds on PR #1530 (@samacqua, "Record: Varlen attention + fused MLP + doc-independent TTT (1.07336)") for VarLen attention + fused MLP + doc-TTT, with the phased TTT concept from PR #1610 (@romeerp, "Record: VarLenAttn + PhasingTTT - val_bpb 1.0728 (3-seed mean)")
Additional improvements: trimmed GPTQ, MATRIX_LR=0.026, per-layer adaptive clip, int7 embeddings, warmdown 0.75

Results

Seed	Post-TTT BPB	val_loss (nats)	Artifact (bytes)
42	1.07280	2.77116	15,932,897
0	1.07134	2.76739	15,939,841
1234	1.07164	2.76815	15,932,419
Mean	1.07193	2.76890

Key Innovation

Multi-phase global SGD: instead of a single SGD round on prefix docs (PR #1610), we split the prefix docs into 3 phases. Each phase scores a chunk, runs SGD on it, and then the next chunk is scored with the improved model. This progressively adapts the base model while maintaining strict score-before-update legality. The 3-phase variant gives -0.0008 BPB relative to single-phase.
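The score-then-update loop described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the code from this PR: the "model" is a single scalar weight with squared-error loss, each "doc" is a float, and the names `split_into_phases` and `multiphase_ttt` are hypothetical. The distributed part of the SGD step is not modeled.

```python
from typing import List, Sequence, Tuple


def split_into_phases(docs: Sequence[float], num_phases: int) -> List[Sequence[float]]:
    """Split docs into num_phases contiguous chunks of near-equal size."""
    base, extra = divmod(len(docs), num_phases)
    chunks, start = [], 0
    for i in range(num_phases):
        end = start + base + (1 if i < extra else 0)
        chunks.append(docs[start:end])
        start = end
    return chunks


def multiphase_ttt(
    docs: Sequence[float], weight: float, num_phases: int = 3, lr: float = 0.1
) -> Tuple[float, float, List[Tuple[str, int]]]:
    """Toy score-before-update loop (hypothetical stand-in for the real model).

    Returns (mean loss over all docs, adapted weight, event log).
    """
    total_loss = 0.0
    log: List[Tuple[str, int]] = []
    for phase, chunk in enumerate(split_into_phases(docs, num_phases)):
        # 1) Score the chunk with the current weights; these scores are final.
        for x in chunk:
            total_loss += (weight - x) ** 2
        log.append(("score", phase))
        # 2) Only now adapt on the chunk that has already been scored.
        for x in chunk:
            weight -= lr * 2.0 * (weight - x)  # gradient of (weight - x)^2
        log.append(("sgd", phase))
    return total_loss / len(docs), weight, log


docs = [1.0] * 6
single_loss, _, _ = multiphase_ttt(docs, weight=0.0, num_phases=1)
multi_loss, _, multi_log = multiphase_ttt(docs, weight=0.0, num_phases=3)
print(single_loss, multi_loss, multi_log)
```

In this toy setting the single-phase run scores every doc with the unadapted weight, while the 3-phase run scores later chunks with a weight that has already been updated on earlier, previously scored chunks. No doc is ever scored by a model that has trained on it, which is the ordering property the PR calls score-before-update legality.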

Test plan

Verify 3-seed mean and std
Check artifact sizes < 16 MB
Verify score-before-update ordering in TTT logs
Check code consistency across seeds
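The first two test-plan items can be checked directly from the Results table. The sketch below copies the per-seed numbers from that table; the size limit of 16,000,000 bytes is an assumption (decimal megabytes, consistent with the "~15.93 MB" figure in the Summary), and the variable names are illustrative. The reported std of 0.00063 appears to correspond to the population standard deviation; the sample standard deviation of the same three values would be about 0.00077.

```python
from statistics import mean, pstdev

# Per-seed values copied from the Results table.
results = {
    42: {"bpb": 1.07280, "nats": 2.77116, "artifact_bytes": 15_932_897},
    0: {"bpb": 1.07134, "nats": 2.76739, "artifact_bytes": 15_939_841},
    1234: {"bpb": 1.07164, "nats": 2.76815, "artifact_bytes": 15_932_419},
}

# Assumption: "16 MB" means 16,000,000 bytes (decimal megabytes).
SIZE_LIMIT_BYTES = 16_000_000

bpb_values = [r["bpb"] for r in results.values()]
nats_values = [r["nats"] for r in results.values()]

bpb_mean = mean(bpb_values)
bpb_std = pstdev(bpb_values)  # population std (divides by N, not N - 1)
nats_mean = mean(nats_values)
all_under_limit = all(r["artifact_bytes"] < SIZE_LIMIT_BYTES for r in results.values())

print(f"mean BPB  = {bpb_mean:.5f}")
print(f"std BPB   = {bpb_std:.5f}")
print(f"mean nats = {nats_mean:.5f}")
print(f"all artifacts under limit: {all_under_limit}")
```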

@samacqua

…al_bpb 1.07193 (3-seed mean) Novel multi-phase global SGD during phased TTT evaluation. Builds on PR openai#1530 (@samacqua) + PR openai#1610 (@romeerp) phased TTT concept. 3-seed mean: 1.07193 BPB (2.76890 nats), std 0.00063. Seeds: 42, 0, 1234. All artifacts <16 MB.

romeerp · 2026-04-15T00:01:18Z

Wanted to implement this multi-phased strategy but didn't have compute to run tests for it. Glad you were able to do it and show improvement!

dexhunter wants to merge 1 commit into openai:main from dexhunter:dexhunter/multiphase-sgd-ttt
