[Non-Record] Masked Text Diffusion (SP1024) - Unlimited Compute Track by Heron4gf · Pull Request #1596 · openai/parameter-golf

Heron4gf · 2026-04-13T12:39:53Z

Summary

This is a non-record, unlimited-compute submission exploring Masked Text Diffusion under the 16MB artifact constraint.

The repo's README explicitly listed "Text Diffusion" as a requested direction for non-record submissions, so this serves as a proof-of-concept that a discrete diffusion objective can be successfully trained and quantized within the competition's framework.

Training Details

Hardware: 8xH100
Duration: ~20 minutes (11,597 steps)
Objective: Bidirectional Masked Token Reconstruction. A corruption rate t ~ Uniform(0, 1) is sampled per sequence, and tokens are replaced with a [MASK] token based on t.
Compression: Maintained the baseline INT6 GPTQ + LZMA pipeline. The model slightly overshot the 16MB limit, so the script selectively pruned ~810k ±1 values to bring the artifact down to 15.9MB.
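A minimal sketch of the corruption step in the objective above, assuming each token is masked independently with probability t. The names `corrupt_sequence` and `MASK_ID` are hypothetical and not taken from train_gpt.py, and the real script presumably works on batched tensors rather than Python lists.

```python
import random

# Hypothetical id for the [MASK] token, assumed to sit just past a 1024-entry vocab.
MASK_ID = 1024

def corrupt_sequence(tokens, rng, mask_id=MASK_ID):
    """Sample t ~ Uniform(0, 1) once for the sequence, then mask each token
    independently with probability t (an assumed reading of "based on t")."""
    t = rng.random()
    is_masked = [rng.random() < t for _ in tokens]
    corrupted = [mask_id if m else tok for tok, m in zip(tokens, is_masked)]
    return corrupted, is_masked, t

rng = random.Random(0)
corrupted, is_masked, t = corrupt_sequence(list(range(16)), rng)
print(f"t={t:.3f}, masked {sum(is_masked)} of {len(is_masked)} tokens")
```

The reconstruction loss would then be computed only at positions where `is_masked` is true, with the model attending bidirectionally over `corrupted`.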

⚠️ Important Evaluation Note

Because this is a bidirectional diffusion model (causal=False in Flash Attention), it cannot be evaluated with the standard left-to-right causal BPB. Instead, I replaced the validation loop with an integrated Masked Reconstruction Score. The model is evaluated at fixed noise levels ($t \in \{0.1, 0.3, 0.5, 0.7, 0.9\}$), and the NLL is weighted by $1/t$ to approximate the ELBO.

Therefore, the val_bpb of 2.99 is a Diffusion Reconstruction Score. It is mathematically expected to be numerically higher than standard causal BPB and should not be directly compared to the AR models on the main leaderboard.
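As a rough illustration of the $1/t$ weighting, the sketch below averages the reweighted masked NLL over the fixed noise levels and converts the result to bits per byte. The function name `diffusion_bpb`, its arguments, and the equal-weight average over noise levels are assumptions for illustration; the actual logic lives in train_gpt.py and may differ.

```python
import math

def diffusion_bpb(masked_nll_sums, total_tokens, total_bytes):
    """masked_nll_sums maps each noise level t to the NLL (in nats) summed over
    the masked positions at that level. All inputs are hypothetical."""
    # Reweight by 1/t and normalize by the full token count, so each term
    # estimates a per-token bound at that noise level.
    per_token_nats = [nll / t / total_tokens for t, nll in masked_nll_sums.items()]
    # Equal-weight average over the grid stands in for the expectation over t.
    mean_nats = sum(per_token_nats) / len(per_token_nats)
    bits_per_token = mean_nats / math.log(2)
    return bits_per_token * total_tokens / total_bytes

# Toy input: every masked token costs exactly ln(2) nats (1 bit),
# and a fraction t of the 1000 tokens is masked at each level.
levels = [0.1, 0.3, 0.5, 0.7, 0.9]
toy_sums = {t: math.log(2) * t * 1000 for t in levels}
print(diffusion_bpb(toy_sums, total_tokens=1000, total_bytes=2000))
```

Dividing by t compensates for the fact that only about a fraction t of the tokens contribute loss at each level, which is why the score behaves like a per-token bound rather than a plain masked-token average.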

Artifacts Included

train_gpt.py (Custom training and $1/t$ evaluation logic)
final_model.int6.ptz (15.89 MiB / 16,671,343 bytes total submission size)
submission.json
Local README.md
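The final_model.int6.ptz artifact comes from the INT6 GPTQ + LZMA pipeline with ±1 pruning described under Training Details. The toy sketch below only illustrates why zeroing ±1 quantized values can shrink an LZMA-compressed payload. The helpers `compressed_size` and `prune_pm1`, the one-byte-per-value packing, and the first-come pruning order are all assumptions; the submission's script presumably chooses which values to prune more carefully.

```python
import lzma
import random

def compressed_size(values):
    """LZMA-compressed size in bytes of int6 values in [-32, 31],
    stored one value per byte for simplicity (an assumed packing)."""
    raw = bytes(v + 32 for v in values)
    return len(lzma.compress(raw, preset=9))

def prune_pm1(values, n_prune):
    """Zero out up to n_prune entries equal to +1 or -1, in order of appearance."""
    out = list(values)
    pruned = 0
    for i, v in enumerate(out):
        if pruned >= n_prune:
            break
        if v in (-1, 1):
            out[i] = 0
            pruned += 1
    return out

# Synthetic "quantized weights": rounded Gaussian noise clipped to the int6 range.
rng = random.Random(0)
q = [max(-32, min(31, round(rng.gauss(0, 2)))) for _ in range(50_000)]
n_pm1 = sum(1 for v in q if v in (-1, 1))
print("before:", compressed_size(q))
print("after: ", compressed_size(prune_pm1(q, n_pm1)))
```

Pruning lowers the entropy of the value distribution, so the compressor needs fewer bits per value, at the cost of some quantization error in the affected weights.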

Heron4gf added 26 commits April 3, 2026 15:55

applied EMA weights

b6bac27

11 layers, MLP 3x + LeakyReLU(0.5)^2 + BigramHash embedding

e7e2489

fixed bigram

d6614a9

set QAT threshold to 0

c3a5d66

fixed bigram

fd0471f

reverted to fix bigram

bca44e6

fixed block forward

c1bba02

removed residmix

f13e960

fixed leaky relu

3430b12

fixed leaky relu (hopefully)

6db81ae

changed to 20 warmup steps

198b8ee

update to use best

ab153b9

Merge branch 'openai:main' into main

05434b5

created my submission folder for text diffusion

4430271

updated to fix

11dbfb4

should have fixed text diffusion now

8d489e1

fixed typo

068847b

fixed dynamo backend execution graphs tracing

3a74b38

fixed the vocab size

094fa65

fixed lr_mul and corrupt to stabilize loss

94c5864

removed fake warmup

6f7012f

updated to use torch no grad

c47dedb

fixed SWA and QAT

dd429ff

qat and swa activate based on time

72c8d9b

changed hyperparams to do 20m run

8aac809

submission

a58d250

Heron4gf wants to merge 26 commits into openai:main from Heron4gf:main
