[Non-Record] Masked Text Diffusion (SP1024) - Unlimited Compute Track #1596

Open
Heron4gf wants to merge 26 commits into openai:main from Heron4gf:main

@Heron4gf

Summary

This is a non-record, unlimited-compute submission exploring Masked Text Diffusion under the 16MB artifact constraint.

The repo's README explicitly listed "Text Diffusion" as a requested direction for non-record submissions, so this serves as a proof-of-concept that a discrete diffusion objective can be successfully trained and quantized within the competition's framework.

Training Details

  • Hardware: 8xH100
  • Duration: ~20 minutes (11,597 steps)
  • Objective: Bidirectional Masked Token Reconstruction. A corruption rate t ~ Uniform(0, 1) is sampled per sequence, and tokens are replaced with a [MASK] token based on t.
  • Compression: Maintained the baseline INT6 GPTQ + LZMA pipeline. The model slightly overshot the 16MB limit, so the script selectively pruned ~810k values of ±1 to bring the artifact down to 15.9MB.
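
The corruption step in the Objective bullet can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in train_gpt.py: the name `corrupt`, the `MASK_ID` value (assumed to sit just past a 1024-token vocabulary), and the choice to mask each position independently with probability t are all assumptions made for the example.

```python
import random

# Hypothetical id for the [MASK] token; assumes a 1024-entry vocabulary (ids 0..1023).
MASK_ID = 1024

def corrupt(tokens, rng, t=None):
    """Return (corrupted_tokens, mask_flags, t) for a single sequence."""
    if t is None:
        t = rng.random()  # t ~ Uniform(0, 1), drawn once per sequence
    # Assumption: every position is masked independently with probability t.
    flags = [rng.random() < t for _ in tokens]
    corrupted = [MASK_ID if masked else tok for tok, masked in zip(tokens, flags)]
    return corrupted, flags, t

rng = random.Random(0)
tokens = [5, 17, 42, 8, 99, 3]
corrupted, flags, t = corrupt(tokens, rng)
print(f"t={t:.3f}", corrupted)
```

The model would then be trained to predict the original token at every position where `flags` is true, attending to both left and right context.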

⚠️ Important Evaluation Note

Because this is a bidirectional diffusion model (causal=False in Flash Attention), it cannot be evaluated using standard left-to-right causal BPB. To evaluate it, I replaced the validation loop with an integrated Masked Reconstruction Score. The model is evaluated at fixed noise levels ($t \in \{0.1, 0.3, 0.5, 0.7, 0.9\}$), and the NLL at each level is weighted by $1/t$ to approximate the ELBO.

Therefore, the val_bpb of 2.99 is a Diffusion Reconstruction Score. It is mathematically expected to be numerically higher than standard causal BPB and should not be directly compared to the AR models on the main leaderboard.
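
As a rough illustration of the score described in this note, the sketch below masks a sequence at each fixed noise level, sums the negative log-likelihood over the masked positions, divides by t, and converts the average to bits per byte. It is not the evaluation code from train_gpt.py: the function names, the `token_logprob` callback standing in for the model, the simple averaging over noise levels, and the byte normalization are all assumptions.

```python
import math
import random

NOISE_LEVELS = [0.1, 0.3, 0.5, 0.7, 0.9]  # fixed t values from the evaluation note

def masked_nll(tokens, t, token_logprob, rng):
    """Sum of -log p(token) in nats over positions masked with probability t."""
    total = 0.0
    for pos, tok in enumerate(tokens):
        if rng.random() < t:  # this position is masked and must be reconstructed
            total -= token_logprob(pos, tok)
    return total

def reconstruction_bpb(tokens, n_bytes, token_logprob, seed=0):
    """Average the 1/t-weighted masked NLL over noise levels, in bits per byte."""
    rng = random.Random(seed)
    weighted = [masked_nll(tokens, t, token_logprob, rng) / t for t in NOISE_LEVELS]
    nats = sum(weighted) / len(NOISE_LEVELS)  # assumed: plain mean over levels
    return nats / (math.log(2) * n_bytes)

# Hypothetical stand-in for the model: a uniform guess over 1024 tokens.
uniform = lambda pos, tok: -math.log(1024)
tokens = list(range(200))
print(reconstruction_bpb(tokens, n_bytes=400, token_logprob=uniform))
```

Dividing by t compensates for the fact that only about a fraction t of the positions contribute at each noise level, which is the weighting that appears in the masked-diffusion ELBO under a linear masking schedule.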

Artifacts Included

  • train_gpt.py (Custom training and $1/t$ evaluation logic)
  • final_model.int6.ptz (15.89 MiB / 16,671,343 bytes total submission size)
  • submission.json
  • Local README.md
