Non-Record v2: 7L UNet + Int8 QAT + EMA + Long Train — 1.3969 BPB (DGX Spark) #1606
Open
AlirezaAlampour wants to merge 1 commit into openai:main from
Conversation
3-seed validation on DGX Spark (GB10 Blackwell, single GPU):

- seed 1337: val_bpb 1.39982649 (step 1033)
- seed 42: val_bpb 1.39564112 (step 1041)
- seed 314: val_bpb 1.39532841 (step 1040)

Mean 1.39693 BPB, range 0.00450 BPB across seeds. 0.27 BPB improvement over the 2026-04-07 v1 submission (1.6656 -> 1.3969) via a deeper 7-layer model, 4x MLP multiplier, and a 4-hour training budget (vs ~80 min in v1). Artifact mean 15.55 MB (under the 16 MB cap). Same U-Net + Muon + EMA + LeakyReLU^2 + int8 QAT recipe as v1, with retuned hyperparameters.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Non-Record v2: 7-Layer UNet + Int8 QAT + EMA + 4-Hour Training
Track: non_record_16mb
Hardware: NVIDIA DGX Spark (1× GB10 Blackwell, 128GB unified memory)
Artifact size: ~15.5 MB (int8+zlib)
Improvement over v1 (PR #1486): -0.27 BPB (1.6656 → 1.3969)
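The int8+zlib packaging behind the ~15.5 MB artifact figure can be sketched as below. This is a hypothetical illustration, not the submission's actual code: the helper name `quantize_int8` and the stand-in weight tensor are assumptions.

```python
import random
import zlib

# Hypothetical sketch of int8 quantization followed by zlib compression;
# the real artifact pipeline's names and layout are not shown in this PR.
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: q = round(w / scale)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]  # stand-in weights
q, scale = quantize_int8(weights)
packed = bytes(v & 0xFF for v in q)        # int8 values -> raw bytes
artifact = zlib.compress(packed, level=9)  # what would be written to disk
print(f"{len(packed)} raw bytes -> {len(artifact)} compressed")
```

Dequantizing with `q[i] * scale` bounds the per-weight error by `scale / 2`, which is what makes the int8 round-trip cheap relative to the 4× size saving over fp32.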
Results
Seed range: 0.0045 BPB (~0.32% of mean) — very stable configuration.
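The mean and seed range quoted above follow directly from the three per-seed numbers in the summary:

```python
# The three reported seed results (val_bpb) from the 3-seed validation run.
val_bpb = {1337: 1.39982649, 42: 1.39564112, 314: 1.39532841}

mean = sum(val_bpb.values()) / len(val_bpb)          # reported as 1.39693
spread = max(val_bpb.values()) - min(val_bpb.values())  # reported as 0.00450
print(f"mean={mean:.5f} range={spread:.5f}")  # → mean=1.39693 range=0.00450
```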
What changed from v1
My v1 submission (PR #1486) plateaued at 1.6656 BPB after only ~320 steps
under a restrictive 80-minute wall-clock budget. v2 addresses that main
bottleneck, training length, and retunes the configuration for the
~1000-step regime the Spark can reach in 4 hours.
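The ~1000-step figure is just the 4-hour budget divided by the Spark's measured step time:

```python
# Back-of-envelope step budget, using the ~13.8 s/step quoted in this PR.
SECONDS_PER_STEP = 13.8
BUDGET_HOURS = 4

steps = int(BUDGET_HOURS * 3600 / SECONDS_PER_STEP)
print(steps)  # → 1043, consistent with the ~1033-1041 final steps per seed
```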
Config changes:
The configuration was found through Optuna hyperparameter sweeps, then
validated against the DGX Spark's step-time profile (~13.8 s/step).
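The shape of such a sweep can be sketched as below. This is a pure-Python random-search stand-in, not the actual Optuna study: the search space (`lr`, `mlp_mult`, `n_layers`) and the toy objective are assumptions; the real sweep trained the model and measured val_bpb per trial.

```python
import math
import random

# Toy stand-in for "train and return val_bpb": a smooth bowl whose minimum
# sits near lr=3e-3, mlp_mult=4, n_layers=7, echoing the config v2 landed on.
def toy_objective(lr, mlp_mult, n_layers):
    return (1.39
            + (math.log10(lr) + 2.5) ** 2
            + 0.01 * (mlp_mult - 4) ** 2
            + 0.005 * (n_layers - 7) ** 2)

random.seed(1337)
best = None
for _ in range(50):  # 50 random trials over the assumed search space
    lr = 10 ** random.uniform(-4.0, -1.0)      # log-uniform learning rate
    mlp_mult = random.choice([2, 4, 8])        # MLP width multiplier
    n_layers = random.randint(5, 9)            # UNet depth
    bpb = toy_objective(lr, mlp_mult, n_layers)
    if best is None or bpb < best[0]:
        best = (bpb, lr, mlp_mult, n_layers)
print(best)
```

With a real trainer as the objective, the same loop (or Optuna's TPE sampler in its place) trades sweep trials against the 13.8 s/step budget.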
Techniques
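Of the techniques named in the recipe (U-Net, Muon, EMA, LeakyReLU^2, int8 QAT), the weight EMA is the simplest to sketch. This is a minimal illustration, not the submission's implementation; the decay value is an assumption.

```python
# Minimal exponential moving average (EMA) over model weights: the shadow
# copy tracks the training weights and is what gets evaluated/exported.
class EMA:
    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = list(params)  # copy of the initial weights

    def update(self, params):
        d = self.decay
        self.shadow = [d * s + (1 - d) * p
                       for s, p in zip(self.shadow, params)]

# Toy usage: one weight starting at 0.0 while training holds it at 1.0.
ema = EMA([0.0], decay=0.9)
for _ in range(100):
    ema.update([1.0])
print(ema.shadow[0])  # approaches 1.0 as (1 - 0.9**n)
```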
What's still interesting
The project was still developed from zero ML background, using AI-assisted
coding across Claude, GPT, and Gemini. All training ran on a single consumer
Blackwell GB10 GPU (DGX Spark), with no H100 access. The Spark runs at
~13.8 s/step vs ~87 ms on 8×H100, limiting training to ~1040 steps in 4 hours.
The 0.27 BPB improvement came from two insights: (1) the previous
submission was step-starved rather than mis-configured, and (2) seven wider
layers outperform nine narrower ones at low step counts on this hardware.