fix off-by-one in steady-state MFU calculation #93
Open
Khadka-Bishal wants to merge 1 commit into karpathy:master from
Conversation
The timing gate `if step > 10` excludes steps 0-10 (11 iterations) from `total_training_time`, but the MFU formula used `(step - 10)` as the timed step count, overcounting by one. This PR changes it to `(step - 11)`. It is a reporting-only change: no effect on training behavior, schedules, or the time-budget break condition.
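A minimal, self-contained sketch of the gate and the corrected formula (all names here, `train_step`, `num_steps`, `num_flops_per_step`, `peak_flops`, are hypothetical stand-ins, not the exact train.py identifiers):

```python
import time

# Illustrative sketch only; the identifiers below are placeholders,
# not the actual names used in train.py.
num_steps = 50
num_flops_per_step = 1e12   # placeholder: FLOPs of one optimizer step
peak_flops = 1e15           # placeholder: hardware peak FLOP/s

def train_step():
    time.sleep(0.001)       # stand-in for forward/backward/update

step = 0
total_training_time = 0.0
while step < num_steps:
    t0 = time.time()
    train_step()
    if step > 10:           # gate is checked BEFORE step is incremented,
        total_training_time += time.time() - t0  # so steps 0..10 are untimed
    step += 1

# After the loop, step == num_steps. Steps 11..num_steps-1 were timed,
# which is step - 11 iterations; the old code divided FLOPs over step - 10.
timed_steps = step - 11
steady_state_mfu = timed_steps * num_flops_per_step / (total_training_time * peak_flops)
print(f"steady-state MFU: {steady_state_mfu:.2%}")
```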
sunnypatneedi added a commit to sunnypatneedi/autoresearch-muon that referenced this pull request on Mar 10, 2026:

Fixes adopted from karpathy/autoresearch PRs:
- karpathy#84: NaN loss bypasses fast-fail (IEEE 754: NaN > 100 is False). Fix: `not x <= 100`. Applied to both train.py and train_mlx.py.
- karpathy#83: ParquetFile handles never closed, causing FD exhaustion on multi-epoch training. Fix: try/finally with pf.close().
- karpathy#107: Save pre-eval checkpoint so eval OOM/crash doesn't lose the entire training run. Removed on successful eval.
- karpathy#93: MFU off-by-one: warmup skips 11 steps (0-10), not 10.
- karpathy#70: Loss only reported last microstep, not average across grad accumulation. Fix: accumulate loss += detach() / grad_accum_steps.
- karpathy#53: Debug checkpoint on loss explosion with step/loss metadata for post-mortem analysis (train.py only, merged into karpathy#84 fix).
- karpathy#62: Input validation for --num-shards and --download-workers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
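For the karpathy#84 item in the list above, the NaN comparison behavior is easy to verify in isolation (a standalone check, not the train.py code itself):

```python
# Under IEEE 754, NaN compares False against everything, so a guard
# written as `if loss > 100: fail()` silently passes a NaN loss through.
nan_loss = float("nan")
healthy_loss = 2.5

assert (nan_loss > 100) is False        # the naive guard never fires on NaN
assert (not nan_loss <= 100) is True    # the inverted guard does fire on NaN
assert (not 150.0 <= 100) is True       # ...and still fires on real blow-ups
assert (not healthy_loss <= 100) is False  # ...but not on a healthy loss
```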
`steady_state_mfu` was counting one more step than `total_training_time` actually timed. The timing gate uses `if step > 10` before `step` is incremented, so steps 0..10 are excluded and the timed step count is `step - 11`, not `step - 10`. This changes only the reported MFU and does not affect training.
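A one-liner makes the count concrete (illustrative, independent of train.py):

```python
# Count how many iterations the gate `if step > 10` actually times.
step_total = 100
timed = sum(1 for step in range(step_total) if step > 10)
assert timed == step_total - 11   # 89 timed steps, not step_total - 10
```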