
fix off-by-one in steady-state MFU calculation#93

Open
Khadka-Bishal wants to merge 1 commit into karpathy:master from Khadka-Bishal:master

Conversation

@Khadka-Bishal

`steady_state_mfu` was counting one more step than `total_training_time` actually timed. The timing gate `if step > 10` runs before `step` is incremented, so steps 0..10 are excluded and the timed step count is `step - 11`, not `step - 10`. This changes only the reported MFU and does not affect training.

The timing gate `if step > 10` excludes steps 0-10 (11 iterations)
from total_training_time. The MFU formula used (step - 10) as the
timed step count, overcounting by one. Fix to (step - 11).

This is a reporting-only change: no effect on training behavior,
schedules, or the time budget break condition.
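A minimal sketch of the corrected denominator (the function signature and variable names are assumptions for illustration; only the `if step > 10` gate and the `step - 11` count come from this PR):

```python
def steady_state_mfu(step: int, total_training_time: float,
                     flops_per_step: float, peak_flops: float) -> float:
    # The `if step > 10` gate excludes steps 0..10 (11 iterations) from
    # total_training_time, so only `step - 11` steps were actually timed.
    timed_steps = step - 11  # was `step - 10`: off by one
    achieved_flops = flops_per_step * timed_steps / total_training_time
    return achieved_flops / peak_flops
```

With the old `step - 10` denominator, the reported MFU was inflated by a factor of `(step - 10) / (step - 11)`, which shrinks toward 1 as training runs longer.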
sunnypatneedi added a commit to sunnypatneedi/autoresearch-muon that referenced this pull request Mar 10, 2026
Fixes adopted from karpathy/autoresearch PRs:

- karpathy#84: NaN loss bypasses fast-fail (IEEE 754: NaN > 100 is False).
  Fix: `not x <= 100`. Applied to both train.py and train_mlx.py.
- karpathy#83: ParquetFile handles never closed, causing FD exhaustion on
  multi-epoch training. Fix: try/finally with pf.close().
- karpathy#107: Save pre-eval checkpoint so eval OOM/crash doesn't lose
  the entire training run. Removed on successful eval.
- karpathy#93: MFU off-by-one: warmup skips 11 steps (0-10), not 10.
- karpathy#70: Loss only reported last microstep, not average across grad
  accumulation. Fix: accumulate loss += detach() / grad_accum_steps.
- karpathy#53: Debug checkpoint on loss explosion with step/loss metadata
  for post-mortem analysis (train.py only, merged into karpathy#84 fix).
- karpathy#62: Input validation for --num-shards and --download-workers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
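To illustrate the karpathy#84 item above, a minimal sketch of the NaN fast-fail fix (the helper name and threshold default are assumptions; the `not x <= 100` idiom is from the commit message):

```python
def loss_is_exploded(loss: float, threshold: float = 100.0) -> bool:
    # IEEE 754: every ordered comparison with NaN is False, so the naive
    # check `loss > threshold` silently lets a NaN loss through.
    # `not loss <= threshold` is True for NaN as well as for large losses.
    return not loss <= threshold

print(float("nan") > 100.0)            # False: naive check misses NaN
print(loss_is_exploded(float("nan")))  # True
print(loss_is_exploded(250.0))         # True
print(loss_is_exploded(1.5))           # False
```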
