Add optional checkpoint import/export hooks#151

Open
corysmart wants to merge 1 commit into karpathy:master from corysmart:codex/upstream-checkpoint-hooks

Conversation

@corysmart

Summary

This adds optional checkpoint load/save hooks to train.py without changing default behavior.

New behavior is only enabled when these env vars are set:

  • AUTORESEARCH_LOAD_CHECKPOINT
  • AUTORESEARCH_SAVE_CHECKPOINT
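The opt-in gating could look like the following minimal sketch. The env var names are the ones introduced by this PR; the helper name `checkpoint_paths` and everything else here is hypothetical, not the PR's actual code:

```python
import os

# Env var names from the PR; unset vars leave both hooks disabled.
LOAD_VAR = "AUTORESEARCH_LOAD_CHECKPOINT"
SAVE_VAR = "AUTORESEARCH_SAVE_CHECKPOINT"

def checkpoint_paths():
    """Return (load_path, save_path); None means that hook stays a no-op."""
    return os.environ.get(LOAD_VAR), os.environ.get(SAVE_VAR)

load_path, save_path = checkpoint_paths()
```

With neither variable set, both paths come back `None` and the trainer behaves exactly as before.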

What changed

  • add maybe_load_checkpoint() to load a checkpoint before training
  • allow partial state-dict restore when tensor shapes still match
  • add maybe_save_checkpoint() to persist model state and a small amount of run metadata after evaluation
  • keep the compiled training path unchanged by loading/saving around the existing model_core setup
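The "partial restore when shapes match" step above might be sketched as below. This is not the PR's code: the real state dicts hold torch tensors, so a tiny stand-in with a `.shape` attribute is used here to keep the sketch dependency-free, and all names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class FakeTensor:
    # Stand-in for a torch.Tensor; only .shape matters for this filter.
    shape: tuple

def filter_compatible(saved_state, model_state):
    """Keep entries whose name exists in the model with a matching shape.

    Mirrors "permissive by shape" loading: a compatible subset of the
    checkpoint is restored and the rest is skipped.
    """
    kept, skipped = {}, []
    for name, tensor in saved_state.items():
        target = model_state.get(name)
        if target is not None and target.shape == tensor.shape:
            kept[name] = tensor
        else:
            skipped.append(name)
    return kept, skipped

saved = {"wte": FakeTensor((50304, 768)), "head": FakeTensor((50304, 768)),
         "ln_f": FakeTensor((768,))}
model = {"wte": FakeTensor((32000, 768)),  # vocab size changed: mismatch
         "head": FakeTensor((50304, 768)), "ln_f": FakeTensor((768,))}

kept, skipped = filter_compatible(saved, model)
# "head" and "ln_f" are restorable; "wte" is skipped on shape mismatch.
```

In a real PyTorch trainer the filtered subset would then typically be applied with `model.load_state_dict(kept, strict=False)`.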

Why

This creates a minimal continuation/export seam that is useful for:

  • resuming or bootstrapping experiments
  • external harnesses that want to persist model state between short runs
  • downstream tooling that needs checkpoint artifacts without restructuring the trainer

Notes

  • default behavior is unchanged when the env vars are unset
  • checkpoint loading is permissive by shape, so compatible subsets can still be restored
  • no new dependencies were added
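A run that opts in to both hooks might look like the hypothetical invocation below; the checkpoint paths are illustrative, only the env var names come from the PR:

```shell
# Enable the hooks only for this run; leave the vars unset for stock behavior.
export AUTORESEARCH_LOAD_CHECKPOINT="out/prev_run.pt"  # loaded before training
export AUTORESEARCH_SAVE_CHECKPOINT="out/next_run.pt"  # written after evaluation
# python train.py   # hypothetical invocation; the trainer reads the vars itself
```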

