
Harden cache artifact trust boundary in prepare.py #145

Open
Xingkai98 wants to merge 1 commit into karpathy:master from Xingkai98:harden-cache-integrity


Fixes #41

  • Add SHA-256 hash verification for tokenizer.pkl and token_bytes.pt
  • Store hashes in metadata.json for integrity checking
  • Use torch.load with weights_only=True for safer loading
  • Raise clear error messages when integrity check fails
  • Suggest re-running prepare.py if cache is corrupted

This prevents loading tampered or corrupted cache artifacts on shared machines or copied cache directories.
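
The verification flow listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the helper names `sha256_of`, `write_hashes`, and `verify_cache`, the `"sha256"` key inside metadata.json, and the exact error wording are all assumptions made for the example. Only the file names and the general approach come from the description.

```python
import hashlib
import json
from pathlib import Path

# File names taken from the PR description.
ARTIFACTS = ("tokenizer.pkl", "token_bytes.pt")


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_hashes(cache_dir):
    """Record artifact hashes in metadata.json (the "sha256" key is hypothetical)."""
    cache_dir = Path(cache_dir)
    hashes = {name: sha256_of(cache_dir / name) for name in ARTIFACTS}
    (cache_dir / "metadata.json").write_text(json.dumps({"sha256": hashes}, indent=2))
    return hashes


def verify_cache(cache_dir):
    """Raise RuntimeError if any artifact no longer matches its recorded hash."""
    cache_dir = Path(cache_dir)
    expected = json.loads((cache_dir / "metadata.json").read_text())["sha256"]
    for name in ARTIFACTS:
        actual = sha256_of(cache_dir / name)
        if actual != expected.get(name):
            raise RuntimeError(
                f"Integrity check failed for {name}: expected "
                f"{expected.get(name)}, got {actual}. "
                "The cache may be corrupted or tampered with; "
                "re-run prepare.py to rebuild it."
            )
```

Under this sketch, a loader would call `verify_cache(cache_dir)` before unpickling tokenizer.pkl, and would then load token_bytes.pt with `torch.load(path, weights_only=True)` as the description states.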

IgorTavcar added a commit to IgorTavcar/autoresearch that referenced this pull request Mar 11, 2026
Add SHA-256 hash verification for tokenizer.pkl and token_bytes.pt,
and use weights_only=True for safer torch.load.

Cherry-picked from karpathy#145

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>