@valerio-oai @0hq @cocohearts @openai/parameter-golf-team Hi! I've submitted three related record PRs applying systems-level performance optimizations (fused Muon kernel, batched EMA, loader prealloc) to different base stacks, each improving val_bpb over the respective baseline:
They're submitted against multiple bases so a ready-to-merge option exists regardless of how the pending PRs are resolved.
This is my first submission to the challenge, so I apologize in advance if I've gotten anything wrong; I've done my best to follow the submission guidelines. Happy to address any issues. Thank you for taking the time to review these, I know you're busy!