chore: update deepspeed.py (#30)
Pull request merge
lkevinzc pushed 1 commit to main • fa662b1…4540740 • 9 days ago
lkevinzc pushed 1 commit to main • 56b9b57…fa662b1 • 11 days ago
Use a toy task to test R1-zero like training behaviors (#28)
Pull request merge
lkevinzc pushed 1 commit to main • d000304…56b9b57 • 15 days ago
use count down to test r1-zero like training behavior
minor fix for offline sft (#27)
Pull request merge
lkevinzc pushed 1 commit to main • f778278…d000304 • 18 days ago
minor fix for offline sft
lkevinzc pushed 1 commit to main • b966394…f778278 • 21 days ago
Refactor and add PPO for math reasoning (#25)
Pull request merge
lkevinzc pushed 1 commit to main • 37becae…b966394 • 21 days ago
implement length-regularized DPO (#24)
Pull request merge
lkevinzc pushed 1 commit to main • ed7d3b1…37becae • on Dec 21, 2024
implement length-regularized DPO
lkevinzc pushed 1 commit to main • c638ce4…ed7d3b1 • on Dec 17, 2024