Add GRPO fine-tuning example with README #83
0232b27 to 7fc12cc
Signed-off-by: Fiona-Waters <fiwaters6@gmail.com>
7fc12cc to d207981
@Fiona-Waters I would suggest providing both interactive and distributed modes, to be consistent with SFT, OSFT and LoRA. Is it possible to add a "Test the Trained Model" section to the notebook, similar to the other algorithm notebooks? It might also be easier to add the MLflow interactive part, as in this PR, now rather than having to open a follow-on PR, just for the interactive example.
2ec21ac to 84f69b2
- Add grpo_lora-interactive-notebook.ipynb for single-GPU GRPO training directly in the workbench
- Include "Test the Trained Model" section with dynamic checkpoint loading
- Update README to document both interactive and distributed execution modes
- Update workbench requirements for interactive mode (8 CPU, 64Gi memory)
- Remove custom reward function appendix from both notebooks (out of scope)

Signed-off-by: Fiona-Waters <fiwaters6@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
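
For readers unfamiliar with the "dynamic checkpoint loading" mentioned above, here is a minimal sketch of one way it could work. It is not taken from the notebook in this PR. It assumes checkpoints are written as `checkpoint-<step>` subdirectories of the training output directory (a common convention, but an assumption here), and the helper name `find_latest_checkpoint` is hypothetical.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed naming convention: "checkpoint-<global step>", e.g. "checkpoint-200".
_CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> subdirectory with the highest step, or None.

    Hypothetical helper for illustration; the notebook may locate
    checkpoints differently.
    """
    root = Path(output_dir)
    if not root.is_dir():
        return None

    best_step = -1
    best_path: Optional[Path] = None
    for child in root.iterdir():
        match = _CHECKPOINT_RE.match(child.name)
        # Skip plain files and directories that do not follow the convention.
        if match is None or not child.is_dir():
            continue
        # Compare steps numerically so that 1000 ranks above 200.
        step = int(match.group(1))
        if step > best_step:
            best_step, best_path = step, child
    return best_path
```

Comparing the step as an integer matters: a plain string sort would place `checkpoint-200` after `checkpoint-1000`. The returned path could then be handed to whatever model-loading call the "Test the Trained Model" section uses.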
84f69b2 to b24c8ae
Thanks @briangallagher, I have:
Summary
What's included
Dependencies
TODO