
Add GRPO fine-tuning example with README #83

Draft
Fiona-Waters wants to merge 2 commits into red-hat-data-services:main from Fiona-Waters:grpo_art_example


Fiona-Waters commented May 18, 2026

Summary

  • Add comprehensive README for GRPO (Group Relative Policy Optimization) fine-tuning, following the pattern established by existing SFT, OSFT, and LoRA examples
  • Add GRPO to the parent fine-tuning examples overview
  • Example notebook (grpo_lora-kubeflow-trainjob.ipynb) demonstrates single-GPU GRPO training via Kubeflow SDK and Training Hub's ART backend
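For readers new to GRPO, the "group relative" part of the name can be sketched in a few lines. This is a minimal illustration, not the code in the notebook or in Training Hub: it assumes several completions are sampled for one prompt, each gets a scalar reward, and the advantage of each completion is its reward normalized against the group's mean and standard deviation. The function name and the `eps` value are hypothetical, and implementations differ on population versus sample standard deviation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of completions sampled for a single prompt.

    Assumption: population standard deviation is used here for simplicity;
    eps only guards against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored by some reward function.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)  # above-average completions get positive advantages
```

Completions that score above the group mean are reinforced and those below it are discouraged, so no separate value model is needed to produce a baseline.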

What's included

  • examples/fine-tuning/grpo/README.md — algorithm overview, hardware requirements, workbench setup guide, GRPO-specific considerations (dshm volume, gpu_memory_utilization)
  • examples/fine-tuning/grpo/grpo_lora-kubeflow-trainjob.ipynb — example notebook: TrainJob submission, parameter configuration, dataset format documentation, metrics inspection, reward curve plotting
  • examples/fine-tuning/README.md — updated to list GRPO in the algorithm list and Distributed "Learn more" section
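The dshm volume mentioned above is not spelled out in this description. As a sketch only, and assuming it refers to the common Kubernetes pattern of a memory-backed `emptyDir` mounted at `/dev/shm`, the pod-spec fragment could be expressed as plain Python dictionaries like this. The helper name `add_shared_memory`, the container layout, and the `8Gi` size limit are illustrative assumptions, not values taken from the README or the Kubeflow SDK.

```python
import copy

# Assumption: "dshm" is a memory-backed emptyDir; the size limit is illustrative.
DSHM_VOLUME = {
    "name": "dshm",
    "emptyDir": {"medium": "Memory", "sizeLimit": "8Gi"},
}
DSHM_MOUNT = {"name": "dshm", "mountPath": "/dev/shm"}


def add_shared_memory(pod_spec, container_index=0):
    """Return a copy of pod_spec with the dshm volume and its mount attached."""
    spec = copy.deepcopy(pod_spec)
    spec.setdefault("volumes", []).append(copy.deepcopy(DSHM_VOLUME))
    container = spec["containers"][container_index]
    container.setdefault("volumeMounts", []).append(copy.deepcopy(DSHM_MOUNT))
    return spec


# Hypothetical minimal pod spec with a single training container.
pod_spec = {"containers": [{"name": "trainer"}]}
print(add_shared_memory(pod_spec))
```

How such a fragment is passed to a TrainJob depends on the SDK version, so the README in this PR remains the reference for the actual configuration.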

Dependencies

TODO

  • Update %pip install to point to midstream once the SDK PR is merged, then remove once the workbench (universal) image includes the updated SDK
  • Validate notebook on fresh workbench with final image


coderabbitai Bot commented May 18, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.



@Fiona-Waters Fiona-Waters force-pushed the grpo_art_example branch 2 times, most recently from 0232b27 to 7fc12cc Compare May 18, 2026 15:52
Signed-off-by: Fiona-Waters <fiwaters6@gmail.com>
briangallagher (Contributor) commented

@Fiona-Waters I would suggest providing both interactive and distributed modes, to be consistent with SFT, OSFT and LoRA.

Is it possible to add a "Test the Trained Model" section to the notebook, similar to the other algorithm notebooks?

It might be easier to add the MLflow interactive part, as in this PR, now rather than opening a follow-up PR. This applies only to the interactive example.

@Fiona-Waters Fiona-Waters force-pushed the grpo_art_example branch 5 times, most recently from 2ec21ac to 84f69b2 Compare May 21, 2026 14:48
- Add grpo_lora-interactive-notebook.ipynb for single-GPU GRPO training
  directly in the workbench
- Include "Test the Trained Model" section with dynamic checkpoint loading
- Update README to document both interactive and distributed execution modes
- Update workbench requirements for interactive mode (8 CPU, 64Gi memory)
- Remove custom reward function appendix from both notebooks (out of scope)

Signed-off-by: Fiona-Waters <fiwaters6@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
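The "dynamic checkpoint loading" mentioned in the commit message is not shown in this conversation. One plausible shape for it, offered as a sketch under assumptions rather than the notebook's actual code, is to pick the newest checkpoint directory under the training output path. The `checkpoint-<step>` naming convention and the function name `latest_checkpoint` are assumptions made for illustration.

```python
import re
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step, or None.

    Assumption: checkpoints are saved as directories named checkpoint-<step>.
    """
    pattern = re.compile(r"checkpoint-(\d+)")
    candidates = []
    for path in Path(output_dir).iterdir():
        match = pattern.fullmatch(path.name)
        if path.is_dir() and match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        return None
    # Compare steps numerically so checkpoint-100 sorts after checkpoint-5.
    return max(candidates, key=lambda item: item[0])[1]
```

Comparing the step numbers as integers avoids the lexicographic trap where `checkpoint-5` would sort after `checkpoint-100`.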
Fiona-Waters (Author) commented May 21, 2026

> @Fiona-Waters I would suggest providing both interactive and distributed modes, to be consistent with SFT, OSFT and LoRA.
>
> Is it possible to add a "Test the Trained Model" section to the notebook, similar to the other algorithm notebooks?
>
> It might be easier to add the MLflow interactive part, as in this PR, now rather than opening a follow-up PR. This applies only to the interactive example.

Thanks @briangallagher, I have:

  • Added an interactive notebook
  • Updated the README to reflect this
  • Added a "Test the Trained Model" section
  • Not added MLflow, as ART does not support it

Please re-review.
