Add GRPO fine-tuning example with README by Fiona-Waters · Pull Request #83 · red-hat-data-services/red-hat-ai-examples

Fiona-Waters · 2026-05-18T15:40:58Z

Summary

Add comprehensive README for GRPO (Group Relative Policy Optimization) fine-tuning, following the pattern established by existing SFT, OSFT, and LoRA examples
Add GRPO to the parent fine-tuning examples overview
Example notebook (grpo_lora-kubeflow-trainjob.ipynb) demonstrates single-GPU GRPO training via Kubeflow SDK and Training Hub's ART backend
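For readers new to GRPO, the "group relative" part of the name can be illustrated with a minimal sketch. It assumes the commonly described formulation in which several completions are sampled for the same prompt and each reward is normalized against the mean and standard deviation of its group. The function name `group_relative_advantages`, the `eps` constant, and the use of the population standard deviation are illustrative assumptions, not code taken from Training Hub or ART.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one group of completions for the same prompt.

    Illustrative only: `eps` and the population standard deviation are
    assumptions made for this sketch, not values read from any library.
    """
    if not rewards:
        return []
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions that beat the group average get a positive advantage,
    # those below it get a negative one.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored by a binary reward (made-up values).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every completion in a group receives the same reward, all advantages are zero, so that group contributes no learning signal.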

What's included

examples/fine-tuning/grpo/README.md — algorithm overview, hardware requirements, workbench setup guide, GRPO-specific considerations (dshm volume, gpu_memory_utilization)
examples/fine-tuning/grpo/grpo_lora-kubeflow-trainjob.ipynb — example notebook: TrainJob submission, parameter configuration, dataset format documentation, metrics inspection, reward curve plotting
examples/fine-tuning/README.md — updated to list GRPO in the algorithm list and Distributed "Learn more" section
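Per-step rewards in GRPO runs tend to be noisy, so a reward curve is usually easier to read after smoothing. The sketch below shows one way to do that with a trailing moving average. The function name `moving_average`, the window size, and the reward values are all hypothetical; the notebook's actual metrics format and plotting code are not shown in this PR description.

```python
def moving_average(values, window=3):
    """Trailing moving average; early points average over fewer samples."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            # Drop the sample that just fell out of the window.
            running -= values[i - window]
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Made-up mean reward per training step, for illustration only.
rewards_per_step = [0.0, 0.5, 0.25, 0.75, 1.0]
smoothed = moving_average(rewards_per_step, window=2)
```

The smoothed list has the same length as the input, so it can be plotted against the same step axis as the raw rewards.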

Dependencies

Kubeflow SDK: LORA_GRPO algorithm support (opendatahub-io/kubeflow-sdk#111, "feat: Add lora_grpo implementation to training hub algorithms")
Training Hub: 0.8.1 release.
Universal Training Image with openpipe-art and updated training-hub

TODO

Update %pip install to point to midstream once the SDK PR is merged, then remove once the workbench (universal) image includes the updated SDK
Validate notebook on fresh workbench with final image

coderabbitai · 2026-05-18T15:41:08Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

Signed-off-by: Fiona-Waters <fiwaters6@gmail.com>

briangallagher · 2026-05-20T15:38:55Z

@Fiona-Waters I would suggest providing both interactive and distributed modes, to be consistent with SFT, OSFT, and LoRA.

Is it possible to add a "Test the Trained Model" section in the notebook, similar to the other algorithm notebooks?

It might be easier to add the MLflow interactive part, as in this PR, now rather than having to open a follow-on PR. Just for the interactive example.

- Add grpo_lora-interactive-notebook.ipynb for single-GPU GRPO training directly in the workbench
- Include "Test the Trained Model" section with dynamic checkpoint loading
- Update README to document both interactive and distributed execution modes
- Update workbench requirements for interactive mode (8 CPU, 64Gi memory)
- Remove custom reward function appendix from both notebooks (out of scope)

Signed-off-by: Fiona-Waters <fiwaters6@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
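The "dynamic checkpoint loading" mentioned in the commit message can be pictured as picking the newest checkpoint directory at test time instead of hard-coding a path. The sketch below is a guess at the general idea, not the notebook's code: the `checkpoint-<step>` directory naming and the helper name `latest_checkpoint` are hypothetical, and the layout that ART actually writes may differ.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: checkpoint-<step>, e.g. checkpoint-40.
_CHECKPOINT_NAME = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir):
    """Return the checkpoint directory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_path = None
    best_step = -1
    for child in root.iterdir():
        match = _CHECKPOINT_NAME.match(child.name)
        if not (child.is_dir() and match):
            continue
        # Compare steps as integers so that step 10 sorts after step 2.
        step = int(match.group(1))
        if step > best_step:
            best_step = step
            best_path = child
    return best_path
```

Comparing the step numbers as integers matters here: a plain string sort would rank `checkpoint-2` above `checkpoint-10`.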

Fiona-Waters · 2026-05-21T15:19:53Z


Thanks @briangallagher, I have:

added an interactive notebook
updated the README to reflect this
added a "Test the Trained Model" section
not added MLflow, as ART does not support it
Please re-review.

Fiona-Waters force-pushed the grpo_art_example branch 2 times, most recently from 0232b27 to 7fc12cc Compare May 18, 2026 15:52

Adding GRPO/ART example

d207981

Signed-off-by: Fiona-Waters <fiwaters6@gmail.com>

Fiona-Waters force-pushed the grpo_art_example branch from 7fc12cc to d207981 Compare May 18, 2026 15:54

Fiona-Waters force-pushed the grpo_art_example branch 5 times, most recently from 2ec21ac to 84f69b2 Compare May 21, 2026 14:48

Fiona-Waters force-pushed the grpo_art_example branch from 84f69b2 to b24c8ae Compare May 21, 2026 14:57

Fiona-Waters wants to merge 2 commits into red-hat-data-services:main from Fiona-Waters:grpo_art_example
