Add GRPO on Ray guided example with README #86

Draft
Fiona-Waters wants to merge 1 commit into red-hat-data-services:main from Fiona-Waters:grpo_ray_example

Conversation

@Fiona-Waters

Summary

  • Add guided example for multi-node distributed GRPO fine-tuning on Ray via CodeFlare SDK and KubeRay
  • New examples/fine-tuning/grpo_ray/ directory with README and notebook
  • Update parent fine-tuning README to include "Distributed on Ray" as a fourth execution mode

What's included

  • examples/fine-tuning/grpo_ray/README.md — algorithm overview, hardware requirements, GRPO-specific considerations, setup guide
  • examples/fine-tuning/grpo_ray/grpo_lora-rayjob.ipynb — example notebook: multi-node RayJob submission via CodeFlare SDK, training parameter configuration, reward curve plotting
  • examples/fine-tuning/README.md — updated to list Ray as an execution mode

Details

  • Uses Training Hub's lora_grpo() with the verl backend (FSDP + vLLM) distributed across 2 nodes (4 GPUs total)
  • Submitted via CodeFlare SDK RayJob + ManagedClusterConfig
  • Ray runtime image: tested & verified only (not productised)
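For reviewers less familiar with GRPO, the core idea behind the reward signal can be sketched in a few lines: each prompt gets a group of sampled completions, and every completion's reward is normalised against its own group. This is a minimal illustration only, not the code path used by `lora_grpo()` or verl. The function name, the `eps` term, and the choice of population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise rewards within one group of completions for a single prompt.

    Hypothetical helper for illustration: the advantage of each completion is
    its reward minus the group mean, divided by the group standard deviation.
    `eps` (an assumption) avoids division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four completions for one prompt, scored by a binary reward function.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal, which is one reason the reward function and group size matter for GRPO runs.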

Dependencies

  • Ray runtime image with training-hub, verl, and vLLM (RHOAIENG-61568)
  • CodeFlare SDK with RayJob + ManagedClusterConfig support

TODO

  • Update image reference once Konflux build succeeds (currently using spike image)
  • Validate notebook end-to-end on cluster with final image
  • Confirm reward curve parsing works with verl log output
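Until the last TODO item is confirmed, the parsing step could be kept deliberately loose. The sketch below extracts `(step, reward)` pairs from log text with a configurable regular expression. The line format and the `reward_mean` key are made-up placeholders, not verl's actual log output, and `parse_reward_curve` is a hypothetical helper rather than something from the notebook; the pattern would need adjusting once real job logs are available.

```python
import re

# Placeholder pattern (assumption): a line containing "step:<int>" followed
# later by "reward_mean:<float>". Replace with the real format once verified.
DEFAULT_PATTERN = re.compile(r"step:(\d+).*?reward_mean:(-?\d+(?:\.\d+)?)")


def parse_reward_curve(log_text, pattern=DEFAULT_PATTERN):
    """Return sorted (step, reward) pairs found in the log text.

    Lines that do not match the pattern are skipped silently, so unrelated
    Ray or vLLM output does not break the parse.
    """
    points = []
    for line in log_text.splitlines():
        match = pattern.search(line)
        if match:
            points.append((int(match.group(1)), float(match.group(2))))
    return sorted(points)


# Made-up sample lines, used only to exercise the parser.
sample_log = "\n".join([
    "step:1 - reward_mean:0.25",
    "unrelated worker output",
    "step:2 - reward_mean:0.5",
])
curve = parse_reward_curve(sample_log)
print(curve)
```

Returning plain tuples keeps the result easy to unpack into x and y lists for plotting, and an empty result is a quick signal that the pattern does not match the real log format.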

Related Tickets

Related Jira tickets here:

How Has This Been Tested?

TODO - Run notebook manually on an OpenShift cluster

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@coderabbitai

coderabbitai Bot commented May 28, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

@Fiona-Waters force-pushed the grpo_ray_example branch 2 times, most recently from c2c9e3e to 2bcdd02 on May 29, 2026 09:14
- Add examples/fine-tuning/grpo_ray/ with multi-node distributed GRPO
  training via CodeFlare SDK RayJob + ManagedClusterConfig
- Notebook demonstrates verl backend with FSDP + vLLM across 2 nodes
- Update fine-tuning README to include Ray as a fourth execution mode

Signed-off-by: Fiona-Waters <fiwaters6@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>