red-hat-data-services · Fiona-Waters · May 28, 2026
diff --git a/examples/fine-tuning/README.md b/examples/fine-tuning/README.md
@@ -14,11 +14,12 @@ For detailed algorithm documentation and configuration options, see the upstream
 
 ## Execution Modes
 
-There are three ways to run fine tuning examples:
+There are four ways to run fine tuning examples:
 
 1. **Interactive (single node fine tuning)**
 2. **Distributed (distributed fine tuning with Kubeflow Trainer)**
-3. **Pipeline mode (automated training, model evaluation, and registration with Kubeflow Pipelines)**
+3. **Distributed on Ray (distributed fine tuning with KubeRay and CodeFlare SDK)**
+4. **Pipeline mode (automated training, model evaluation, and registration with Kubeflow Pipelines)**
 
 ### Interactive (single node fine tuning)
 
@@ -96,6 +97,35 @@ Training is offloaded to **dedicated training pods** managed by **Kubeflow Train
 
 ---
 
+### Distributed on Ray (distributed fine tuning with KubeRay and CodeFlare SDK)
+
+**What it is**
+
+Training is offloaded to a **Ray cluster** managed by **KubeRay**, submitted via **CodeFlare SDK**:
+
+- **Multi-GPU training** using Ray-native distribution (FSDP, Ray Train `TorchTrainer`)
+- **Short-lived clusters** — the SDK creates a RayCluster for the job and tears it down on completion
+- **Integration with Kueue** for resource queueing and scheduling
+- **Ray Dashboard** visibility for job monitoring and debugging
+
+**Recommended for**
+
+- **GRPO/RLVR training** with the verl backend (Ray-native, multi-GPU)
+- Teams already using Ray for distributed workloads
+- Workloads that benefit from Ray's actor-based distribution model
+
+**Resource considerations**
+
+- **GPU(s) required** on the Ray head pod (verl uses `STRICT_PACK` — all GPUs co-located)
+- **No shared PVC required** — the RayCluster is ephemeral; persist results to S3 or external storage
+- The Workbench only submits and monitors the job
+
+**Learn more**
+
+- [GRPO fine-tuning on Ray](grpo_ray/README.md)
+
+---
+
 ### Pipeline Mode (Automated Workflows)
 
 **What it is**

diff --git a/examples/fine-tuning/grpo_ray/README.md b/examples/fine-tuning/grpo_ray/README.md
@@ -0,0 +1,95 @@
+# GRPO Fine-Tuning on Ray with Training Hub
+
+This example demonstrates how to run Training Hub's [GRPO (Group Relative Policy Optimization)](https://github.com/Red-Hat-AI-Innovation-Team/training_hub?tab=readme-ov-file#grpo) on Ray using CodeFlare SDK and KubeRay on Red Hat OpenShift AI.
+
+## What is GRPO?
+
+GRPO is a reinforcement learning from verifiable rewards (RLVR) algorithm that improves a model's outputs by comparing groups of responses and reinforcing the better ones:
+
+- Generates multiple candidate responses per prompt
+- Scores them with a reward function (e.g. tool-call correctness)
+- Uses each response's reward relative to the rest of its group to compute advantage signals
+- Updates LoRA adapter weights via policy gradient with group normalization
+
+Each training iteration has two phases:
+
+1. **Rollout phase** — vLLM generates candidate responses and a reward function scores them
+2. **Train phase** — The policy network updates LoRA adapter weights using the advantage signals
+
+The verl backend orchestrates this using Ray-native actors with FSDP for the policy network and vLLM for inference, distributing across multiple GPUs.
+
+### Training Task: Tool-Call Verification
+
+The example uses the [Agent-Ark/Toucan-1.5M](https://huggingface.co/datasets/Agent-Ark/Toucan-1.5M) dataset, which contains tool-calling conversations. The reward function verifies that the model produces syntactically correct tool calls with the expected function name and arguments.
+
+## RHOAI Compatibility
+
+This example is compatible with RHOAI version 3.5.
+
+> [!NOTE]
+> The Ray runtime image used in this example is **tested and verified** only — it is not FIPS compliant and is not productised.
+
+## Requirements
+
+- An OpenShift cluster with OpenShift AI (RHOAI 3.5) installed:
+  - The `dashboard`, `workbenches`, and `ray` components enabled
+- Worker node(s) with NVIDIA GPUs (Ampere-based or newer, 80GB VRAM recommended per GPU)
+- `codeflare-sdk` installed in the workbench (pre-installed in RHOAI workbench images)
+
+## Hardware Requirements
+
+### Workbench Requirements
+
+The workbench only submits and monitors the RayJob — no GPU is needed on the workbench itself unless you want to test the trained model afterward.
+
+| Use Case | GPU | CPU | Memory |
+|----------|-----|-----|--------|
+| Job submission and monitoring | None | 2 cores | 8Gi |
+| Job submission + model evaluation after training | 1× GPU (40GB+ VRAM) | 8 cores | 64Gi |
+
+### Ray Cluster Requirements
+
+verl distributes training across multiple nodes using Ray-native actors with FSDP for the policy network and vLLM for inference generation.
+
+The default configuration uses 1 head node + 1 worker node, each with 2 GPUs (4 GPUs total).
+
+| Component | GPU | GPU Type | CPU | Memory |
+|-----------|-----|----------|-----|--------|
+| Head pod | 2× GPU | NVIDIA A100-80GB or H100 | 8 cores | 64Gi |
+| Worker pod (×1) | 2× GPU | NVIDIA A100-80GB or H100 | 8 cores | 192Gi request / 256Gi limit |
+
+> [!NOTE]
+>
+> - Memory requirements scale with model size and number of GPUs. The above values suit the example configuration (Qwen3-4B with LoRA, FSDP across 4× A100-80GB, vLLM with `gpu_memory_utilization=0.30`).
+> - Scale by adding more worker pods or increasing GPUs per node.
+
+## GRPO-specific Considerations
+
+- **`gpu_memory_utilization`**: Controls how much GPU memory vLLM reserves for inference. The default `0.30` leaves the majority for FSDP training. Adjust based on model size and available VRAM.
+- **Hugging Face token**: Not strictly required for public models (e.g. Qwen3-4B), but recommended to avoid rate limits. Pass it as the `HF_TOKEN` environment variable.
+
+## Setup
+
+### Prerequisites
+
+1. Log in to your OpenShift cluster (`oc login`)
+2. Ensure the KubeRay operator is running in your cluster
+3. Verify GPU nodes are available: `oc get nodes -l nvidia.com/gpu.present=true`
+
+### Running the Example
+
+From a RHOAI workbench:
+
+1. Clone this repository and navigate to `examples/fine-tuning/grpo_ray/`:
+
+   ```bash
+   git clone https://github.com/red-hat-data-services/red-hat-ai-examples.git && cd red-hat-ai-examples/examples/fine-tuning/grpo_ray
+   ```
+
+2. Open [`grpo_lora-rayjob.ipynb`](./grpo_lora-rayjob.ipynb)
+3. Follow the notebook instructions — update the namespace, image, and authentication details for your cluster
+
+> [!NOTE]
+> You will need a Hugging Face token if using gated models.
+> Set the `HF_TOKEN` variable in the configuration section.
+> You can skip the token for non-gated models like Qwen3-4B.