This guide provides instructions for setting up the environment to run the SDPO codebase.
- Operating System: Linux (Tested on SLES 15 SP5 and Ubuntu 22.04)
- Hardware: NVIDIA GPUs (CUDA compatible)
- Python: 3.12 (Tested on 3.12.3)
- CUDA Driver: Compatible with the PyTorch version installed (see below).
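Before installing anything, it can help to confirm that the interpreter and operating system match the list above. The following is a minimal sketch, not part of the SDPO codebase: the function name `check_prerequisites` is hypothetical, and it only checks the operating system and the Python minor version, not the GPU or driver.

```python
import platform
import sys


def check_prerequisites(version_info=sys.version_info, system=None):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    system = system or platform.system()
    problems = []
    # The prerequisites list Linux as the only tested operating system.
    if system != "Linux":
        problems.append(f"Operating system is {system}, but only Linux is listed.")
    # The prerequisites list Python 3.12; compare major and minor version only.
    if tuple(version_info[:2]) != (3, 12):
        found = ".".join(str(part) for part in version_info[:3])
        problems.append(f"Python {found} found, but 3.12 is listed.")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```

The arguments default to the running interpreter and host, so calling `check_prerequisites()` with no arguments inspects the current environment.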
Choose one of the following methods to set up your environment.
This is the standard approach for local workstations (e.g., RTX 5090).
1. Install PyTorch:
# Install PyTorch 2.5.1 (Stable for CUDA 12.4)
pip install torch==2.5.1 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
2. Install SDPO and Dependencies: From the root of the repository:
# Install dependencies
# Option 1: Stable pinned versions matching the cluster stack (Recommended)
pip install -r requirements-stable.txt
# Option 2: Latest compatible versions
# pip install -r requirements.txt
# Install SDPO (verl) in editable mode
pip install -e .
# Install Flash Attention 2
pip install flash-attn --no-build-isolation
Use this if you want a guaranteed working environment without managing local dependencies.
1. Build and Run:
# Build the image
docker build -t sdpo:latest .
# Run container (with GPU support)
docker run --gpus all -it --ipc=host -v $(pwd):/app sdpo:latest
Inside the container, SDPO is already installed and ready to use.
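Whichever method you used, a quick check from Python can confirm that PyTorch sees the GPU and that Flash Attention is importable. This is a hedged sketch rather than an official SDPO utility: `describe_gpu_stack` is a hypothetical helper, and it assumes Flash Attention installs under the module name `flash_attn`. It returns a report instead of raising, so it also runs on machines where PyTorch is missing.

```python
import importlib
import importlib.util


def describe_gpu_stack():
    """Collect basic facts about the PyTorch/CUDA install without failing hard."""
    report = {
        "torch": None,            # installed PyTorch version, if any
        "cuda_build": None,       # CUDA version PyTorch was built against
        "cuda_available": False,  # whether a usable GPU and driver were found
        # Assumption: Flash Attention is importable as `flash_attn`.
        "flash_attn": importlib.util.find_spec("flash_attn") is not None,
    }
    if importlib.util.find_spec("torch") is None:
        return report
    torch = importlib.import_module("torch")
    report["torch"] = torch.__version__
    report["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in describe_gpu_stack().items():
        print(f"{key}: {value}")
```

If `cuda_available` is `False` while `cuda_build` is set, the likely cause is a driver that is too old for the installed build, or a container started without `--gpus all`.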
Note
For more specific instructions on verl architecture and advanced configuration, refer to the official verl repository.
These components are not strictly required for the basic PPO training loop but are needed for specific advanced workflows.
This codebase supports vLLM and SGLang for high-throughput inference, which significantly accelerates the rollout phase of reinforcement learning. While optional for basic usage, they are recommended for large-scale training.
Installation:
pip install -r requirements_sglang.txt
Note: This command installs specific versions of SGLang and vLLM compatible with this codebase. Ensure your NVIDIA drivers are compatible with the installed CUDA toolkit (e.g., CUDA 12.4 if matching the PyTorch installation above).
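To see which inference engines were actually installed, you can query the package metadata. The sketch below assumes the requirements file installs distributions named `sglang` and `vllm` (their PyPI names); `installed_versions` is a hypothetical helper, not part of this codebase.

```python
from importlib import metadata


def installed_versions(packages=("sglang", "vllm")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

This reads installed metadata only and does not import the packages, so it stays fast and does not touch the GPU.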
To verify the installation, you can run the tests:
# Install test dependencies
pip install pytest
# Run tests
pytest tests/
This codebase was developed and tested using the NVIDIA NGC 25.12 software stack. While we recommend stable releases for general use, the exact environment state is:
- PyTorch: 2.10.0a0+b4e4ee81d3.nv25.12
- NGC Index: https://pypi.ngc.nvidia.com
- CUDA: 12.x (Optimized for GH200/H100)