Detailed Information
Hi team, thank you for your work on the RL training infrastructure. I have consistently run into RL instability when running this example with the default config. The only change I make when switching from SGLang to vLLM is on this line, where I replace sglang with vllm.
Interestingly, with SGLang the run works fine on both NVIDIA H100 and A6000 GPUs, reaching a validation reward close to the one given here. With vLLM on H100, however, the RL run collapses. I used the FSDP trainer backend in my experiments. Setting vLLM's enforce_eager flag to True did not help either. Is there a specific set of flags or settings needed to make vLLM work with AReaL? My experiments were run on a relatively recent commit, ae8c792fdb5e21f77b3b9bca9c435cb6d1ddf62b.
I have attached the reward and grad norm curves for reference. The grad norm appears to explode (or at least run noticeably higher) with vLLM, but not with SGLang.
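To pin down when the two runs start to diverge, a minimal sketch like the following could compare per-step grad norms from both runs. It assumes the grad norms have been exported as plain lists aligned by training step; the function name first_divergent_step, the 5x threshold, and the sample values are hypothetical and are not part of AReaL or taken from the attached curves.

```python
from typing import Optional, Sequence


def first_divergent_step(
    baseline: Sequence[float],
    candidate: Sequence[float],
    ratio: float = 5.0,
) -> Optional[int]:
    """Return the first step where the candidate grad norm exceeds
    `ratio` times the baseline grad norm at the same step, else None.

    `ratio` is an arbitrary, hypothetical threshold.
    """
    for step, (base, cand) in enumerate(zip(baseline, candidate)):
        # Skip non-positive baselines so the ratio test stays meaningful.
        if base > 0 and cand > ratio * base:
            return step
    return None


# Hypothetical per-step grad norms (illustrative only, not real run data).
sglang_norms = [1.2, 1.1, 1.0, 0.9, 0.9, 0.8]
vllm_norms = [1.3, 1.2, 1.4, 3.0, 9.5, 40.0]

print(first_divergent_step(sglang_norms, vllm_norms))  # prints 4
```

Comparing against the SGLang run step by step, rather than using an absolute threshold, keeps the check meaningful even if grad norm scales differ across models or configs.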
Describe the bug
See above
Expected behavior
The default config for this simple RL example should train stably regardless of the choice of inference backend.
Full logs
N/A
To Reproduce
Set up AReaL with vLLM or SGLang, then run: python3 examples/math/gsm8k_rl.py --config examples/math/gsm8k_grpo.yaml scheduler.type=local
Commit ID
ae8c792fdb5e21f77b3b9bca9c435cb6d1ddf62b
Environment
NVIDIA H100 and A6000 GPUs. For SGLang, installation only required uv sync --extra cuda with Python 3.12. For vLLM, we installed flash-attn from the prebuilt wheel built for torch 2.10: https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.7.16/flash_attn-2.8.3+cu128torch2.10-cp312-cp312-linux_x86_64.whl
Script
https://github.com/inclusionAI/AReaL/blob/main/examples/math/gsm8k_grpo.yaml