SimpleVLA-RL is an efficient RL framework for VLA models that improves long-horizon planning under data scarcity. With reinforcement learning it substantially outperforms SFT on both simulation and real-world tasks, reveals a "pushcut" phenomenon in which new action patterns emerge during RL, and strengthens spatial, object, and goal generalization.
- [2025-10-01] SimpleVLA-RL now supports RoboTwin2.0 Benchmark. Feel free to experiment with it!
 - [2025-09-12] Excited to release the SimpleVLA-RL paper! Check it out: Paper.
 - [2025-05-27] We release the code of SimpleVLA-RL.
 
- End-to-end VLA RL pipeline built on veRL with VLA-specific optimizations
 - Multi-environment parallel rendering significantly accelerates VLA trajectory sampling
 - Leverages veRL's state-of-the-art infrastructure: efficient distributed training (FSDP), hybrid communication patterns, and optimized memory management for fast training/inference
 
- VLA Models: OpenVLA, OpenVLA-OFT
 - Benchmarks: LIBERO, RoboTwin 1.0/2.0
 - Modular architecture for easy integration of new VLA models, benchmarks and RL algorithms (Upcoming)
 
- Binary (0/1) outcome rewards - no complex reward design needed
 - Exploration strategies: dynamic sampling, adaptive clipping, and temperature tuning (see the sketch below)
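
To make these two ideas concrete, here is a minimal sketch of dynamic sampling over binary outcome rewards; it is not the repository's implementation, and the function and variable names are illustrative. Rollout groups whose 0/1 rewards are all identical carry no group-relative learning signal, so they are filtered out before the policy update.

```python
# Minimal sketch: filter rollout groups by their binary outcome rewards.
# Groups where every rollout succeeded (all 1) or failed (all 0) have zero
# advantage variance and contribute nothing to a group-relative update.
from typing import List

def keep_informative_groups(reward_groups: List[List[int]]) -> List[List[int]]:
    """Keep only rollout groups with mixed 0/1 outcomes."""
    return [g for g in reward_groups if 0 < sum(g) < len(g)]

# Example: 3 tasks x 4 rollouts each; each reward is a binary task-success outcome.
groups = [[1, 0, 1, 1], [0, 0, 0, 0], [1, 1, 1, 1]]
print(keep_informative_groups(groups))  # -> [[1, 0, 1, 1]]
```

In practice, filtered groups are typically replaced by freshly sampled rollouts so the batch stays full and gradients remain informative even when success rates approach 0 or 1.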
 
SimpleVLA-RL extends veRL with VLA-specific components across the following modules:
- Main entry point with Ray initialization and `RobRewardManager` for reward distribution
- `verl/trainer/ppo/ray_trainer.py` - Main RL training loop (data loading, VLA rollout, model updates, evaluation, checkpointing) and RL algorithm-specific advantage computation
- `fsdp_workers.py` - Source of the core functions called in `ray_trainer.py`: VLA model/optimizer initialization, `generate_sequences`, `compute_entropy`, `update_actor`
- Specific implementation of the functions in `fsdp_workers.py`: RL loss computation, policy updates, `compute_log_prob`, `compute_entropy`
- `verl/workers/rollout/rob_rollout.py` - VLA rollout implementation: environment creation, multi-environment parallel rendering, VLA action generation, environment interaction, video saving, and trajectory and 0/1 reward collection (sketched below)
- `verl/utils/dataset/rob_dataset.py` - Dataset construction for training/testing across benchmarks
- VLA model implementations (OpenVLA-OFT/OpenVLA from the official code)
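
To make the rollout flow concrete, the sketch below shows the general pattern behind `rob_rollout.py`: step a batch of simulator environments in parallel, generate batched VLA actions, and collect each episode's trajectory together with its binary task-success reward. This is a conceptual illustration only; the `reset`/`step`/`predict_actions` interfaces are hypothetical stand-ins, not the actual SimpleVLA-RL API.

```python
# Conceptual sketch of a parallel VLA rollout with 0/1 outcome rewards.
# All interfaces here are illustrative placeholders, not the SimpleVLA-RL API.

def rollout(policy, envs, max_steps=300):
    observations = [env.reset() for env in envs]        # parallel environment rendering
    done = [False] * len(envs)
    trajectories = [[] for _ in envs]
    rewards = [0.0] * len(envs)                          # one binary outcome reward per episode

    for _ in range(max_steps):
        actions = policy.predict_actions(observations)   # batched VLA action generation
        for i, env in enumerate(envs):
            if done[i]:
                continue
            next_obs, success, done[i] = env.step(actions[i])   # environment interaction
            trajectories[i].append((observations[i], actions[i]))
            observations[i] = next_obs
            if done[i]:
                rewards[i] = 1.0 if success else 0.0     # 0/1 reward collection
        if all(done):
            break
    return trajectories, rewards
```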
 
See SETUP.md for detailed instructions on setting up the conda environment.
An SFT (Supervised Fine-Tuning) VLA model is required for RL training. Below are the available options:
- OpenVLA-OFT SFT Models: Download from the SimpleVLA-RL Collection (for example, via `huggingface_hub`, as sketched below). Available models include:
  - libero-10 traj1/trajall SFT
  - libero-goal/object/spatial traj1 SFT
  - RoboTwin2.0 tasks traj1000 SFT
- OpenVLA SFT Models: Download from here.
- Other Models: For other models, you may need to fine-tune them yourself.
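
If the SFT checkpoints are hosted on the Hugging Face Hub (as the collection above suggests), one way to fetch a model is with `huggingface_hub`. The repo id below is a placeholder, so substitute the actual model name from the collection:

```python
# Download an SFT checkpoint from the Hugging Face Hub.
# "<org>/<model-name>" is a placeholder; use the actual repo id from the collection.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="<org>/<model-name>",
    local_dir="./sft_models/openvla-oft-sft",
)
print(f"SFT model downloaded to {local_path}")
```

Point `SFT_MODEL_PATH` in the training script (see the next section) to the downloaded directory.
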
Before running the training script, ensure the following configurations are properly set:
- Set Your Weights and Biases (WandB) API Key: Replace the `WANDB_API_KEY` field in `SimpleVLA-RL/align.json` with your own WandB API key (see the snippet after this list).
- Modify Key Variables: Update the following variables in `examples/run_openvla_oft_rl_libero/twin2.sh` as needed:
  - `WANDB_API_KEY`: Your WandB API key.
  - `EXPERIMENT_NAME`: The name of your experiment. You can choose any name.
  - `SFT_MODEL_PATH`: Path to your SFT model.
  - `CKPT_PATH`: Path where your checkpoints will be saved.
  - `DATASET_NAME`: For detailed options, refer to `examples/run_openvla_oft_rl_libero/twin2.sh`.
  - `ALIGN_PATH`: Path to the `SimpleVLA-RL/align.json` file.
  - `NUM_GPUS`: Number of GPUs available per node (e.g., 8).
  - `NUM_NODES`: Number of nodes used for RL training (e.g., 1).
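
If you prefer not to edit `align.json` by hand, a short script like the one below can set the key. It assumes `align.json` is a flat JSON object containing a `WANDB_API_KEY` field; adapt it if the file is structured differently.

```python
# Write your WandB API key into align.json (assumes a top-level WANDB_API_KEY field).
import json

path = "SimpleVLA-RL/align.json"
with open(path) as f:
    cfg = json.load(f)

cfg["WANDB_API_KEY"] = "your-wandb-api-key"  # replace with your actual key

with open(path, "w") as f:
    json.dump(cfg, f, indent=2)
```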
 
Note
- The script has been tested on the following configurations:
  - Single-node setup: `NUM_NODES=1`, `NUM_GPUS=8` (1 node with 8 NVIDIA A800 GPUs, each with 80 GB of memory).
  - Multi-node setup: `NUM_NODES=2`, `NUM_GPUS=8` (2 nodes with 16 NVIDIA A800 GPUs, each with 80 GB of memory).
- The tested NVIDIA driver version is `470.161.03` and the CUDA version is `12.4`; matching these versions exactly is not necessary.
- Run RL Training: Start RL training for OpenVLA-OFT with `bash examples/run_openvla_oft_rl_libero.sh` for the LIBERO benchmark or `bash examples/run_openvla_oft_rl_twin2.sh` for the RoboTwin2.0 benchmark.
 
To evaluate the performance of your model, enable evaluation mode by setting `trainer.val_only=True` in `examples/run_openvla_oft_rl_libero/twin2.sh`, then execute the same script: `bash examples/run_openvla_oft_rl_libero.sh` or `bash examples/run_openvla_oft_rl_twin2.sh`.

We evaluate SimpleVLA-RL on LIBERO using OpenVLA-OFT. SimpleVLA-RL improves the performance of OpenVLA-OFT to 97.6 points on LIBERO-Long, setting a new state of the art. Remarkably, using only one trajectory per task for cold-start SFT, SimpleVLA-RL raises the performance of OpenVLA-OFT from 17.3 to 91.7, an improvement of 74.4 points (430.1% relative).
We developed this preview version of the code based on veRL, OpenVLA-OFT, RoboTwin2.0, and PRIME. We acknowledge their significant contributions! For further details and updates, please refer to the official documentation and repositories of the respective projects.
- Support advanced diffusion-based RL: pi0 and pi0.5 with flow-matching RL
 - Support more VLA models, especially lightweight ones (e.g., VLA-Adapter, SmolVLA)
 
- Support more benchmarks: e.g. SimplerEnv, BEHAVIOR, Calvin
 - Support real-world RL.
 
- Additional online RL methods and offline RL algorithms
 - Modular environment and VLA interface for easy adaptation
 - Further optimize the RL framework to achieve more efficient training
 
- Haozhan Li: [email protected]
 - Ning Ding: [email protected]
 
If you find SimpleVLA-RL helpful, please cite us:
@article{li2025simplevla,
  title={SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning},
  author={Li, Haozhan and Zuo, Yuxin and Yu, Jiale and Zhang, Yuhao and Yang, Zhaohui and Zhang, Kaiyan and Zhu, Xuekai and Zhang, Yuchen and Chen, Tianxing and Cui, Ganqu and others},
  journal={arXiv preprint arXiv:2509.09674},
  year={2025}
}

