Paper, Project Page, Finetuning Code
Moritz Reuss1, Hongyi Zhou1, Marcel Ruehle1, Ömer Erdinç Yağmurlu1, Fabian Otto2, Rudolf Lioutikov1
1Intuitive Robots Lab (IRL), Karlsruhe Institute of Technology (KIT) 2Microsoft Research
FLOWER VLA is a lightweight, efficient Vision-Language-Action (VLA) policy for robotic manipulation that achieves state-of-the-art performance on multiple benchmarks. It is built on a rectified flow architecture with several key features:
- Efficient Architecture: With fewer than 1B parameters, FLOWER is significantly smaller than other VLA models
- Low Training Cost: Only requires ~200 GPU hours of pretraining
- Low Memory Footprint: Uses <8GB of GPU memory for inference
- SOTA Performance: Achieves state-of-the-art results on the CALVIN and LIBERO benchmarks
For the finetuning code for FLOWER on CALVIN and LIBERO, check out our other codebase: flower_vla_calvin
- Installation
- Pretraining Guide
- Common Issues
- Advanced Usage
- Contributing
- Citation
- License
- Acknowledgments
- Python 3.10
- CUDA 11.8+
- 24GB+ GPU memory for training (more is better)
- 20GB+ disk space (most datasets can be streamed from Google Cloud)
# Create conda environment
conda create -n flower python=3.10
conda activate flower
# Clone repository
git clone --recurse-submodules [email protected]:mbreuss/flower_vla.git
cd flower_vla
# Install requirements
pip install -r requirements_simpler.txt

First, you need to choose a pretraining mix. Some datasets are not included in the Google Cloud storage and need to be loaded from local storage instead. Below you will find guides for the most important datasets and how to download them:
Create a central dataset directory:
export DATA_DIR=~/tensorflow_datasets

This is the recommended bridge dataset from Berkeley, which is not part of OXE.
wget -r -np -nd -A '*' \
https://rail.eecs.berkeley.edu/datasets/bridge_release/data/tfds/bridge_dataset/ \
-P $DATA_DIR/bridge_dataset

BiPlay is a diverse bimanual ALOHA dataset; see its project page for details.
git lfs install
git clone https://huggingface.co/datasets/oier-mees/BiPlay \
$DATA_DIR/aloha_play_dataset

FLOWER uses the Hugging Face accelerate library for efficient multi-GPU training. If you run it locally on a multi-GPU system, you can set up the training configuration with the following answers:
accelerate config

Example settings for 2-GPU training:
This machine
multi-GPU
1 # Number of machines
NO # fp16
NO # bf16
NO # Gradient accumulation
NO # Gradient clipping
NO # CPU offload
2 # Number of GPUs
0,1 # GPU indices
yes # Use DDP
bf16 # Mixed precision type
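For planning GPU memory, it helps to see how the total batch size in conf/training.yaml (shown below), the number of GPUs, and gradient accumulation relate. The sketch below is illustrative arithmetic only; the variable names are ours, not part of the FLOWER config:

```python
# Illustrative only: per-GPU micro-batch size under multi-GPU training
# with gradient accumulation. Values match the 2-GPU example above and
# the defaults in conf/training.yaml; variable names are not FLOWER's.
total_batch_size = 512   # batch_size in conf/training.yaml
num_gpus = 2             # as configured with `accelerate config` above
grad_accum_steps = 4     # gradient_accumulation_steps

# Samples each GPU processes per forward/backward pass:
micro_batch_per_gpu = total_batch_size // (num_gpus * grad_accum_steps)
print(micro_batch_per_gpu)  # 64
```

Raising gradient_accumulation_steps shrinks the per-GPU micro-batch, which is how a large effective batch size fits into limited GPU memory.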
For training on a SLURM cluster, we provide an example script used for pretraining FLOWER on 4 H100 GPUs. Note that it is important to set a main process port to be able to download the required datasets from Google Cloud.
Modify conf/training.yaml:
# Basic Training Settings
batch_size: 512 # Total batch size; higher is generally better
gradient_accumulation_steps: 4 # recommended for limited GPU memory to achieve larger effective batch sizes
max_train_steps: 500000
eval_every_n_steps: 10000 # runs a short validation loss check for sanity. NOTE: the validation loss does not correlate with evaluation success rate; it is normal for it to stagnate after some time while the model keeps improving
max_eval_steps: 100 # how many batches to use for validation loss
# Dataset Configuration
DATA_NAME: "trinity" # data mix you want to use
DATA_PATH: "~/tensorflow_datasets"
# Optimization Settings
learning_rate_dit: 1e-4 # we use separate learning rates for the Flow Transformer and the VLM to achieve the best results
learning_rate_vlm: 1e-5 # a lower lr for the VLM is crucial; the higher one for the flow part helps too
weight_decay: 0.1 # high weight decay for the flow part, low for the VLM part
# Hardware Settings
num_workers: 8 # Adjust based on CPU cores
pin_memory: true

Launch training:

accelerate launch flower/training.py

Continue from checkpoint:
accelerate launch flower/training.py \
+step=100 \
+continue_training=/path/to/checkpoint_100

# Node 1 (Master)
accelerate launch --multi_gpu --num_processes=2 \
--main_process_ip="MASTER_IP" \
--main_process_port=29500 \
--num_machines=2 \
--machine_rank=0 \
flower/training.py
# Node 2
accelerate launch --multi_gpu --num_processes=2 \
--main_process_ip="MASTER_IP" \
--main_process_port=29500 \
--num_machines=2 \
--machine_rank=1 \
flower/training.py

TensorFlow is a bit annoying to debug when adding new datasets and transforms. Therefore, use the debug_transforms.py script to get proper error messages.
export TORCH_DISTRIBUTED_DEBUG=DETAIL
python flower/test_dataloader.py
python flower/debug_transforms.py

You can create custom dataset mixes for pretraining and finetuning. The code for the OXE datasets is based on code from Octo and OpenVLA.
Modify flower_vla/dataset/oxe/mixes.py:
CUSTOM_MIX = [
("bridge_dataset", 4.0),
("fractal20220817_data", 2.0),
("eef_droid", 0.2),
]

You need to handle several things to integrate a new dataset into the code:
- Define a dataset config in flower_vla/dataset/oxe/configs.py
- Define a transform for it in flower_vla/dataset/oxe/transforms.py
- Add the value for the frequency to flower_vla/dataset/utils/frequency_mapping.py
- Add it to the dataset index in flower_vla/dataset/utils/dataset_index.py
- Add the desired action chunk length to flower_vla/dataset/utils/act_seq_mapping.py
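The weights in a mix like CUSTOM_MIX above are relative sampling weights. A plausible way to read them (this is a sketch of the idea; the actual sampling logic lives in the Octo/OpenVLA-derived dataloader) is as normalized per-dataset sampling probabilities:

```python
# Hypothetical illustration of how relative mix weights could translate
# into per-dataset sampling probabilities. Not the real implementation.
CUSTOM_MIX = [
    ("bridge_dataset", 4.0),
    ("fractal20220817_data", 2.0),
    ("eef_droid", 0.2),
]

total = sum(weight for _, weight in CUSTOM_MIX)
sample_probs = {name: weight / total for name, weight in CUSTOM_MIX}
# With these weights, bridge_dataset is drawn roughly 65% of the time (4.0 / 6.2).
print(sample_probs)
```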
Now you should be good to go. If you still encounter issues, use the debug_transforms.py script for testing.
Otherwise, feel free to open an issue or send me an email.
If you found the code useful, please cite our work:
@inproceedings{
reuss2025flower,
title={{FLOWER}: Democratizing Generalist Robot Policies with Efficient Vision-Language-Flow Models},
author={Moritz Reuss and Hongyi Zhou and Marcel R{\"u}hle and {\"O}mer Erdin{\c{c}} Ya{\u{g}}murlu and Fabian Otto and Rudolf Lioutikov},
booktitle={9th Annual Conference on Robot Learning},
year={2025},
url={https://openreview.net/forum?id=JeppaebLRD}
}

This project is licensed under the MIT License - see the LICENSE file for details.
This work is only possible because of the code from the following open-source projects and datasets: