RoboMonster is a compositional embodied manipulation framework and benchmark built on ManiSkill that treats a robot as a team of heterogeneous, specialized end-effectors instead of a single gripper. It pairs a constraint-driven high-level planner with imitation-learned control policies for each tool to select, coordinate, and sequence the right end-effector for each sub-task, showing clear gains over gripper-only baselines.
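To make the composition concrete, here is a minimal, purely illustrative Python sketch. Every class and name in it is hypothetical and does not correspond to the actual RoboMonster code; it only shows the pattern of a high-level planner sequencing sub-tasks, picking an end-effector for each, and handing control to that tool's imitation-learned policy.
from dataclasses import dataclass

@dataclass
class SubTask:
    name: str
    tool: str          # end-effector chosen by the planner
    horizon: int = 3   # toy rollout length

class Planner:
    """Stand-in for the constraint-driven high-level planner."""
    def plan(self, goal):
        # A real planner reasons over task constraints; here we hard-code a toy plan.
        return [SubTask("approach", "gripper"), SubTask("swipe", "card_swiper")]
    def select_policy(self, sub_task, policies):
        return policies[sub_task.tool]   # pick the specialized end-effector's policy

class ToolPolicy:
    """Stand-in for one tool's imitation-learned control policy."""
    def act(self, obs):
        return [0.0] * 7                 # dummy 7-DoF action

def run(goal):
    planner = Planner()
    policies = {"gripper": ToolPolicy(), "card_swiper": ToolPolicy()}
    obs = {}
    for sub in planner.plan(goal):                     # sequence the sub-tasks
        policy = planner.select_policy(sub, policies)  # coordinate: the right tool per sub-task
        for _ in range(sub.horizon):
            action = policy.act(obs)                   # low-level imitation-learned control
            print(sub.name, sub.tool, action[:2])

if __name__ == "__main__":
    run("swipe_card")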
First, clone this repository to your local machine, then install Vulkan and the following dependencies.
git clone [email protected]:MARS-EAI/RoboMonster.git
conda create -n RoboMonster python=3.9
conda activate RoboMonster
cd RoboMonster
pip install -r requirements.txt
# (optional): conda install -c conda-forge networkx=2.5
Then download the 3D assets used by the RoboMonster tasks:
python script/download_assets.py
We use a specific fork of ManiSkill for RoboMonster: Maniskill_fork_for_RoboMonster. You should replace the mani_skill package in your local conda environment with the files from this fork:
# NOTE: the mani_skill install path is usually the ~/anaconda3/envs/RoboMonster/lib/python3.9/site-packages/mani_skill/ folder
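# If you are unsure where mani_skill is installed, this standard one-liner prints the exact path:
python -c "import mani_skill, os; print(os.path.dirname(mani_skill.__file__))"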
# Example:
mv -f ../Maniskill_fork_for_RoboMonster/agent/robots/panda {your mani_skill install path}/agent/robots/
mv -f ../Maniskill_fork_for_RoboMonster/assets/robots/panda {your mani_skill install path}/assets/robots/
Now, try running a task with a single command:
# You can choose the variant from ["gripper", "ours"]: "gripper" means gripper-only and "ours" means heterogeneous multi-end-effectors. For example:
python script/run_task.py configs/table/swipe_card.yaml ours
# or
python script/run_task.py configs/table/circle_vase.yaml gripper
For more complex scenes such as RoboCasa, you can download them with the following command. Note that if you use these scenes in your work, please cite the scene dataset authors.
python -m mani_skill.utils.download_asset RoboCasa
After downloading the scene dataset, you can try running it:
python script/run_task.py configs/robocasa/swipe_card.yaml ours
If you are running simulation environments on a headless Debian server without a graphical desktop, you will need to install a minimal set of OpenGL and EGL libraries to ensure compatibility.
Run the following commands to install the necessary runtime libraries:
sudo apt update
sudo apt install libgl1 libglvnd0 libegl1-mesa libgles2-mesa libopengl0
You can use the following scripts to generate data for DP2 or DP3. The generated data is usually placed in the demos/ folder.
# Format: python script/generate_data.py --config {config_path} --num {traj_num} --variant {gripper | ours} [--save-video]
# Generate data for DP2:
python script/generate_data.py --config configs/table/swipe_card.yaml --num 75 --variant ours --save-video
# Generate data for DP3:
python script/generate_data_pointcloud.py --config configs/table/circle_vase.yaml --num 150 --variant gripper --save-video
# For short:
python script/generate_data.py configs/table/swipe_card.yaml 75 ours
The data generated by the ManiSkill framework is in .h5 format. To accelerate training, we restructure the data format.
# 1. Create the data folders if they do not exist
mkdir -p data/h5_data
# 2. Move your .h5 and .json files into the data/h5_data folder.
# NOTE: {data_type} can be chosen from ["rgb", "pointcloud"]. You should follow this naming convention to avoid issues in later scripts.
mv {your_h5_file}.h5 data/h5_data/{task_name}_{data_type}.h5
mv {your_h5_file}.json data/h5_data/{task_name}_{data_type}.json
# 3. Run the script to process the data.
# NOTE: This script assumes the default config. If you add an additional camera in the config yaml, modify the script to match the data.
# Example for DP2:
# --load-num is the number of demonstrations (identical to {traj_num} in the data generation command).
python script/convert_data.py data/h5_data/swipe_card_rgb.h5 --agent-num 2 --modality image --load-num 75
# Example for DP3:
python script/convert_data.py data/h5_data/circle_vase_pointcloud.h5 --agent-num 1 --modality pointcloud --load-num 150
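# (Optional, illustrative) Quick sanity check of a converted file using plain h5py (assumes h5py is installed);
# it prints every dataset's path, shape, and dtype so you can verify the conversion. See also step 4 below.
python -c "import sys, h5py; h5py.File(sys.argv[1], 'r').visititems(lambda n, o: print(n, o.shape, o.dtype) if isinstance(o, h5py.Dataset) else None)" data/h5_data/swipe_card_rgb.h5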
# 4. (Optional) You can check the converted .h5 files with read_h5.py in the script/tools/h5py/ folder.
# Example:
cp data/{task_name}_{data_type}.h5 script/tools/h5py/input
python script/tools/h5py/read_h5.py -i data/{task_name}_{data_type}.h5
We currently provide training code for Diffusion Policy (DP) and 3D Diffusion Policy (DP3), and we plan to support more policies in the future. You can train a DP model with the following commands:
# Format: python policy/Diffusion-Policy/diffusion_policy/workspace/{policy_workspace} --config-name={policy} task={task_config}
# Example for DP2 training:
python policy/Diffusion-Policy/diffusion_policy/workspace/workspace_dp2.py --config-name=dp2 task=2a_swipe_card_2d
# Example for DP3 training:
python policy/Diffusion-Policy/diffusion_policy/workspace/workspace_dp3.py --config-name=dp3 task=1a_circle_vase_3d
After training completes, use the .ckpt file (usually in the outputs/ folder) to evaluate your model. Setting DEBUG_MODE to 1 outputs more information.
# Example for DP2 inference:
python policy/Diffusion-Policy/eval_dp2.py --config configs/table/swipe_card.yaml --variant ours --ckpt {your_ckpt_path}
# Example for DP3 inference:
python policy/Diffusion-Policy/eval_dp3.py --config configs/table/circle_vase.yaml --variant gripper --ckpt {your_ckpt_path}
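# (Assumption) If DEBUG_MODE is read from the environment rather than from the config, you can enable verbose output inline, e.g.:
DEBUG_MODE=1 python policy/Diffusion-Policy/eval_dp2.py --config configs/table/swipe_card.yaml --variant ours --ckpt {your_ckpt_path}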