This is the official PyTorch implementation of the paper "Generating Human Interaction Motions in Scenes with Text Control" (ECCV 2024). For more details, please visit our project webpage.
If you find the code useful, please cite our work:
@inproceedings{yi2024tesmo,
  author    = {Yi, Hongwei and Thies, Justus and Black, Michael J. and Peng, Xue Bin and Rempe, Davis},
  title     = {Generating Human Interaction Motions in Scenes with Text Control},
  booktitle = {ECCV},
  year      = {2024}
}
This code was tested on Ubuntu 20.04.5 LTS and requires:
- Python 3.8
- Anaconda3 or Miniconda3
- CUDA-capable GPU (one is enough)
Install ffmpeg (if not already installed):
sudo apt update
sudo apt install ffmpeg
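You can verify the install with:
ffmpeg -version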
Set up the conda environment:
conda env create -f environment.yml
conda activate tesmo
python -m spacy download en_core_web_sm
pip install git+https://github.com/openai/CLIP.git
pip install git+https://github.com/GuyTevet/smplx.git
pip install wandb
pip install "git+https://github.com/facebookresearch/pytorch3d.git"
pip install human-body-prior # note: the PyPI package leads to a bm_name error, so it is also installed from source below
pip install moviepy
pip install hydra-core # the Hydra configuration framework (the PyPI package named "hydra" is unrelated)
git clone https://github.com/nghorbani/human_body_prior.git
cd human_body_prior
python setup.py install
cd ..
pip install python-fcl
# install mmcv
pip install mmcv==2.0.0rc4 -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13/index.html
# visualization
pip install torchshow
pip install matplotlib==3.1.3
# pip install mesh-to-sdf
pip install git+https://github.com/marian42/mesh_to_sdf.git
pip install open3d
pip install plyfile
pip install numpy==1.23.4
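As a quick sanity check of the environment (a minimal sketch; the import names assume the standard packages installed above, and the exact torch/CUDA versions come from environment.yml):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"  # should print True on a CUDA machine
python -c "import clip, smplx, mmcv, fcl, open3d, human_body_prior; print('imports ok')"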
PriorMDM shares most of its dependencies with the original MDM. Please follow the official instructions to install the dependencies.
Download the pretrained models from interaction model and locomotion model, then unzip and place them in pretrained_model/.
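If it helps, a minimal placement sketch (the archive names below are assumptions; substitute whatever the downloaded files are actually called):
mkdir -p pretrained_model
unzip interaction_model.zip -d pretrained_model/   # hypothetical archive name
unzip locomotion_model.zip -d pretrained_model/    # hypothetical archive name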
- Follow the instructions in data_generation to get the bird-view floor plan and object mask from 3D-FRONT. Then, align human motions with 3D scenes by running align_motion_amass. All the fitting results will be saved to the dataset/3dfront_fitting folder.
- Finally, run this command to get canonical motion-scene pairs for training and inference:
bash scripts/preprocess_locomotion.sh
The results will be saved to dataset/3dfront_fitting.
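A quick way to confirm the preprocessing finished (the directory name comes from the steps above; the file names inside depend on the preprocessing scripts):
ls dataset/3dfront_fitting | head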
- Follow the instructions in data_generation to fit objects to human motions. Then, place the fitting results under the dataset/SAMP_interaction/SAMP_objects folder.
- Download the 3D-FUTURE dataset from this webpage, then place the downloaded dataset under the dataset/3D-FUTURE-model folder.
- Download the original SAMP dataset from this webpage, then place it under the dataset/SAMP_interaction/body_meshes_smpl folder.
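Putting the data-preparation steps together, the expected dataset layout looks like this (the directory names are the ones used above; the file contents depend on the respective downloads and fitting scripts):
# dataset/
# ├── 3dfront_fitting/          # aligned, canonicalized motion-scene pairs
# ├── 3D-FUTURE-model/          # 3D-FUTURE object models
# └── SAMP_interaction/
#     ├── SAMP_objects/         # object fitting results
#     └── body_meshes_smpl/     # original SAMP body meshes
mkdir -p dataset/3D-FUTURE-model dataset/SAMP_interaction/SAMP_objects dataset/SAMP_interaction/body_meshes_smpl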
Before running human locomotion generation, please download the pretrained root_horizontal_control model from PriorMDM and put it under the pretrained_model/inpainting folder.
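For example (the checkpoint folder name is assumed to mirror the PriorMDM model name; adjust it to whatever the download actually contains):
mkdir -p pretrained_model/inpainting
# e.g. pretrained_model/inpainting/root_horizontal_control/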
To run locomotion inference on HumanML3D, you can run this command:
bash run_sh/run_infer_locomotion.sh
To render generated human motion in 3D scene, you can run this command:
bash run_sh/run_render_locomotion.sh [sample_id] # sample_id: 0, 1, 2 ...
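For example, to render the first few samples in one go:
for i in 0 1 2; do bash run_sh/run_render_locomotion.sh $i; done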
Finally, you can get the visualization result like this:
To evaluate a pretrained interaction model on SAMP, you can run this command to start inference:
bash run_sh/run_infer_interaction.sh
- Use --joint2mesh to run SMPLify.
- Use --render_pyrender to visualize human-object interaction.
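A possible invocation with both options (whether run_sh/run_infer_interaction.sh forwards extra arguments to the underlying Python entry point is an assumption; if it does not, add the flags inside the script instead):
bash run_sh/run_infer_interaction.sh --joint2mesh --render_pyrender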
Finally, you can get the visualization results like this:
This codebase and pre-trained models are provided under this non-commercial license. As detailed below, parts of the code build on prior work with their own licenses.
Our implementation takes advantage of several great prior works that have their own licenses that must be followed:
- Some code is from PriorMDM
- We took reference from OmniControl
- We borrow some data generation code from SUMMON
- Data processing code from HUMANISE
We thank these authors for their open-source contributions.
Our models are trained on data that makes use of the 3D-FRONT and 3D-FUTURE datasets.
Thanks to Yangyi Huang, Yifei Liu, and Xiaolin Hong for technical support. Thanks to Mathis Petrovich and Nikos Athanasiou for the fruitful discussions about text-to-motion synthesis. Thanks to T. Niewiadomski, T. McConnell, and T. Alexiadis for running the user study.