This repository contains the content of the following paper:
Inter-X: Towards Versatile Human-Human Interaction Analysis
Liang Xu1,2, Xintao Lv1, Yichao Yan1, Xin Jin2, Shuwen Wu1, Congsheng Xu1, Yifan Liu1, Yizhou Zhou3, Fengyun Rao3, Xingdong Sheng4, Yunhui Liu4, Wenjun Zeng2, Xiaokang Yang1
1 Shanghai Jiao Tong University 2 Eastern Institute of Technology, Ningbo 3 WeChat, Tencent Inc. 4 Lenovo
- [2024.08.10] We release the training and evaluation codes of text2motion, and the checkpoints.
- [2024.08.07] We release the data preprocessing code and the processed data.
- [2024.06.02] We release the code of fitting SMPL-X parameters from the MoCap data here.
- [2024.04.16] We release the Inter-X dataset, including SMPL-X parameters, skeleton parameters, and annotations of textual descriptions, action settings, interaction order, relationships, and personalities.
- [2024.02.27] Inter-X is accepted by CVPR 2024!
- [2023.12.27] We release the paper and project page of Inter-X.
- Release the action-to-motion code and checkpoints.
- Release the text-to-motion code and checkpoints.
- Release the data pre-processing code.
- Release the scripts to visualize the dataset.
- Release the whole dataset and annotations.
Please stay tuned for updates to the dataset and code!
Please fill out this form to request authorization to download Inter-X for research purposes.
We also provide the 40 action categories and the train/val/test splits in the datasets folder.
We recommend using AIT-Viewer to visualize the dataset.
pip install aitviewer
pip install -r visualize/smplx_viewer_tool/requirements.txt
You need to download the SMPL-X models and then place them under visualize/smplx_viewer_tool/body_models.
├── SMPLX_FEMALE.npz
├── SMPLX_FEMALE.pkl
├── SMPLX_MALE.npz
├── SMPLX_MALE.pkl
├── SMPLX_NEUTRAL.npz
├── SMPLX_NEUTRAL.pkl
└── SMPLX_NEUTRAL_2020.npz
cd visualize/smplx_viewer_tool
# 1. Make sure the SMPL-X body models are downloaded
# 2. Create a soft link of the SMPL-X data to the smplx_viewer_tool folder
ln -s Your_Path_Of_SMPL-X ./data
# 3. Create a soft link of the texts annotations to the smplx_viewer_tool folder
ln -s Your_Path_Of_Texts ./texts
python data_viewer.py
cd visualize/joint_viewer_tool
# 1. Create a soft link of the skeleton data to the joint_viewer_tool folder
ln -s Your_Path_Of_Joints ./data
# 2. Create a soft link of the texts annotations to the joint_viewer_tool folder
ln -s Your_Path_Of_Texts ./texts
python data_viewer.py
Each file/folder name in Inter-X follows the format GgggTtttAaaaRrrr (e.g., G001T000A000R000), in which ggg is the human-human group number, ttt is the shoot number, aaa is the action label, and rrr is the split number.
The human-human group number is aligned with the big_five and familiarity annotations. The group number ranges from 001 to 059, and the action label ranges from 000 to 039.
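The naming scheme above can be decoded with a small helper. This is a minimal sketch, not code from the repository: `SequenceName` and `parse_sequence_name` are hypothetical names introduced here, and the only assumption is that each field is exactly three digits, as in the example name.

```python
import re
from typing import NamedTuple


class SequenceName(NamedTuple):
    """Numeric fields decoded from a name such as G001T000A000R000."""
    group: int   # human-human group number (001 to 059)
    shoot: int   # shoot number
    action: int  # action label (000 to 039)
    split: int   # split number


# Hypothetical pattern: one letter tag followed by exactly three digits, four times.
_NAME_RE = re.compile(r"G(\d{3})T(\d{3})A(\d{3})R(\d{3})")


def parse_sequence_name(name: str) -> SequenceName:
    """Split a file/folder name into its four integer fields."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a GgggTtttAaaaRrrr name: {name!r}")
    return SequenceName(*(int(field) for field in match.groups()))


info = parse_sequence_name("G001T000A000R000")
print(info.group, info.shoot, info.action, info.split)
```

The group number obtained this way is the key that lines up with the big_five and familiarity annotations.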
The directory structure of the downloaded dataset is:
Inter-X_Dataset
├── LICENSE.md
├── annots
│ ├── action_setting.txt # 40 action categories
│ ├── big_five.npy # big-five personalities
│ ├── familiarity.txt # familiarity level, from 1-4, larger means more familiar
│ └── interaction_order.pkl # actor-reactor order, 0 means P1 is actor; 1 means P2 is actor
├── splits # train/val/test splittings
│ ├── all.txt
│ ├── test.txt
│ ├── train.txt
│ └── val.txt
├── motions.zip # SMPL-X parameters at 120 fps
├── skeletons.zip # skeleton parameters at 120 fps
└── texts.zip # textual descriptions
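As a rough illustration of how the split files and the naming scheme fit together, the sketch below counts sequences per action label in a split. It assumes each split file lists one sequence name per line, which is not confirmed here; `read_split` and `count_actions` are hypothetical helpers, not part of the repository.

```python
import re
from collections import Counter
from pathlib import Path


def read_split(path):
    """Return the non-empty lines of a split file.

    Assumption: one sequence name (e.g. G001T000A000R000) per line.
    """
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def count_actions(names):
    """Count sequences per action label, read from the 'Aaaa' part of each name."""
    counts = Counter()
    for name in names:
        match = re.search(r"A(\d{3})", name)
        if match:  # silently skip lines that do not follow the naming scheme
            counts[int(match.group(1))] += 1
    return counts
```

For example, `count_actions(read_split("splits/train.txt"))` would give a per-action histogram of the training split, provided the file layout matches the assumption.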
- To load the SMPL-X motion parameters you can simply do:
import numpy as np
# load the motion data
motion = np.load('motions/G001T000A000R000/P1.npz')
motion_parms = {
    'root_orient': motion['root_orient'],  # controls the global root orientation
    'pose_body': motion['pose_body'],      # controls the body
    'pose_lhand': motion['pose_lhand'],    # controls the left hand articulation
    'pose_rhand': motion['pose_rhand'],    # controls the right hand articulation
    'trans': motion['trans'],              # controls the global body position
    'betas': motion['betas'],              # controls the body shape
    'gender': motion['gender'],            # controls the gender
}
- To load the skeleton data you can simply do:
# The topology of the skeleton can be obtained in the OPTITRACK_LIMBS, SELECTED_JOINTS of the joint_viewer_tool/data_viewer.py
import numpy as np
skeleton = np.load('skeletons/G001T000A000R000/P1.npy') # skeleton.shape: (T, 64, 3)
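The raw data is stored at 120 fps, while training and evaluation use 30 fps. The sketch below shows one simple way to reduce the frame rate of a (T, 64, 3) skeleton array by keeping every fourth frame. It is an assumption that plain frame striding is adequate; the repository's own preprocessing script may downsample differently. The `downsample` helper is hypothetical, and a synthetic array stands in for a loaded file.

```python
import numpy as np


def downsample(frames: np.ndarray, src_fps: int = 120, dst_fps: int = 30) -> np.ndarray:
    """Keep every (src_fps // dst_fps)-th frame along the first (time) axis."""
    if src_fps % dst_fps != 0:
        raise ValueError("src_fps must be an integer multiple of dst_fps")
    step = src_fps // dst_fps
    return frames[::step]


# Synthetic stand-in for np.load('skeletons/G001T000A000R000/P1.npy'): 2 seconds at 120 fps.
skeleton = np.zeros((240, 64, 3), dtype=np.float32)
skeleton_30fps = downsample(skeleton)
print(skeleton_30fps.shape)  # (60, 64, 3)
```

The same function applies to any per-frame array whose first axis is time.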
We directly use the SMPL-X parameters to train the model. You can download the processed motion data and text data through the original Google Drive link or Baidu Netdisk.
processed_data
├── glove
│ ├── hhi_vab_data.npy
│ ├── hhi_vab_idx.pkl
│ └── hhi_vab_words.pkl
├── motions
│ ├── test.h5
│ ├── train.h5
│ └── val.h5
└── texts_processed
  ├── G001T000A000R000.txt
  ├── G001T000A000R001.txt
  └── ......
Alternatively, you can run the data preprocessing code yourself:
Commands for preprocessing the Inter-X dataset for training and evaluation:
- Clone the repository with the following commands:
  git clone https://github.com/liangxuy/Inter-X.git
  cd Inter-X/preprocessing
- Set up the environment
  - Install ffmpeg (if not already installed)
    sudo apt update
    sudo apt install ffmpeg
  - Set up the conda environment
    conda env create -f environment.yml
    conda activate inter-x
    python -m spacy download en_core_web_sm
    pip install git+https://github.com/openai/CLIP.git
    You can also install en_core_web_sm manually by downloading en_core_web_sm-2.3.0.tar.gz and then running
    pip install en_core_web_sm-2.3.0.tar.gz
- Prepare motions.zip, texts.zip, splits, etc.
- Run the commands one by one:
  - Motion data processing (we downsample to 30 fps for training and evaluation)
    python 1_prepare_data.py
  - Split train, test and val
    python 2_split_train_val.py
  - Process text annotations. Download glove.6B.zip and set the path of glove_file.
    python 3_text_process.py
  - For human reaction generation
    python 4_reaction_generation.py
The code for this part is under evaluation/text2motion. We follow the work of text-to-motion to train and evaluate the text2motion model. You can build the environment as in the original repository and then link the data folder to ./dataset/inter-x with a soft link.
Commands for training and evaluating the text2motion model:
We provide the trained models via the dataset's Google Drive link. To skip steps 1-4, download the checkpoints, put them under checkpoints/hhi, and organize them as:
checkpoints/hhi
├── Comp_v6_KLD01
│ ├── model
│ │ └── latest.tar
│ └── opt.txt
├── Decomp_SP001_SM001_H512
│ └── model
│   └── latest.tar
├── length_est_bigru
│ └── model
│   └── latest.tar
└── text_mot_match
  └── model
    └── finest.tar
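Before running the evaluation with downloaded checkpoints, it can help to confirm that the folder matches the layout above. The following is a minimal sketch; `EXPECTED_FILES` and `missing_checkpoints` are hypothetical names introduced for illustration, and the file list is copied from the tree shown here.

```python
from pathlib import Path

# Relative paths taken from the checkpoints/hhi layout listed above.
EXPECTED_FILES = [
    "Comp_v6_KLD01/model/latest.tar",
    "Comp_v6_KLD01/opt.txt",
    "Decomp_SP001_SM001_H512/model/latest.tar",
    "length_est_bigru/model/latest.tar",
    "text_mot_match/model/finest.tar",
]


def missing_checkpoints(root):
    """Return the expected files that are not present under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_checkpoints("checkpoints/hhi")` returns an empty list when every file is in place, and otherwise lists the paths that still need to be downloaded or moved.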
- Training motion autoencoder
  python train_decomp_v3.py --name Decomp_SP001_SM001_H512 --gpu_id 0 --window_size 24 --dataset_name hhi
- Training text2length model
  python train_length_est.py --name length_est_bigru --gpu_id 0 --dataset_name hhi
- Training text2motion model
  python train_comp_v6.py --name Comp_v6_KLD01 --gpu_id 0 --lambda_kld 0.01 --dataset_name hhi
- Training motion & text feature extractors
  python train_tex_mot_match.py --name text_mot_match --gpu_id 0 --batch_size 8 --dataset_name hhi
- Quantitative Evaluations
  python final_evaluation.py
The statistical results will be saved to ./hhi_evaluation.log.
Coming soon!
If you find the Inter-X dataset useful for your research, please cite us:
@inproceedings{xu2024inter,
title={Inter-x: Towards versatile human-human interaction analysis},
author={Xu, Liang and Lv, Xintao and Yan, Yichao and Jin, Xin and Wu, Shuwen and Xu, Congsheng and Liu, Yifan and Zhou, Yizhou and Rao, Fengyun and Sheng, Xingdong and others},
booktitle={CVPR},
pages={22260--22271},
year={2024}
}