ReMoMask: Retrieval-Augmented Masked Motion Generation

This is the official repository for the paper:

ReMoMask: Retrieval-Augmented Masked Motion Generation

Zhengdao Li*, Siheng Wang*, Zeyu Zhang*^†, and Hao Tang^#

*Equal contribution. ^†Project lead. ^#Corresponding author.

Paper | Website | Model | HF Paper


✏️ Citation

@article{li2025remomask,
  title={ReMoMask: Retrieval-Augmented Masked Motion Generation},
  author={Li, Zhengdao and Wang, Siheng and Zhang, Zeyu and Tang, Hao},
  journal={arXiv preprint arXiv:2508.02605},
  year={2025}
}

👋 Introduction

Text-to-Motion (T2M) generation aims to synthesize realistic, semantically aligned human motion sequences from natural-language descriptions. However, current approaches face two sets of challenges: generative models (e.g., diffusion models) suffer from limited diversity, error accumulation, and physical implausibility, while Retrieval-Augmented Generation (RAG) methods exhibit diffusion inertia, partial-mode collapse, and asynchronous artifacts. To address these limitations, we propose ReMoMask, a unified framework built on three key innovations: 1) a Bidirectional Momentum Text-Motion Model that decouples the number of negative samples from the batch size via momentum queues, substantially improving cross-modal retrieval precision; 2) a Semantic Spatiotemporal Attention mechanism that enforces biomechanical constraints during part-level fusion to eliminate asynchronous artifacts; and 3) RAG-Classifier-Free Guidance, which incorporates a small amount of unconditional generation to improve generalization. Built on MoMask's RVQ-VAE, ReMoMask efficiently generates temporally coherent motions in a small number of steps. Extensive experiments on standard benchmarks, including HumanML3D, demonstrate state-of-the-art performance, with the FID score significantly improved to 0.095 compared with the previous state-of-the-art RAG-based T2M method.
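To illustrate how momentum queues decouple the number of negatives from the batch size, here is a minimal NumPy sketch of a MoCo-style bidirectional contrastive loss. It assumes L2-normalized text and motion embeddings, a FIFO queue of past momentum-encoder keys per modality, and a symmetric InfoNCE objective. The names (MomentumQueue, info_nce, bidirectional_loss, momentum_update), the temperature, and the dict-of-arrays parameters are hypothetical and are not taken from Part_TMR.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def momentum_update(online: dict, momentum: dict, m: float = 0.999) -> None:
    """EMA update of momentum-encoder parameters (hypothetical dict-of-arrays params)."""
    for name, w in online.items():
        momentum[name] = m * momentum[name] + (1.0 - m) * w


class MomentumQueue:
    """Fixed-size FIFO of past key embeddings; its size is independent of the batch size."""

    def __init__(self, size: int, dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.buf = l2_normalize(rng.standard_normal((size, dim)))
        self.ptr = 0

    def enqueue(self, keys: np.ndarray) -> None:
        n = len(keys)
        idx = (self.ptr + np.arange(n)) % len(self.buf)  # wrap around the ring buffer
        self.buf[idx] = keys
        self.ptr = (self.ptr + n) % len(self.buf)


def info_nce(q: np.ndarray, k_pos: np.ndarray, queue: np.ndarray, tau: float = 0.07) -> float:
    """Mean InfoNCE loss: each query's positive key is contrasted with every queued negative."""
    pos = np.sum(q * k_pos, axis=-1, keepdims=True) / tau  # (B, 1)
    neg = q @ queue.T / tau                                # (B, K)
    logits = np.concatenate([pos, neg], axis=1)            # positive sits in column 0
    logits -= logits.max(axis=1, keepdims=True)            # numerical stability
    log_prob = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob.mean())


def bidirectional_loss(text_q, motion_q, text_k, motion_k, text_queue, motion_queue, tau=0.07):
    """Text->motion and motion->text terms, each drawing negatives from the other modality's queue."""
    t2m = info_nce(text_q, motion_k, motion_queue.buf, tau)
    m2t = info_nce(motion_q, text_k, text_queue.buf, tau)
    # Enqueue the current momentum keys so later batches see them as negatives.
    motion_queue.enqueue(motion_k)
    text_queue.enqueue(text_k)
    return 0.5 * (t2m + m2t)
```

With a batch of 4 and a queue of 64, each query is scored against 64 negatives; growing the queue adds negatives without enlarging the batch.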

TODO List

Upload our paper to arXiv and build project pages.
Upload the code.
Release TMR model.
Release T2M model.

🤗 Prerequisite


Environment

conda create -n remomask python=3.10
conda activate remomask
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

We have tested this environment on both NVIDIA A800 and H20 GPUs.

Dependencies

1. Download the pretrained models

RAG: Download the pretrained-rag-models (coming soon) and place them in ./Part_TMR
T2M: Download the pretrained-t2m-models (coming soon) and place them in ./logs/humanml3d/

2. Evaluation Models and GloVe

Follow previous methods to prepare the evaluation models and GloVe word vectors, or download them from here (provided by MoGenTS) and place them in ./checkpoints.

3. Prepare training dataset

Follow the instructions in HumanML3D, then place the resulting dataset in ./dataset/HumanML3D.
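Before training, a quick sanity check of the dataset folder can save a failed run. The sketch below assumes the standard processed HumanML3D layout (new_joint_vecs/, texts/, Mean.npy, Std.npy, and the train.txt/val.txt/test.txt split files); exactly which entries this repository's loaders read is an assumption, so adjust the hypothetical REQUIRED_DIRS and REQUIRED_FILES tuples as needed.

```python
from pathlib import Path

# Assumed entries of a processed HumanML3D folder (adjust to what the loaders actually read).
REQUIRED_DIRS = ("new_joint_vecs", "texts")
REQUIRED_FILES = ("Mean.npy", "Std.npy", "train.txt", "val.txt", "test.txt")


def missing_dataset_entries(root: str = "./dataset/HumanML3D") -> list[str]:
    """Return the expected entries that are absent under `root` (an empty list means OK)."""
    base = Path(root)
    missing = [d for d in REQUIRED_DIRS if not (base / d).is_dir()]
    missing += [f for f in REQUIRED_FILES if not (base / f).is_file()]
    return missing


def split_ids(root: str, split: str = "train") -> list[str]:
    """Read the motion IDs listed in a split file, one per line, skipping blank lines."""
    lines = (Path(root) / f"{split}.txt").read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]
```

For example, `missing_dataset_entries()` returning an empty list means the expected layout is in place.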

🚀 Demo


python demo.py --gpu_id 0 --ext exp1 --text_prompt "A person is walking on a circle." --checkpoints_dir logs --dataset_name humanml3d --mtrans_name pretrain_mtrans --rtrans_name pretrain_rtrans
# after training, change pretrain_mtrans and pretrain_rtrans to your own mtrans and rtrans

Additional arguments:

--repeat_times: number of times generation is repeated (default: 1).
--motion_length: number of poses (frames) to generate.

The output will be saved to ./outputs/exp1/.
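To run the demo on several prompts, the command above can be generated programmatically. This small sketch builds the argument list using only the flags shown in this section; the helper name demo_command, its defaults, and the example motion length are assumptions to adjust to your own runs.

```python
import shlex
from typing import List, Optional


def demo_command(prompt: str, ext: str, gpu_id: int = 0, motion_length: Optional[int] = None,
                 repeat_times: int = 1, mtrans: str = "pretrain_mtrans",
                 rtrans: str = "pretrain_rtrans") -> List[str]:
    """Build the argv for demo.py (hypothetical helper; flags as documented in this README)."""
    cmd = ["python", "demo.py", "--gpu_id", str(gpu_id), "--ext", ext,
           "--text_prompt", prompt, "--checkpoints_dir", "logs",
           "--dataset_name", "humanml3d", "--mtrans_name", mtrans,
           "--rtrans_name", rtrans, "--repeat_times", str(repeat_times)]
    if motion_length is not None:
        cmd += ["--motion_length", str(motion_length)]
    return cmd


# Usage: print one shell-safe command per prompt; pass the list to subprocess.run(..., check=True) to execute.
prompts = ["A person is walking on a circle.", "A person jumps forward twice."]
for i, p in enumerate(prompts):
    print(shlex.join(demo_command(p, ext=f"exp{i + 1}", motion_length=120)))
```

Passing a list (rather than one string) to subprocess avoids quoting problems with prompts that contain spaces or punctuation.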

🛠️ Train your own models


stage1: train a Part-Level BMM Retriever

python -m Part_TMR.scripts.train

then build a RAG database for training the T2M model:

python build_rag_database.py

you will get ./database
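Conceptually, the database pairs each training motion with its embedding from the stage1 retriever, and the T2M model is conditioned on the entries nearest to the query text. A minimal cosine-similarity top-k lookup is sketched below. The array layout and the top_k_retrieve helper are hypothetical, the actual file format of ./database may differ, and the optional `exclude` index (skipping a sample's own motion during training) is an assumed precaution, not confirmed for this repository.

```python
from typing import Optional

import numpy as np


def top_k_retrieve(query: np.ndarray, db: np.ndarray, k: int = 3,
                   exclude: Optional[int] = None) -> np.ndarray:
    """Indices of the k database rows most cosine-similar to `query`, best first.

    query: (D,) text embedding; db: (N, D) motion embeddings.
    exclude: optional row to skip, e.g. the query's own ground-truth motion.
    """
    q = query / np.linalg.norm(query)
    d = db / np.linalg.norm(db, axis=1, keepdims=True)
    sims = d @ q                           # (N,) cosine similarities
    if exclude is not None:
        sims[exclude] = -np.inf            # never return the excluded row
    available = len(sims) - (1 if exclude is not None else 0)
    k = min(k, available)
    idx = np.argpartition(-sims, k - 1)[:k]  # unordered top-k in O(N)
    return idx[np.argsort(-sims[idx])]       # order those k by similarity
```

`argpartition` avoids sorting the whole database, which matters once it holds every training motion.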

stage2: train a retrieval-augmented MoMask

train a 2D RVQ-VAE quantizer

bash run_rvq.sh vq 0 humanml3d --batch_size 256 --num_quantizers 6 --max_epoch 50 --quantize_dropout_prob 0.2 --gamma 0.1 --code_dim2d 1024 --nb_code2d 256
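The --num_quantizers 6 and --quantize_dropout_prob 0.2 flags refer to residual vector quantization (RVQ), as in MoMask: each layer quantizes the residual left by the previous layers, and quantizer dropout (as we understand it) randomly truncates the number of active layers during training. Below is a minimal NumPy sketch of the encode/decode logic with fixed codebooks and hypothetical helper names; it omits codebook learning and the 2D token layout used in this repository.

```python
import numpy as np


def rvq_encode(x, codebooks, rng=None, dropout_prob=0.0):
    """Residual VQ: layer i picks the code nearest to what layers 0..i-1 left unexplained.

    x: (N, D) latent vectors; codebooks: list of (K, D) arrays.
    With probability `dropout_prob`, only a random prefix of layers is used.
    Returns integer codes of shape (N, layers_used).
    """
    n_layers = len(codebooks)
    if rng is not None and rng.random() < dropout_prob:
        n_layers = int(rng.integers(1, len(codebooks) + 1))
    residual = x.copy()
    codes = []
    for cb in codebooks[:n_layers]:
        # squared Euclidean distance from every residual to every code: (N, K)
        dist = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dist.argmin(axis=1)
        codes.append(idx)
        residual = residual - cb[idx]  # pass the leftover to the next layer
    return np.stack(codes, axis=1)


def rvq_decode(codes, codebooks):
    """Sum the selected code vectors over the layers that were used."""
    return sum(codebooks[layer][codes[:, layer]] for layer in range(codes.shape[1]))
```

Because later layers only refine earlier ones, decoding from a prefix of the codes still gives a coarse reconstruction, which is what makes quantizer dropout meaningful.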

train a 2D retrieval-augmented masked transformer

# using one GPU
bash run_mtrans.sh mtrans 1 0 humanml3d --vq_name pretrain_vq --batch_size 256 --max_epoch 2000 --attnj --attnt --latent_dim 512 --n_heads 8
# using multiple GPUs
bash run_mtrans.sh mtrans 8 0,1,2,3,4,5,6,7 humanml3d --vq_name pretrain_vq --batch_size 256 --max_epoch 2000 --attnj --attnt --latent_dim 512 --n_heads 8
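The 2D token grid (time × body part) is what --attnj and --attnt appear to target; our reading, which is an assumption, is that they enable attention across joints/parts within a frame and across time within a part. The single-head NumPy sketch below shows that factorization without learned projections, masking, or the semantic part grouping described in the paper, so it illustrates the data flow only.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Single-head scaled dot-product self-attention over the second-to-last axis.

    x: (..., L, D). Queries, keys and values are x itself (no learned projections).
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., L, L)
    return softmax(scores, axis=-1) @ x                 # (..., L, D)


def spatiotemporal_attention(tokens):
    """tokens: (T, P, D) grid of T time steps by P body parts.

    1) part attention: each frame's P tokens attend to each other (batched over T);
    2) temporal attention: each part's T tokens attend to each other (batched over P).
    Residual connections keep the original token information.
    """
    x = tokens + self_attention(tokens)  # attend across parts within each frame
    xt = np.swapaxes(x, 0, 1)            # (P, T, D)
    xt = xt + self_attention(xt)         # attend across time within each part
    return np.swapaxes(xt, 0, 1)         # back to (T, P, D)
```

Factorizing the two axes costs O(T·P² + P·T²) attention scores instead of O((T·P)²) for full attention over the flattened grid.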

train a 2D residual transformer

# using multiple GPUs
bash run_rtrans.sh rtrans 2 humanml3d --batch_size 64 --vq_name vq --cond_drop_prob 0.01 --share_weight --max_epoch 2000 --attnj --attnt
# here, 2 means using two GPUs (cuda:0,1)
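--cond_drop_prob 0.01 is the training-side half of classifier-free guidance: a small fraction of samples has its condition replaced by a null condition, so one network learns both conditional and unconditional predictions. A minimal sketch, assuming the condition is a batch of embedding vectors and the null condition is a zero vector; the drop_conditions helper is hypothetical, and the repository may instead use a learned null embedding or also drop the retrieved motions.

```python
import numpy as np


def drop_conditions(cond, drop_prob, rng, null_cond=None):
    """Replace each sample's condition with `null_cond` with probability `drop_prob`.

    cond: (B, D) condition embeddings. null_cond: (D,) placeholder, zeros by default.
    Returns the new batch and the boolean mask of dropped rows.
    """
    if null_cond is None:
        null_cond = np.zeros(cond.shape[1], dtype=cond.dtype)
    mask = rng.random(cond.shape[0]) < drop_prob  # independent coin flip per sample
    out = cond.copy()
    out[mask] = null_cond
    return out, mask
```

With drop_prob=0.01, roughly 1 sample in 100 is trained unconditionally.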

💪 Evaluation


Evaluate the RAG

python -m Part_TMR.scripts.test
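Text-motion retrieval is commonly reported as Recall@k: the fraction of text queries whose paired motion ranks in the top k. Here is a minimal sketch over a similarity matrix whose row i has its ground truth in column i. The recall_at_k helper is hypothetical, and the exact protocol of Part_TMR.scripts.test (candidate pool size, handling of near-duplicate texts) is not reproduced.

```python
import numpy as np


def recall_at_k(sim: np.ndarray, ks=(1, 2, 3, 5, 10)) -> dict:
    """sim[i, j]: similarity of text i to motion j; the correct motion for text i is j = i.

    Returns {k: fraction of rows whose diagonal entry is among the k largest}.
    """
    n = sim.shape[0]
    correct = sim[np.arange(n), np.arange(n)][:, None]
    # rank of the correct item = number of candidates scored strictly higher
    ranks = (sim > correct).sum(axis=1)
    return {k: float((ranks < k).mean()) for k in ks}
```

The motion-to-text direction is `recall_at_k(sim.T)`.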

Evaluate the T2M

1. Evaluate the 2D RVQ-VAE

python eval_vq.py --gpu_id 0 --name pretrain_vq --dataset_name humanml3d --ext eval --which_epoch net_best_fid.tar
# change pretrain_vq to your vq
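The FID values reported by these scripts compare feature statistics of generated and real motions, using the pretrained evaluation models from the Dependencies step. For reference, the Fréchet distance between two Gaussians fitted to feature sets can be sketched as below. This version gets the matrix square-root term from eigenvalues (valid because both covariances are positive semi-definite) instead of scipy.linalg.sqrtm, and it is not the repository's evaluation code.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (N, D) feature sets.

    d^2 = ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^(1/2))
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((S_a S_b)^(1/2)) is the sum of square roots of the eigenvalues of S_a S_b,
    # which are real and >= 0 for PSD covariances; .real and clip remove numerical noise.
    eig = np.clip(np.linalg.eigvals(cov_a @ cov_b).real, 0.0, None)
    trace_sqrt = np.sqrt(eig).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

Shifting every feature vector by a constant leaves the covariance term at zero, so the distance reduces to the squared mean offset.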

2. Evaluate the 2D retrieval-augmented masked transformer

python eval_mask.py --dataset_name humanml3d --mtrans_name pretrain_mtrans --gpu_id 0 --cond_scale 4 --time_steps 10 --ext eval --which_epoch fid
# change pretrain_mtrans to your mtrans
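--cond_scale 4 sets the classifier-free guidance weight used at sampling time. In the standard formulation, the guided prediction extrapolates from the unconditional output toward the conditional one. The sketch below shows only that standard formula on plain arrays; what forms the conditional input (e.g., text plus retrieved motions) and the unconditional input in this repository's RAG-Classifier-Free Guidance is left abstract, and guided_logits is a hypothetical name.

```python
import numpy as np


def guided_logits(cond_logits: np.ndarray, uncond_logits: np.ndarray,
                  cond_scale: float) -> np.ndarray:
    """Classifier-free guidance: uncond + s * (cond - uncond).

    s = 0 gives the unconditional logits, s = 1 the conditional ones,
    and s > 1 pushes further in the direction the condition suggests.
    """
    return uncond_logits + cond_scale * (cond_logits - uncond_logits)
```

With s = 4, a token the condition favors by 2 logits over the unconditional prediction ends up favored by 8.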

3. Evaluate the 2D RAG masked transformer & 2D Residual Transformer

HumanML3D:

python eval_res.py --gpu_id 0 --dataset_name humanml3d --mtrans_name pretrain_mtrans --rtrans_name pretrain_rtrans --cond_scale 4 --time_steps 10 --ext eval --which_ckpt net_best_fid.tar --which_epoch fid --traverse_res
# change pretrain_mtrans and pretrain_rtrans to your mtrans and rtrans

KIT-ML:

python eval_res.py --gpu_id 0 --dataset_name kit --mtrans_name pretrain_mtrans_kit --rtrans_name pretrain_rtrans_kit --cond_scale 4 --time_steps 10 --ext eval --which_ckpt net_best_fid.tar --which_epoch fid --traverse_res
# change pretrain_mtrans and pretrain_rtrans to your mtrans and rtrans
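--time_steps 10 is the number of iterative decoding steps of the masked transformer. MaskGIT-style decoders such as MoMask start with every token masked and, after each step, re-mask the least confident predictions according to a cosine schedule. The sketch below assumes that schedule; whether this repository floors or rounds the count, clamps it, or indexes steps this way is not confirmed, and masked_counts and remask are hypothetical helpers.

```python
import math


def masked_counts(num_tokens: int, steps: int) -> list[int]:
    """Tokens left masked after each decoding step under a cosine schedule.

    After step t (1-based), a fraction cos(pi/2 * t/steps) remains masked, so the
    count decreases from near num_tokens to 0 at the final step.
    """
    return [math.floor(num_tokens * math.cos(math.pi / 2 * t / steps))
            for t in range(1, steps + 1)]


def remask(confidence: list[float], n_mask: int) -> set[int]:
    """Indices of the n_mask lowest-confidence positions, which are masked again."""
    order = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return set(order[:n_mask])
```

Because the cosine curve is flat near the start, early steps commit only a few high-confidence tokens, and most tokens are filled in during the later steps.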

🤖 Visualization


1. download and set up Blender


You can download Blender 2.93 LTS from https://www.blender.org/download/lts/2-93/. Please install exactly this version; for our paper, we used blender-2.93.18-linux-x64.

a. unzip it:

tar -xvf blender-2.93.18-linux-x64.tar.xz

b. check whether Blender has been installed successfully:

cd blender-2.93.18-linux-x64
./blender --background --version

you should see: Blender 2.93.18 (hash cb886axxxx built 2023-05-22 23:33:27)

./blender --background --python-expr "import sys; import os; print('\nThe version of python is ' + sys.version.split(' ')[0])"

you should see: The version of python is 3.9.2

c. get the blender-python path

./blender --background --python-expr "import sys; import os; print('\nThe path to the installation of python is\n' + sys.executable)"

you should see: The path to the installation of python is /xxx/blender-2.93.18-linux-x64/2.93/python/bin/python3.9

d. install pip for blender-python

/xxx/blender-2.93.18-linux-x64/2.93/python/bin/python3.9 -m ensurepip --upgrade
/xxx/blender-2.93.18-linux-x64/2.93/python/bin/python3.9 -m pip install --upgrade pip

e. prepare the environment for blender-python

/xxx/blender-2.93.18-linux-x64/2.93/python/bin/python3.9 -m pip install numpy==2.0.2
/xxx/blender-2.93.18-linux-x64/2.93/python/bin/python3.9 -m pip install matplotlib==3.9.4
/xxx/blender-2.93.18-linux-x64/2.93/python/bin/python3.9 -m pip install hydra-core==1.3.2
/xxx/blender-2.93.18-linux-x64/2.93/python/bin/python3.9 -m pip install hydra_colorlog==1.2.0
/xxx/blender-2.93.18-linux-x64/2.93/python/bin/python3.9 -m pip install moviepy==1.0.3
/xxx/blender-2.93.18-linux-x64/2.93/python/bin/python3.9 -m pip install shortuuid==1.0.13
/xxx/blender-2.93.18-linux-x64/2.93/python/bin/python3.9 -m pip install natsort==8.4.0
/xxx/blender-2.93.18-linux-x64/2.93/python/bin/python3.9 -m pip install pytest-shutil==1.8.1
/xxx/blender-2.93.18-linux-x64/2.93/python/bin/python3.9 -m pip install tqdm==4.67.1
/xxx/blender-2.93.18-linux-x64/2.93/python/bin/python3.9 -m pip install tqdm==1.17.0

2. calculate the SMPL mesh:

python -m fit --dir new_test_npy --save_folder new_temp_npy --cuda cuda:0

3. render to a video or an image sequence

/xxx/blender-2.93.18-linux-x64/blender --background --python render.py -- --cfg=./configs/render_mld.yaml --dir=test_npy --mode=video --joint_type=HumanML3D

--mode=video: render to an mp4 video
--mode=sequence: render to a png image, called a sequence.

👍 Acknowledgements

We sincerely thank the authors of the following open-source works, on which our code is based:

MoMask, MoGenTS, ReMoDiffuse, MDM, TMR, ReMoGPT

🔒 License

This code is distributed under the CC BY-NC-SA 4.0 license.

Note that our code depends on other libraries, including CLIP, SMPL, SMPL-X, and PyTorch3D, and uses datasets, each of which has its own license that must also be followed.
