(Teaser video: teaser_video.mp4)
- Clone Easi3R.
git clone https://github.com/Inception3D/Easi3R.git
cd Easi3R
- Create the environment; here we show an example using conda.
conda create -n easi3r python=3.10 cmake=3.31
conda activate easi3r
conda install pytorch torchvision pytorch-cuda=12.4 -c pytorch -c nvidia # use the correct version of cuda for your system
pip install -r requirements.txt
# install 4d visualization tool
pip install -e viser
# install SAM2
pip install -e third_party/sam2 --verbose
# compile the CUDA kernels for RoPE (as in CroCo v2).
# DUSt3R relies on RoPE positional embeddings, for which you can compile CUDA kernels for a faster runtime.
cd croco/models/curope/
python setup.py build_ext --inplace
cd ../../../
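As an optional sanity check (assuming the conda environment above is active), you can verify that PyTorch was installed with CUDA support:
# optional: print the installed torch version and whether a CUDA device is visible
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"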
To download the weights of DUSt3R, MonST3R, RAFT and SAM2, run the following commands:
# download the weights
cd data
bash download_ckpt.sh
cd ..
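If the download succeeded, the backbone checkpoints referenced by the commands below should now be in checkpoints/ (file names as used later in this README; the script may fetch additional weights such as RAFT and SAM2):
# optional: confirm the two backbone checkpoints used below are in place
ls -lh checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth \
       checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth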
To run the interactive inference demo, you can use the following command:
OPENBLAS_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=5 python demo.py \
--weights checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth
# To change backbone, --weights checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth
The results will be saved in the demo_tmp/{Sequence Name} folder (demo_tmp/NULL by default) for future visualization.
You can also run the inference in a non-interactive mode:
OPENBLAS_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=5 python demo.py --input demo_data/dog-gooses \
--output_dir demo_tmp --seq_name dog-gooses \
--weights checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth
# To change backbone, --weights checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth
# To use SAM2, add: --sam2_mask_refine
# use video as input: --input demo_data/dog-gooses.mp4
# reduce memory cost: cap the number of frames taken from the video, e.g. --num_frames 65
# speed up video processing: downsample the video frame rate, e.g. --fps 5
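For example, the options above can be combined in a single run (a sketch; adjust the GPU index, paths, and values to your setup):
# example: video input with SAM2 mask refinement, capped frame count, and reduced fps
OPENBLAS_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0 python demo.py --input demo_data/dog-gooses.mp4 \
    --output_dir demo_tmp --seq_name dog-gooses \
    --weights checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth \
    --sam2_mask_refine --num_frames 65 --fps 5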
To visualize the interactive 4D results, you can use the following command:
python viser/visualizer.py --data demo_tmp/dog-gooses --port 9081
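Viser serves the 4D viewer as a local web page; once the command is running, open the URL it prints (typically http://localhost:9081 for the port above) in your browser.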
We provide here an example on the DAVIS dataset.
First, download the dataset:
cd data; python download_prepare_davis.py; cd ..
Then, run the evaluation script:
CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc_per_node=4 --master_port=29604 launch.py \
--mode=eval_pose \
--pretrained="checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth" \
--eval_dataset=davis --output_dir="results/davis/easi3r_dust3r" \
--use_atten_mask
# To change backbone, --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth"
# To use SAM2, add: --sam2_mask_refine
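If you only have a single GPU, the same evaluation can be launched on one device (a sketch with the same arguments; adjust the GPU index and port as needed):
# single-GPU variant of the pose evaluation above
CUDA_VISIBLE_DEVICES=0 torchrun --nproc_per_node=1 --master_port=29604 launch.py \
    --mode=eval_pose \
    --pretrained="checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth" \
    --eval_dataset=davis --output_dir="results/davis/easi3r_dust3r" \
    --use_atten_mask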
If you only need the dynamic masks, run:
CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc_per_node=4 --master_port=29604 launch.py \
--mode=eval_pose --n_iter 0 \
--pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth" \
--eval_dataset=davis --output_dir="results/davis/easi3r_monst3r_sam" \
--use_atten_mask --sam2_mask_refine
The results will be saved in the results/davis/easi3r_monst3r_sam folder. You can then evaluate the mask results with
python mask_metric.py --results_path results/davis/easi3r_monst3r_sam
and visualize the attention maps, as shown on the webpage, with
python vis_attention.py --method_name easi3r_monst3r_sam --base_output_dir results/visualization
For the complete evaluation, please refer to evaluation_script.md for more details.
Our code is based on DUSt3R, MonST3R, DAS3R, Spann3R, CUT3R, LEAP-VO, Shape of Motion, TAPVid-3D, CasualSAM and Viser. We thank the authors for their excellent work!
If you find our work useful, please cite:
@article{chen2025easi3r,
  title={Easi3R: Estimating Disentangled Motion from DUSt3R Without Training},
  author={Chen, Xingyu and Chen, Yue and Xiu, Yuliang and Geiger, Andreas and Chen, Anpei},
  journal={arXiv preprint arXiv:2503.24391},
  year={2025}
}