Easi3R

Easi3R (ICCV 2025) is a simple training-free approach that adapts DUSt3R to dynamic scenes.

(Teaser video: teaser_video.mp4)

Getting Started

Installation

  1. Clone Easi3R.
git clone https://github.com/Inception3D/Easi3R.git
cd Easi3R
  2. Create the environment; here we show an example using conda.
conda create -n easi3r python=3.10 cmake=3.31
conda activate easi3r
conda install pytorch torchvision pytorch-cuda=12.4 -c pytorch -c nvidia  # use the CUDA version that matches your system
pip install -r requirements.txt
# install 4d visualization tool
pip install -e viser
# install SAM2
pip install -e third_party/sam2 --verbose
# compile the CUDA kernels for RoPE (as in CroCo v2)
# DUSt3R relies on RoPE positional embeddings, for which you can compile CUDA kernels for faster runtime
cd croco/models/curope/
python setup.py build_ext --inplace
cd ../../../
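Optionally, you can confirm the environment is working before moving on. This is just a minimal sanity check (not part of the official setup) that verifies PyTorch was installed with CUDA support:

# optional: confirm PyTorch sees your GPU
python -c "import torch; print('CUDA available:', torch.cuda.is_available())"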

Download Checkpoints

To download the weights of DUSt3R, MonST3R, RAFT and SAM2, run the following commands:

# download the weights
cd data
bash download_ckpt.sh
cd ..

Inference

To run the interactive inference demo, use the following command:

OPENBLAS_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=5 python demo.py \
    --weights checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth 
# To change the backbone: --weights checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth

The results will be saved in the demo_tmp/{Sequence Name} folder (demo_tmp/NULL by default) for later visualization.

You can also run inference in non-interactive mode:

OPENBLAS_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=5 python demo.py --input demo_data/dog-gooses \
    --output_dir demo_tmp --seq_name dog-gooses \
    --weights checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth 
# To change the backbone: --weights checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth
# To use SAM2, add: --sam2_mask_refine
# To use a video as input: --input demo_data/dog-gooses.mp4
# To reduce memory usage, cap the number of frames taken from the video: --num_frames 65
# To process the video faster, downsample its frame rate: --fps 5
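These options can be combined. As a sketch using only the flags listed above (here on GPU 0), the following runs on a video input with SAM2 refinement, a 65-frame cap, and a reduced frame rate:

OPENBLAS_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0 python demo.py --input demo_data/dog-gooses.mp4 \
    --output_dir demo_tmp --seq_name dog-gooses \
    --weights checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth \
    --sam2_mask_refine --num_frames 65 --fps 5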

Visualization

To visualize the interactive 4D results, use the following command:

python viser/visualizer.py --data demo_tmp/dog-gooses --port 9081
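Viser serves an interactive viewer in the browser; with the command above, open http://localhost:9081 (forward the port first if you are running on a remote machine).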

Evaluation

Here we provide an example on the DAVIS dataset.

First, download the dataset:

cd data; python download_prepare_davis.py; cd ..

Then, run the evaluation script:

CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc_per_node=4 --master_port=29604 launch.py \
    --mode=eval_pose \
    --pretrained="checkpoints/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth"   \
    --eval_dataset=davis --output_dir="results/davis/easi3r_dust3r" \
    --use_atten_mask
# To change the backbone: --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth"
# To use SAM2, add: --sam2_mask_refine

If you only need the dynamic masks, run:

CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc_per_node=4 --master_port=29604 launch.py \
    --mode=eval_pose --n_iter 0 \
    --pretrained="checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth"   \
    --eval_dataset=davis --output_dir="results/davis/easi3r_monst3r_sam" \
    --use_atten_mask --sam2_mask_refine

The results will be saved in the results/davis/easi3r_monst3r_sam folder. To evaluate the mask results, run:

python mask_metric.py --results_path results/davis/easi3r_monst3r_sam

To visualize the attention maps as shown on the project page, run:

python vis_attention.py --method_name easi3r_monst3r_sam --base_output_dir results/visualization

For the complete evaluation, please refer to evaluation_script.md for more details.

Acknowledgements

Our code is based on DUSt3R, MonST3R, DAS3R, Spann3R, CUT3R, LEAP-VO, Shape of Motion, TAPVid-3D, CasualSAM and Viser. We thank the authors for their excellent work!

Citation

If you find our work useful, please cite:

@article{chen2025easi3r,
    title={Easi3R: Estimating Disentangled Motion from DUSt3R Without Training},
    author={Chen, Xingyu and Chen, Yue and Xiu, Yuliang and Geiger, Andreas and Chen, Anpei},
    journal={arXiv preprint arXiv:2503.24391},
    year={2025}
}
