πŸš€ Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More

Zichen Wen1,2, Yifeng Gao1, Shaobo Wang1, Junyuan Zhang2, Qintong Zhang2,4,
Weijia Li3,2, Conghui He2βœ‰, Linfeng Zhang1βœ‰,

1Shanghai Jiao Tong University, 2Shanghai AI Laboratory,
3Sun Yat-sen University, 4Peking University


πŸ”₯ News

  • 2025.10.13 πŸ€—πŸ€— We have released our latest work EPIC, an efficient framework for progressive consistency distillation in multimodal large language models!
  • 2025.10.10 πŸ€—πŸ€— We've released our latest work, VTC-Bench. Come test whether your token compression method really works!
  • 2025.08.30 πŸ€—πŸ€— We have seamlessly integrated DART into Qwen2.5-VL.
  • 2025.08.21 πŸ€—πŸ€— Our DART is accepted at EMNLP'25 main!
  • 2025.05.15 πŸ€—πŸ€— Our analytical work on token compression has been accepted as ACL'25 Finding!
  • 2025.03.19 πŸ€—πŸ€— The implementation and evaluation scripts for LLaVA-Next are now available
  • 2025.03.18 πŸ€—πŸ€— We have released the implementation of DART for Qwen2-VL, and now you can easily evaluate it using lmms-eval!
  • 2025.02.22 πŸ€—πŸ€— We release our latest work DART, a plug-and-play, training-free token reduction method that seamlessly integrates with efficient attention operators. Code is available!

πŸ‘€ Overview


TL;DR: We propose DART (Duplication-Aware Reduction of Tokens), a training-free method that prunes vision tokens based on duplication rather than importance, achieving 88.9% token reduction and a 1.99× speed-up while maintaining performance and compatibility with efficient attention operators.
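
For intuition, the sketch below illustrates one way duplication-based pruning can work: tokens are kept greedily so that each newly retained token has low cosine similarity (i.e., low duplication) with the tokens already kept. This is only an illustrative toy, not the official DART implementation; the function name prune_duplicate_tokens and all variable names are hypothetical.

 import torch
 import torch.nn.functional as F

 def prune_duplicate_tokens(vision_tokens: torch.Tensor, keep: int) -> torch.Tensor:
     """Illustrative duplication-based reduction: vision_tokens is (N, D), returns (keep, D)."""
     feats = F.normalize(vision_tokens, dim=-1)        # unit-norm so dot product = cosine similarity
     kept = [0]                                        # greedily seed with the first token
     for _ in range(keep - 1):
         kept_feats = feats[kept]                      # (K, D) embeddings of retained tokens
         # Duplication score: each candidate's highest similarity to any retained token
         dup = (feats @ kept_feats.T).max(dim=1).values
         dup[kept] = float("inf")                      # never re-select an already-kept token
         kept.append(int(dup.argmin()))                # keep the least duplicated candidate
     return vision_tokens[torch.tensor(kept)]

 # Example: reduce 576 LLaVA-style vision tokens to 128 (~77.8% reduction)
 tokens = torch.randn(576, 1024)
 print(prune_duplicate_tokens(tokens, keep=128).shape)  # torch.Size([128, 1024])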

πŸ›  Preparation

LLaVA

  1. Clone this repository.
 git clone https://github.com/ZichenWen1/DART
 cd DART
  2. Environment Setup and Preparation
 conda create -n DART python=3.10 -y
 conda activate DART
 pip install -e .
 pip install flash-attn --no-build-isolation
  3. Download Multimodal Benchmarks

Please follow the detailed instructions in LLaVA-Evaluation.

Qwen2-VL

 conda create -n DART_Qwen2VL python=3.10 -y
 conda activate DART_Qwen2VL
 cd Qwen2-VL/transformers && pip install -e .
 pip install accelerate qwen-vl-utils[decord]
 pip install flash-attn --no-build-isolation
 cd ../../lmms-eval && pip install -e .

Qwen2.5-VL

pip install -U transformers==4.55.4
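
After installing the pinned transformers version, the snippet below is a quick sanity check that the environment loads Qwen2.5-VL through the standard Hugging Face API. It does not configure anything DART-specific (the reduction settings are passed via the eval scripts under Qwen2_5-VL/), and the checkpoint and image URL are only examples.

 from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
 from qwen_vl_utils import process_vision_info

 model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # example checkpoint
 model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
     model_id, torch_dtype="auto", device_map="auto"
 )
 processor = AutoProcessor.from_pretrained(model_id)

 messages = [{
     "role": "user",
     "content": [
         {"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"},
         {"type": "text", "text": "Describe this image."},
     ],
 }]
 text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 image_inputs, video_inputs = process_vision_info(messages)
 inputs = processor(
     text=[text], images=image_inputs, videos=video_inputs,
     padding=True, return_tensors="pt",
 ).to(model.device)

 output_ids = model.generate(**inputs, max_new_tokens=64)
 trimmed = [out[len(inp):] for inp, out in zip(inputs.input_ids, output_ids)]
 print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])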

🎯 Usage

LLaVA

πŸ“– Script Templates

bash scripts/v1_5/eval/[Benchmark].sh [Reduction_Ratio] [Max_Num_Truncation]
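
For reference, LLaVA-1.5 encodes each image into 576 vision tokens, so the 0.778 reduction ratio used in the examples below removes roughly 576 × 0.778 ≈ 448 tokens and keeps about 128.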

🐝 Examples

CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/textvqa.sh 0.778 128
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/pope.sh 0.778 128
CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/mme.sh 0.778 128

Qwen2-VL

🐝 Examples

cd Qwen2-VL
bash eval_scripts/lmms_eval.sh True [Reduction_Ratio]

Qwen2.5-VL

🐝 Examples

cd Qwen2_5-VL
bash eval_scripts/lmms_eval.sh True [Reduction_Ratio]

πŸ”‘ License

This project is released under the Apache 2.0 license.

πŸ“Œ Citation

If our findings help your research, please consider citing our papers in your publications.

@article{wen2025stop,
  title={Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More},
  author={Wen, Zichen and Gao, Yifeng and Wang, Shaobo and Zhang, Junyuan and Zhang, Qintong and Li, Weijia and He, Conghui and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2502.11494},
  year={2025}
}

@article{wen2025token,
  title={Token Pruning in Multimodal Large Language Models: Are We Solving the Right Problem?},
  author={Wen, Zichen and Gao, Yifeng and Li, Weijia and He, Conghui and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2502.11501},
  year={2025}
}

πŸ‘ Acknowledgment

We extend our gratitude to the open-source efforts of LLaVA, Qwen2-VL, and lmms-eval.

πŸ“© Contact

For any questions about our paper or code, please email [email protected].
