Holistic Evaluation of Multimodal LLMs on Spatial Intelligence
EASI conceptualizes a comprehensive taxonomy of spatial tasks that unifies existing benchmarks, together with a standardized protocol for the fair evaluation of state-of-the-art proprietary and open-source models.
Key features include:
- Supports the evaluation of state-of-the-art Spatial Intelligence models.
- Systematically collects and integrates evolving Spatial Intelligence benchmarks.
- Proposes a standardized testing protocol to ensure fair evaluation and enable cross-benchmark comparisons.
For the full list of supported models and benchmarks, please refer to 👉 Supported Models & Benchmarks.
🌟 [2025-12-08] EASI v0.1.2 is released. Major updates include:
- Expanded model support
  Added 5 Spatial Intelligence models and 1 unified understanding–generation model:
  - SenseNova-SI 1.1 Series (Qwen2.5-VL-3B / Qwen2.5-VL-7B / Qwen3-VL-8B)
  - SenseNova-SI 1.2 Series (InternVL3-8B)
  - VLM-3R
  - BAGEL-7B-MoT
- Expanded benchmark support
  Added 4 image benchmarks: STAR-Bench, OmniSpatial, Spatial-Visualization-Benchmark, SPAR-Bench.
- LLM-based answer extraction for EASI benchmarks
  Added optional LLM-based answer extraction for several EASI benchmarks. You can enable OpenAI judging with `--judge gpt-4o-1120`, which routes to gpt-4o-2024-11-20 for automated evaluation.
🌟 [2025-11-21] EASI v0.1.1 is released. Major updates include:
- Expanded model support
  Added 9 Spatial Intelligence models (total 7 → 16):
  - SenseNova-SI 1.1 Series (InternVL3-8B / InternVL3-2B)
  - SpaceR-7B
  - VST Series (VST-3B-SFT / VST-7B-SFT)
  - Cambrian-S Series (0.5B / 1.5B / 3B / 7B)
- Expanded benchmark support
  Added 1 image–video benchmark: VSI-Bench-Debiased.
🌟 [2025-11-07] EASI v0.1.0 is released. Major updates include:
- Expanded model support
  Supported 7 Spatial Intelligence models:
  - SenseNova-SI Series (InternVL3-8B / InternVL3-2B)
  - MindCube Series (3B-RawQA-SFT / 3B-Aug-CGMap-FFR-Out-SFT / 3B-Plain-CGMap-FFR-Out-SFT)
  - SpatialLadder-3B
  - SpatialMLLM-4B
- Expanded benchmark support
  Supported 6 Spatial Intelligence benchmarks:
  - 4 image benchmarks: MindCube, ViewSpatial, EmbSpatial, MMSI (no circular evaluation)
  - 2 image–video benchmarks: VSI-Bench, SITE-Bench
- Standardized testing protocol
  Introduced the EASI testing protocol as described in the EASI paper.
```bash
git clone --recursive https://github.com/EvolvingLMMs-Lab/EASI.git
cd EASI
pip install -e ./VLMEvalKit
```

VLM Configuration: During evaluation, all supported VLMs are configured in vlmeval/config.py. Make sure you can successfully run inference with the VLM before starting the evaluation, using the command `vlmutil check {MODEL_NAME}`.
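For example, a minimal sanity check (the model name is taken from the example further below and must match an entry in vlmeval/config.py):

```bash
# Confirm the model is registered and can run inference before a full evaluation
vlmutil check SenseNova-SI-1.2-InternVL3-8B
```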
Benchmark Configuration: The full list of supported benchmarks can be found in the official VLMEvalKit documentation (VLMEvalKit Supported Benchmarks).
For the EASI Leaderboard, all EASI benchmarks are summarized in Supported Models & Benchmarks. A minimal example of recommended --data settings for EASI is:
| Benchmark | Evaluation settings |
|---|---|
| VSI-Bench | VSI-Bench_32frame |
| VSI-Bench-Debiased | VSI-Bench-Debiased_32frame |
| MindCube | MindCubeBench_tiny_raw_qa |
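These settings are passed directly to the --data flag. A minimal sketch, with the model name left as a placeholder:

```bash
# Run VSI-Bench with the recommended 32-frame setting from the table above
python run.py --data VSI-Bench_32frame --model {MODEL_NAME} --verbose --reuse
```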
General command

```bash
python run.py --data {BENCHMARK_NAME} --model {MODEL_NAME} --judge {JUDGE_MODE} --verbose --reuse
```

See run.py for the full list of arguments.
Example

Evaluate SenseNova-SI-1.2-InternVL3-8B on MindCubeBench_tiny_raw_qa:

```bash
python run.py --data MindCubeBench_tiny_raw_qa \
    --model SenseNova-SI-1.2-InternVL3-8B \
    --verbose --reuse --judge extract_matching
```

This will use regular expressions to extract the answer. If you want to use an LLM-based judge (e.g., when evaluating SpatialVizBench_CoT), you can switch the judge to OpenAI:

```bash
python run.py --data SpatialVizBench_CoT \
    --model {MODEL_NAME} \
    --verbose --reuse --judge gpt-4o-1120
```
Note: to use OpenAI models, you must set the environment variable OPENAI_API_KEY.
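For example, in a POSIX shell (the key value is a placeholder):

```bash
# Export the key so the OpenAI-based judge can authenticate
export OPENAI_API_KEY={YOUR_API_KEY}
```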
Citation

```bibtex
@article{easi2025,
  title={Holistic Evaluation of Multimodal LLMs on Spatial Intelligence},
  author={Cai, Zhongang and Wang, Yubo and Sun, Qingping and Wang, Ruisi and Gu, Chenyang and Yin, Wanqi and Lin, Zhiqian and Yang, Zhitao and Wei, Chen and Shi, Xuanke and Deng, Kewang and Han, Xiaoyang and Chen, Zukai and Li, Jiaqi and Fan, Xiangyu and Deng, Hanming and Lu, Lewei and Li, Bo and Liu, Ziwei and Wang, Quan and Lin, Dahua and Yang, Lei},
  journal={arXiv preprint arXiv:2508.13142},
  year={2025}
}
```