MCIF is a comprehensive benchmark for evaluating multimodal, multilingual instruction-following systems, covering 3 modalities (text, speech, and video), 4 languages (English, German, Italian, and Chinese), and 13 tasks (organized into 4 macro-tasks).
A subset of MCIF has been used for the evaluation of the IWSLT 2025 Instruction-Following Shared Task.
- 2025.10.22: 🤗 MCIF test set is released on HuggingFace
- 2025.10.21: ⭐️ MCIF Evaluation first release
The evaluation is the core component of this repository. All other components (i.e., dataset construction and baseline inference) are included to ensure full reproducibility and transparency of the evaluation results.
For details on dataset generation or baseline models, please refer to the dedicated READMEs (baselines may require specific dependencies):
- 🧱 Dataset Construction — scripts and guidelines for creating test sets and references → dataset_build/README.md
- 🚀 Baselines — inference scripts and outputs for baseline systems → baselines/README.md
- 📊 Evaluation — scoring and comparison utilities for submitted outputs → README.md
The repository can be installed with `pip install -e .`.
For the evaluation, you can simply run:

```bash
mcif_eval -t {short/long} -l {en/de/it/zh} -s model_outputs.xml
```

where `model_outputs.xml` contains the outputs of your model for the selected track or context length (`short` or `long`) and target language among English (`en`), German (`de`), Italian (`it`), and Chinese (`zh`).
This will automatically download the references for the latest MCIF version from the Hugging Face repository. If you want to use a different version, specify it with `-v`.
To run the evaluation without internet access, first download the MCIF references and then provide them to `mcif_eval` with the `-r` parameter.
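As an illustration, the CLI can also be driven programmatically. The minimal Python sketch below scores the same system across all four target languages on the short track; it uses only the flags documented above, while the per-language output file names (`outputs_en.xml`, etc.) are assumptions made for this example.

```python
# Minimal sketch: batch-run mcif_eval over all target languages (short track).
# The output file naming scheme outputs_<lang>.xml is an assumption for this
# example, not an MCIF convention.
import subprocess

for lang in ["en", "de", "it", "zh"]:
    subprocess.run(
        ["mcif_eval", "-t", "short", "-l", lang, "-s", f"outputs_{lang}.xml"],
        check=True,
    )
```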
The file containing the model outputs to evaluate must be structured as follows:
```xml
<?xml version='1.0' encoding='utf-8'?>
<testset name="MCIF" type="output">
  <task track="{short/long}" text_lang="{en/de/it/zh}">
    <sample id="1">{SAMPLE1_CONTENT}</sample>
    <sample id="2">{SAMPLE2_CONTENT}</sample>
    ...
  </task>
</testset>
```

To ease usability, we provide a helper function that automatically formats model predictions into the XML structure required by the MCIF evaluation script. The method takes as input:
- `outputs`: a list of tuples `(sample_id, prediction)` containing the sample id and its related prediction;
- `lang`: the target language (`en`/`de`/`it`/`zh`);
- `track`: the context length or track (`short`/`long`);
- `output_file`: the path of the XML file to be created, containing all the system's outputs, ready for evaluation.
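If you prefer to build the file yourself, the sketch below shows how the same structure can be produced with Python's standard library. It is an illustration only: the function name `format_outputs_to_xml` is hypothetical and is not the helper shipped with the repository, although it takes the same inputs listed above.

```python
# Illustrative sketch only: a stand-in with the same inputs as the repository's
# helper, built on Python's standard library. Not the actual MCIF implementation.
import xml.etree.ElementTree as ET


def format_outputs_to_xml(outputs, lang, track, output_file):
    """Write (sample_id, prediction) pairs into the MCIF output XML layout."""
    testset = ET.Element("testset", name="MCIF", type="output")
    task = ET.SubElement(testset, "task", track=track, text_lang=lang)
    for sample_id, prediction in outputs:
        sample = ET.SubElement(task, "sample", id=str(sample_id))
        sample.text = prediction
    ET.ElementTree(testset).write(output_file, encoding="utf-8", xml_declaration=True)


# Example usage with dummy predictions:
format_outputs_to_xml(
    outputs=[(1, "First prediction."), (2, "Second prediction.")],
    lang="de",
    track="short",
    output_file="model_outputs.xml",
)
```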
MCIF is released under the Apache 2.0 License.
If you use MCIF in your research, please cite:
```bibtex
@misc{mcif,
      title={MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks},
      author={Sara Papi and Maike Züfle and Marco Gaido and Beatrice Savoldi and Danni Liu and Ioannis Douros and Luisa Bentivogli and Jan Niehues},
      year={2025},
      eprint={2507.19634},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.19634},
}
```