This repository contains the official implementation for the paper "Sparse and Dense Retrievers Learn Better Together: Joint Sparse-Dense Optimization for Text-Image Retrieval" accepted to CIKM 2025 Short Track.
ArXiv: https://arxiv.org/abs/2508.16707
Authors: Jonghyun Song, Youngjune Lee, Gyu-Hwung Cho, Ilhyeon Song, Saehun Kim, Yohan Jo
Vision-Language Pretrained (VLP) models have achieved impressive performance on multimodal tasks, including text-image retrieval, based on dense representations. Meanwhile, Learned Sparse Retrieval (LSR) has gained traction in text-only settings due to its interpretability and efficiency with fast term-based lookup via inverted indexes. Inspired by these advantages, recent work has extended LSR to the multimodal domain. However, these methods often rely on computationally expensive contrastive pre-training, or distillation from a frozen dense model, which limits the potential for mutual enhancement. To address these limitations, we propose a simple yet effective framework that enables bi-directional learning between dense and sparse representations through Self-Knowledge Distillation. This bi-directional learning is achieved using an integrated similarity score—a weighted sum of dense and sparse similarities—which serves as a shared teacher signal for both representations. To ensure efficiency, we fine-tune the final layer of the dense encoder and the sparse projection head, enabling easy adaptation of any existing VLP model. Experiments on MSCOCO and Flickr30k demonstrate that our sparse retriever not only outperforms existing sparse baselines, but also achieves performance comparable to—or even surpassing—its dense counterparts, while retaining the benefits of sparse models.
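The core idea is the integrated similarity score that serves as a shared teacher for both the dense and the sparse heads. The sketch below is illustrative only: the function name, the weighting coefficient `alpha`, and the `temperature` are assumptions for exposition and are not taken from this repository's code.

```python
# Illustrative sketch (not the repo's actual code): an integrated similarity
# score acting as a shared teacher signal for dense and sparse representations.
import torch
import torch.nn.functional as F

def self_kd_losses(dense_sim, sparse_sim, alpha=0.5, temperature=0.05):
    """dense_sim, sparse_sim: [batch, batch] text-image similarity matrices.
    alpha and temperature are illustrative hyperparameters."""
    # Integrated score: weighted sum of dense and sparse similarities.
    integrated_sim = alpha * dense_sim + (1 - alpha) * sparse_sim

    # The integrated score is detached and used as the shared teacher.
    teacher = F.softmax(integrated_sim.detach() / temperature, dim=-1)

    # Both representations are distilled toward the same teacher distribution.
    kd_dense = F.kl_div(F.log_softmax(dense_sim / temperature, dim=-1),
                        teacher, reduction="batchmean")
    kd_sparse = F.kl_div(F.log_softmax(sparse_sim / temperature, dim=-1),
                         teacher, reduction="batchmean")
    return kd_dense, kd_sparse
```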
The experiments were run with Python 3.9 and CUDA 12.4. To set up the conda environment, run:
sh conda.sh
Required Data: Training only needs the text and image embeddings taken right before the final projection layer of each backbone.
Download Links:
Original Dataset Sources (not necessary for reproduction):
- MS COCO:
- Flickr30k: Dataset
Directory Structure: After downloading, organize your data as follows (a short loading sketch follows the tree):
.cache/
├── mscoco/
│ ├── text_embs_before_proj_blip.parquet
│ ├── img_embs_before_proj_blip.parquet
│ ├── text_embs_before_proj_albef.parquet
│ └── img_embs_before_proj_albef.parquet
└── flickr30k/
├── text_embs_before_proj_blip.parquet
├── img_embs_before_proj_blip.parquet
├── text_embs_before_proj_albef.parquet
└── img_embs_before_proj_albef.parquet
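As a quick sanity check, the parquet files can be inspected with pandas. The exact column layout may differ from what is printed here, so treat this as a sketch rather than the repository's actual loading code.

```python
# Inspect the downloaded embedding files (adjust the path / model suffix
# for your dataset and backbone; column names are not assumed).
import pandas as pd

text_embs = pd.read_parquet(".cache/mscoco/text_embs_before_proj_blip.parquet")
img_embs = pd.read_parquet(".cache/mscoco/img_embs_before_proj_blip.parquet")

print(text_embs.shape, img_embs.shape)
print(text_embs.columns.tolist()[:10])  # peek at the stored columns
```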
To train the model, use one of the following commands:
# For ALBEF on MS COCO
python train.py --config training_config/albef-coco.json
# For ALBEF on Flickr30k
python train.py --config training_config/albef-flickr.json
# For BLIP on MS COCO
python train.py --config training_config/blip-coco.json
# For BLIP on Flickr30k
python train.py --config training_config/blip-flickr.json
Download pre-trained checkpoints from here.
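For reference, a standard text-to-image Recall@K computation over a similarity matrix (dense, sparse, or integrated) looks roughly like the sketch below. Function and variable names are illustrative; the repository's own evaluation script may differ.

```python
# Illustrative text-to-image Recall@K from a [num_texts, num_images]
# similarity matrix; not the repository's evaluation code.
import torch

def recall_at_k(sim, text_to_image, ks=(1, 5, 10)):
    """sim: [num_texts, num_images] similarity scores.
    text_to_image: LongTensor mapping each text to its ground-truth image index."""
    ranks = sim.argsort(dim=-1, descending=True)        # ranked image ids per text
    hits = ranks.eq(text_to_image.unsqueeze(1))         # True where the GT image appears
    rank_of_gt = hits.float().argmax(dim=-1)            # rank position of the GT image
    return {f"R@{k}": (rank_of_gt < k).float().mean().item() for k in ks}

# Toy example with random scores and 5 captions per image (MSCOCO-style):
sim = torch.randn(25, 5)
gt = torch.arange(5).repeat_interleave(5)
print(recall_at_k(sim, gt))
```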
If you find this work useful, please cite our paper:
@inproceedings{song2025sparse,
title={Sparse and Dense Retrievers Learn Better Together: Joint Sparse-Dense Optimization for Text-Image Retrieval},
author={Jonghyun Song and Youngjune Lee and Gyu-Hwung Cho and Ilhyeon Song and Saehun Kim and Yohan Jo},
booktitle={Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CIKM '25)},
year={2025},
pages={5},
publisher={ACM},
doi={10.1145/3746252.3760959}
}