This repository contains the source code for our SIGIR2025 paper: Stitching Inner Product and Euclidean Metrics for Topology-aware Maximum Inner Product Search.
If you're interested, you can also check out our theoretical work: VLDB2025 PSP
Our investigation, grounded in graph-based search, reveals that different indexing and search strategies offer distinct advantages for MIPS, depending on the underlying data topology. Building on these insights, we introduce a novel graph-based index called Metric-Amphibious Graph (MAG) and a corresponding search algorithm, Adaptive Navigation with Metric Switch (ANMS). To facilitate parameter tuning for optimal performance, we identify three statistical indicators that capture essential data topology properties and correlate strongly with parameter tuning.
| Dataset | Base | Dim. | Query | Modality | DBI(Euc.) | DBI(Cos.) | CV |
|---|---|---|---|---|---|---|---|
| Music100 | 1M | 100 | 10,000 | Audio | 1.5 | 2.8 | 0.25 |
| YFCC1M | 1M | 100 | 1,000 | Multi | 1.51 | 2.9 | 0.07 |
| SIFT1M | 1M | 128 | 1,000 | Image | 3.26 | 2.6 | 0.001 |
| Text2Image1M | 1M | 200 | 100,000 | Multi | 2.5 | 3.0 | 0.03 |
| MNIST1M | 1M | 784 | 10,000 | Image | 2.7 | 2.8 | 0.18 |
| GIST1M | 1M | 960 | 1,000 | Image | 6.28 | 3.2 | 0.27 |
| OpenAI-1536 | 1M | 1536 | 1,000 | Text | 4.1 | 5.3 | 0.0 |
| Imagenet-1k | 1.3M | 1536 | 1,000 | Image | 1 | 1.4 | 0.36 |
| Color3M | 3M | 282 | 1,000 | Image | 2.6 | 2.1 | 0.17 |
| Shopee10M | 10M | 48 | 1,000 | E-commerce | 2.4 | 2.1 | 0.24 |
| Text2Image10M | 10M | 200 | 100,000 | Multi | 3.3 | 3.6 | 0.03 |
| Laion10M | 12M | 512 | 1,000 | Multi | 4.3 | 3.6 | 0.0 |
- GCC 4.9+ with OpenMP
- CMake 2.8+
- Boost 1.55+
- Faiss (optional)
$ mkdir build/ && cd build/
$ cmake ..
$ make -j- datasets: datasets
- include: Main file
- benchmark: Store index and log
- script: some scripts to run the experiments
- test: test codes
Firstly, we need to prepare a kNN graph. You can use Faiss and other libs.
./test/test_mag_index DATA_PATH KNNG_PATH L R C INDEX_PATH MODE DIM R_IP M THRESHOLDDATA_PATHis the path of the base data inbinformat.KNNG_PATHis the path of the pre-built kNN graph in Step 1..Linital search pool size.Rmaximum out-degree for NN.Ccandidate pool size.INDEX_PATHis the path of the generated MAG index.MODEindex.DIMdimension of dataset.R_IPmaximum out-degree for IP.Mmaximum out-degree.THRESHOLDip threshold for approximation (Strictly = 1, but > 1 works better).
./test/test_mips_search DATA_PATH QUERY_PATH INDEX_PATH searh_L K RESULT_PATH MODE DIMDATA_PATHis the path of the base data inbinformat.QUERY_PATHis the path of the query data inbinformat.INDEX_PATHis the path of the generated MAG index.search_Lsearch pool size, the larger the better but slower (must larger than K).Kthe result size.MODEsearch.DIMdimension of dataset.
- ✅ Open-source code for the major components of the original paper
- ✅ Support for a wider range of datasets with diverse norm distributions
- 🔄 Vector compression acceleration (coming soon)
- 🔄 Hierarchical graph structure for inner product (coming soon)
- 🔄 Partition algorithm integration (coming soon)
- 🔄 Adaptive early stopping strategy (coming soon)
- 🔄 Python wrapper.
- QPS
@inproceedings{chen2025stitching,
title={Stitching Inner Product and Euclidean Metrics for Topology-aware Maximum Inner Product Search},
author={Chen, Tingyang and Fu, Cong and Ke, Xiangyu and Gao, Yunjun and Ni, Yabo and Zeng, Anxiang},
booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
pages={2341--2350},
year={2025}
}If you're interested in our work and future collaboration opportunities, feel free to reach out to Tingyang Chen at [email protected] or Cong Fu at [email protected]
