Everything you always wanted to know about ANNS (approximate nearest neighbor search) but were afraid to ask 🥰 This repo is updated frequently. Advice and questions are welcome; feel free to email me.
A leading * marks theoretical papers.
[ICDE'24] CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs. blog
[ICLR'21] Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. blog
[SIGIR'24] Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations.
[CIKM'24] Pairing Clustered Inverted Indexes with 𝜅-NN Graphs for Fast Approximate Retrieval over Learned Sparse Representations.
[KDD'20] Embedding-based Retrieval in Facebook Search. blog
[ATC'24] Scalable Billion-point Approximate Nearest Neighbor Search Using SmartSSDs.
[SIGMOD'24] Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment.
[CIKM'19] GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine. blog
[arXiv'24] Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory.
[NIPS'20] HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory. blog
[BD'23] LM-DiskANN: Low Memory Footprint in Disk-Native Dynamic Graph-Based ANN Indexing.
[PPoPP'23] iQAN: Fast and Accurate Vector Search with Efficient Intra-Query Parallelism on Multi-Core Architectures.
[PPoPP'24] ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search Algorithms.
[SIGMOD'18] The Case For Learned Index Structures.
[VLDB'23] Learned Index: A Comprehensive Experimental Evaluation. blog
[ICLR'20] Learning Space Partitions for Nearest Neighbor Search. blog
[TPAMI'19] Learning to Index for Nearest Neighbor Search.
[ICML'19] Learning to Route in Similarity Graphs.
[SIGIR'24] A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor Search.
[DAC'24] Leanor: A Learning-Based Accelerator for Efficient Approximate Nearest Neighbor Search via Reduced Memory Access.
[NIPS'22] A Multilabel Classification Framework for Approximate Nearest Neighbor Search.
[NIPS'24] LoRANN: Low-rank matrix factorization for approximate nearest neighbor search.
[VLDB'23] LIDER: An Efficient High-dimensional Learned Index for Large-scale Dense Passage Retrieval.
[KDD'22] BLISS: A Billion scale Index using Iterative Re-partitioning.
[KDD'23] Learning Balanced Tree Indexes for Large-Scale Vector Retrieval. blog
[NIPS'23] Knowledge Distillation for High Dimensional Search Index.
[NIPS'23] AdANNS: A Framework for Adaptive Semantic Search.
[NIPS'22] Matryoshka Representation Learning.
[TKDE'19] A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search.
[NIPS'15] Practical and Optimal LSH for Angular Distance. blog
[STOC'15] Optimal Data-Dependent Hashing for Approximate Near Neighbors. talk
[VLDB'07] Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search. blog
[VLDB'24] DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search.
[WWW'23] FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search. blog
[WWW'11] Efficient K-Nearest Neighbor Graph Construction for Generic Similarity Measures. blog
[PR'19] Hierarchical Clustering-Based Graphs for Large Scale Approximate Nearest Neighbor Search. blog own blog
[WSDM'22] GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning. blog
[VLDB'22] HVS: Hierarchical Graph Structure Based on Voronoi Diagrams for Solving Approximate Nearest Neighbor Search. blog
[MM'23] Relative NN-Descent: A Fast Index Construction for Graph-Based Approximate Nearest Neighbor Search.
[VLDB'21] A Comprehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search.
[ICMR'24] An Exploration Graph with Continuous Refinement for Efficient Multimedia Retrieval.
[arXiv'24] Revisiting the Index Construction of Proximity Graph-Based Approximate Nearest Neighbor Search.
[CVPR'18] Link and code: Fast indexing with graphs and compact regression codes. blog
[NIPS'24] *Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits.
[arXiv'16] EFANNA: An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph. blog
[CVPR'12] The Inverted Multi-Index. blog1 blog2
[TPAMI'10] Product Quantization for Nearest Neighbor Search.
[CVPR'14] Additive Quantization for Extreme Vector Compression.
[arXiv'18] A Survey of Product Quantization. (Mainly covers OPQ, AQ, and CQ.)
[VLDB'18] HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces. blog
[VLDB'24] RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search.
[ICDE'24] Efficient Approximate Maximum Inner Product Search over Sparse Vectors.
[AAAI'20] Understanding and Improving Proximity Graph based Maximum Inner Product Search.
https://zhuanlan.zhihu.com/p/133526632
High-Dimensional Probability and Applications in Data Science. link
Foundations of Vector Retrieval. link
SIMD Programming (a little out of date; uses VMX and MMX)
https://face2ai.com/program-blog/#GPU%E7%BC%96%E7%A8%8B%EF%BC%88CUDA%EF%BC%89
https://eecs280staff.github.io/notes/
https://changkun.de/modern-cpp/
Good blogs analyzing the HNSW source code: