Everything you always wanted to know about ANNS but were afraid to ask 🥰

AcKing-Sam/Awesome-Vector-Search

Awesome-Vector-ANNS

Everything you always wanted to know about ANNS but were afraid to ask 🥰 This repo is updated frequently. Advice and questions are welcome; feel free to email me.

An asterisk (*) before a title marks a theoretical paper.

Papers

GPU-accelerated

[ICDE'24] CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs. blog

Information Retrieval

[ICLR'21] Approximate Nearest Neighbor Negative Contrastive Learning for Dense Text Retrieval. blog

[SIGIR'24] Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations.

[CIKM'24] Pairing Clustered Inverted Indexes with 𝜅-NN Graphs for Fast Approximate Retrieval over Learned Sparse Representations.

[KDD'20] Embedding-based Retrieval in Facebook Search. blog

Disk or Second-tier Memory

[ATC'24] Scalable Billion-point Approximate Nearest Neighbor Search Using SmartSSDs.

[SIGMOD'24] Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment.

[CIKM'19] GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine. blog

[ArXiv'24] Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory.

[NIPS'20] HM-ANN: Efficient Billion-Point Nearest Neighbor Search on Heterogeneous Memory. blog

[BD'23] LM-DiskANN: Low Memory Footprint in Disk-Native Dynamic Graph-Based ANN Indexing.

Multi-core

[PPoPP'23] iQAN: Fast and Accurate Vector Search with Efficient Intra-Query Parallelism on Multi-Core Architectures.

[PPoPP'24] ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search Algorithms.

Learned

[SIGMOD'18] The Case for Learned Index Structures.

[VLDB'23] Learned Index: A Comprehensive Experimental Evaluation. blog

[ICLR'20] Learning Space Partitions for Nearest Neighbor Search. blog

[TPAMI'19] Learning to Index for Nearest Neighbor Search.

[ICML'19] Learning to Route in Similarity Graphs.

[SIGIR'24] A Learning-to-Rank Formulation of Clustering-Based Approximate Nearest Neighbor Search.

[DAC'24] Leanor: A Learning-Based Accelerator for Efficient Approximate Nearest Neighbor Search via Reduced Memory Access.

[NIPS'22] A Multilabel Classification Framework for Approximate Nearest Neighbor Search.

[NIPS'24] LoRANN: Low-Rank Matrix Factorization for Approximate Nearest Neighbor Search.

[VLDB'23] LIDER: An Efficient High-dimensional Learned Index for Large-scale Dense Passage Retrieval.

[SIGKDD'22] BLISS: A Billion scale Index using Iterative Re-partitioning.

[SIGKDD'23] Learning Balanced Tree Indexes for Large-Scale Vector Retrieval. blog

Knowledge Distillation of Indexes

[NIPS'23] Knowledge Distillation for High Dimensional Search Index.

Learned Representation of Vectors

[NIPS'23] AdANNS: A Framework for Adaptive Semantic Search.

[NIPS'22] Matryoshka Representation Learning.

LSH

[TKDE'19] A Revisit of Hashing Algorithms for Approximate Nearest Neighbor Search.

[NIPS'15] Practical and Optimal LSH for Angular Distance. blog

[STOC'15] Optimal Data-Dependent Hashing for Approximate Near Neighbors. talk

[VLDB'07] Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search. blog

[VLDB'24] DET-LSH: A Locality-Sensitive Hashing Scheme with Dynamic Encoding Tree for Approximate Nearest Neighbor Search.

Graph

[WWW'23] FINGER: Fast Inference for Graph-based Approximate Nearest Neighbor Search. blog

[WWW'11] Efficient K-Nearest Neighbor Graph Construction for Generic Similarity Measures. blog

[PR'19] Hierarchical Clustering-Based Graphs for Large Scale Approximate Nearest Neighbor Search. blog own blog

[WSDM'22] GraSP: Optimizing Graph-based Nearest Neighbor Search with Subgraph Sampling and Pruning. blog

[VLDB'22] HVS: Hierarchical Graph Structure Based on Voronoi Diagrams for Solving Approximate Nearest Neighbor Search. blog

[MM'23] Relative NN-Descent: A Fast Index Construction for Graph-Based Approximate Nearest Neighbor Search.

[VLDB'21] A Comprehensive Survey and Experimental Comparison of Graph-Based Approximate Nearest Neighbor Search.

[ICMR'24] An Exploration Graph with Continuous Refinement for Efficient Multimedia Retrieval.

[ArXiv'24] Revisiting the Index Construction of Proximity Graph-Based Approximate Nearest Neighbor Search.

[CVPR'18] Link and code: Fast indexing with graphs and compact regression codes. blog

[NIPS'24] *Navigable Graphs for High-Dimensional Nearest Neighbor Search: Constructions and Limits.

[ArXiv'16] EFANNA: An Extremely Fast Approximate Nearest Neighbor Search Algorithm Based on kNN Graph. blog

Quantization

[CVPR'12] The Inverted Multi-Index. blog1 blog2

[TPAMI'10] Product Quantization for Nearest Neighbor Search.

[CVPR'14] Additive Quantization for Extreme Vector Compression.

[ArXiv'18] A Survey of Product Quantization. Mainly covers OPQ, AQ, and CQ.

[VLDB'18] HD-Index: Pushing the Scalability-Accuracy Boundary for Approximate kNN Search in High-Dimensional Spaces. blog

Tree

OOD

[VLDB'24] RoarGraph: A Projected Bipartite Graph for Efficient Cross-Modal Approximate Nearest Neighbor Search.

MIPS (Maximum Inner Product Search)

[ICDE'24] Efficient Approximate Maximum Inner Product Search over Sparse Vectors.

[AAAI'20] Understanding and Improving Proximity Graph based Maximum Inner Product Search.

Good blogs

https://zhuanlan.zhihu.com/p/133526632

Books

High-Dimensional Probability and Applications in Data Science. link

Foundations of Vector Retrieval. link

Blogs & Talks & Tutorials

Tools

tmux tutorial

CMake tutorial

SIMD

SIMD Programming (a little out of date; uses VMX and MMX)

CUDA

https://hpcwiki.io/gpu/cuda/

https://face2ai.com/program-blog/#GPU%E7%BC%96%E7%A8%8B%EF%BC%88CUDA%EF%BC%89

C++

https://eecs280staff.github.io/notes/

https://changkun.de/modern-cpp/

C++ Concurrency

Effective Modern C++

C++ Memory Model. blog

C++ Concurrency. blog

Linear Algebra

SVD

Implementation

ANN-Benchmark

https://ann-benchmarks.com

Good blogs for HNSW source code analysis:
