Releases: thustorage/PipeANN
0.3.0
This release adds new search capabilities, the Python-facing stack built on top of them, and the refactors and tests needed to support updates and filtering consistently.
Feature highlights
Speculative filtering
- Added speculative filtered ANNS for arbitrary attribute constraints.
- Uses lightweight in-memory probabilistic filters to explore a superset of valid vectors, then verifies final candidates exactly against SSD-resident attributes.
- A cost model chooses speculative pre-filtering, speculative in-filtering, or post-filtering per query.
- Supports label filters, range filters `[l, r)`, and Boolean combinations through native selectors.
- Added typed attributes (`Attributes`, `AttrsVec`), on-disk attribute indexes, dense-neighbor support (`range_dense`), and JSON filter config loading.
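The two-stage idea behind speculative filtering (a cheap in-memory check that admits a superset, followed by exact verification) can be sketched in plain Python. This is an illustrative toy, not PipeANN's implementation: the `BloomFilter` class, the `speculative_filter` function, and the in-memory `exact_attrs` dictionary standing in for SSD-resident attributes are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a Python int

    def _positions(self, item):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def speculative_filter(candidates, label, mem_filters, exact_attrs):
    # Stage 1: the probabilistic check keeps a superset of valid vectors.
    superset = [v for v in candidates if mem_filters[v].might_contain(label)]
    # Stage 2: exact verification (exact_attrs is a hypothetical stand-in
    # for attributes that would be read from SSD).
    return [v for v in superset if label in exact_attrs[v]]


# Hypothetical data: vector id -> set of labels.
exact_attrs = {0: {"red"}, 1: {"blue"}, 2: {"red", "green"}}
mem_filters = {}
for vid, labels in exact_attrs.items():
    bf = BloomFilter()
    for lab in labels:
        bf.add(lab)
    mem_filters[vid] = bf

matches = speculative_filter([0, 1, 2], "red", mem_filters, exact_attrs)
```

Because the second stage checks the exact attributes, false positives from the first stage never reach the final result; they only cost extra verification work.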
OOD refinement
- Added NGFix-style out-of-distribution graph refinement.
- Exposed `train_query_path`, `R_ood`, and `L_ood` in C++ build tools and `IndexPipeANN.build()`.
- Persisted OOD metadata in SSD index metadata.
Range search
- Added finite-threshold range search in C++ and Python.
- Results outside the threshold are filtered out and padded with `UINT32_MAX`/`inf`.
- Reuses the common pipelined traversal and result-copy path.
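The padding convention described above can be illustrated with a small standalone sketch. The helper name `pad_range_results`, its argument order, and the choice of an inclusive threshold are assumptions made for illustration; the notes do not specify PipeANN's exact comparison or output layout.

```python
UINT32_MAX = 0xFFFFFFFF  # sentinel id for "no result in this slot"


def pad_range_results(ids, dists, threshold, k):
    """Keep hits within the threshold, then pad each output to length k."""
    # Assumption: a hit exactly at the threshold is kept (inclusive bound).
    kept = [(i, d) for i, d in zip(ids, dists) if d <= threshold][:k]
    pad = k - len(kept)
    out_ids = [i for i, _ in kept] + [UINT32_MAX] * pad
    out_dists = [d for _, d in kept] + [float("inf")] * pad
    return out_ids, out_dists


ids, dists = pad_range_results([7, 3, 9], [0.1, 0.4, 0.9], threshold=0.5, k=3)
```

Callers can then treat any slot whose id equals `UINT32_MAX` (or whose distance is infinite) as empty.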
SPDK backend
- Added an optional SPDK I/O backend through `-DIO_ENGINE=spdk` for raw NVMe vector reads.
- Supports RAID-0-style striping across PCIe NVMe devices listed in `spdk_bdevs.json`, with one poller thread per device.
- Copies `{index_prefix}_disk.index` to the SPDK target on first open and reuses a marker to skip repeated copies.
- Keeps filtered-search attribute reads on `io_uring` while vector I/O uses SPDK.
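For readers unfamiliar with RAID-0-style striping, the following sketch shows how a logical byte offset is commonly mapped to a device and an offset on that device. The function name `stripe_location` and the 4096-byte stripe size are assumptions for illustration; the stripe size and layout PipeANN actually uses are not stated in these notes.

```python
def stripe_location(offset, num_devices, stripe_size=4096):
    """Map a logical byte offset to (device index, byte offset on that device)."""
    stripe_index = offset // stripe_size   # which stripe the offset falls in
    device = stripe_index % num_devices    # stripes rotate across devices
    # Each device holds every num_devices-th stripe, packed contiguously.
    device_offset = (stripe_index // num_devices) * stripe_size + offset % stripe_size
    return device, device_offset


# With two devices, consecutive stripes alternate between device 0 and device 1.
first = stripe_location(0, num_devices=2)
second = stripe_location(4096, num_devices=2)
third = stripe_location(8192 + 100, num_devices=2)
```

Spreading consecutive stripes across devices lets independent reads proceed in parallel, which fits the one-poller-thread-per-device design mentioned above.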
Python API and integrations
- Added the current `IndexPipeANN(data_dim, data_type, metric)` API for build, load, search, insert, delete, save, filters, range search, and attribute-aware inserts.
- Added `Collection` and `Client` for SQLite-backed documents/payloads, vector CRUD, persistence, and collection auto-discovery.
- Added LangChain integration through `pipeann.langchain.PipeANNVectorStore`.
- Added a Qdrant-compatible FastAPI server with collection management, point upsert/query/scroll, payload indexes, filter delete, count, and save.
- Added `schema.json` persistence for collection config and attribute-index metadata.
Code refactoring and implementation changes
- Merged dynamic search, insert, delete, merge, and save behavior into header-only `DynamicIndex<T>`.
- Unified update save/merge with same-prefix double-version replacement.
- Merged PiPNN and Vamana-style build paths behind `build_disk_index` and the shared SSD file format.
- Refactored pipelined top-k, range, and filtered search around `pipe_search_common.h` and `spec_filter_search.cpp`.
- Simplified metric/distance handling and updated PiPNN file layout.
- Added `IO_ENGINE` selection for `uring`, `aio`, and `spdk`, separate dense-node I/O sizing, CMake/CI cleanup, and package version `0.3.0`.
Tests and examples
- Switched the project license from MIT to Apache License 2.0 and updated NOTICE attribution, including NGFix graph-refinement logic under MIT.
- Added Python insert/delete search tests, including assertions that results include inserted data and exclude deleted data.
- Added filtered insert regression tests comparing full build vs. build-plus-insert with attributes.
- Added native selector, range-search, collection, LangChain, and Qdrant server tests/examples.
- Added C++ filtered build/search and update test coverage.
Breaking changes
- `DynamicSSDIndex` was removed; use `DynamicIndex<T>` in C++ or `IndexPipeANN` in Python.
- `dynamic_index.cpp` was removed; `DynamicIndex<T>` is now header-only.
- `filter/label.h` was removed; use `filter/attribute.h` and `filter/selector.h`.
- `build_pipnn_index` was removed; use `build_disk_index` with PiPNN-compatible parameters.
- The old schema-centered Python API was removed; construct `IndexPipeANN(data_dim, data_type, metric)` directly and let `Collection`/`Client` own collection persistence.
- `PyIndexInterface` construction changed from a Python dict to explicit `(dim, dtype, metric)` arguments.
- `IndexPipeANN.search()` now takes `selector`, `query_attrs`, and finite `range` keyword arguments.
- `IndexPipeANN.build()` now takes `attrs`, `range_dense`, `train_query_path`, `R_ood`, and `L_ood`; major build knobs auto-configure when left at `0`.
- `pipnn.h`/`pipnn.cpp` moved under the utils layout.
0.2.0
We heavily refactored the codebase and added some features.
Updates
- Python interface for index, search, and update.
- Modularized neighbor abstraction & RaBitQ support.
- Faster PQ generation in graph building.
- Filtered ANNS support (post-filtering).
- Reduced memory usage.
- More metrics supported (L2, cosine, and inner-product).
- Vector size >= 4KB supported (experimental).