About

🌟 What is Spector?

The Zero-Overhead, Agent-Ready AI Memory Backbone.

Legacy search engines bolted vectors onto text databases. Spector is designed from the ground up for modern AI — combining vector similarity, keyword search, and hybrid ranking in a single embeddable library with zero external dependencies. Connect any AI agent via the built-in MCP server, or embed directly in your application.

Spector is an open-source, high-performance search engine built entirely on modern Java 25. It's designed for developers who want sub-millisecond search, native AI agent integration, and zero infrastructure complexity. Drop in a JAR, write a few lines of code, and you have production-grade hybrid search with built-in agent support.

🎯 What It Does

Spector indexes documents with their vector embeddings and text content, then retrieves them using multiple strategies — directly from AI agents or your application code:

graph LR
    subgraph Clients
        MCP["🤖 AI Agent (MCP)"]
        REST["🌐 REST API"]
        SDK["📦 Java SDK"]
    end
    
    subgraph Search Modes
        A[Vector Search] --> D[Results]
        B[Keyword Search] --> D
        C[Hybrid Search] --> D
    end
    
    subgraph Engines
        A --> E[HNSW ANN]
        B --> F[BM25 Scoring]
        C --> E
        C --> F
        C --> G[RRF Fusion]
    end
    
    MCP --> A & B & C
    REST --> A & B & C
    SDK --> A & B & C

Mode	How It Works	Best For
🧠 Vector Search	HNSW approximate nearest neighbor graphs	Semantic similarity
📝 Keyword Search	BM25 scoring with term frequency saturation	Exact term matching
🧬 Hybrid Search	Combines both via Reciprocal Rank Fusion	Best-of-both-worlds
🤖 RAG Pipeline	Ingest → chunk → embed → retrieve → context assembly	LLM applications
🏛️ SpectorIndex	IVF-HNSW-SVASQ adaptive hybrid index	Scale + recall
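
The "term frequency saturation" mentioned for keyword search can be illustrated with the standard Okapi BM25 term score. This is a minimal sketch of the textbook formula, not Spector's internal code: the class name `Bm25Sketch`, the parameters `K1 = 1.2` and `B = 0.75`, and the Lucene-style IDF are assumptions chosen for illustration.

```java
// Illustrative only: textbook Okapi BM25 term score, not Spector's implementation.
final class Bm25Sketch {
    static final double K1 = 1.2;   // assumed saturation parameter
    static final double B = 0.75;   // assumed length-normalization parameter

    /** IDF with +1 inside the log, so the value is never negative. */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Score contribution of one query term in one document. */
    static double termScore(int termFreq, int docLen, double avgDocLen,
                            long docCount, long docFreq) {
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(docCount, docFreq) * (termFreq * (K1 + 1.0)) / (termFreq + norm);
    }

    public static void main(String[] args) {
        // Average-length document; the term occurs in 100 of 100,000 documents.
        for (int tf : new int[] {1, 2, 4, 8, 16}) {
            System.out.printf("tf=%d score=%.3f%n", tf,
                    termScore(tf, 100, 100.0, 100_000, 100));
        }
    }
}
```

Each doubling of the term frequency adds less to the score than the previous one, and the score never exceeds `idf * (K1 + 1)`. That ceiling is what keeps a document from winning purely by repeating a term.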

💎 Key Differentiators

🤖 Agent-Native (MCP Protocol)

Includes a built-in Model Context Protocol server with 6 tools. AI agents connect directly via JSON-RPC — no Python frameworks, no network round-trips.

Feature	Python Vector DB MCP	Spector MCP
Search latency	2–10 ms	88 µs p50 (23–113× faster) †
Network overhead	HTTP/gRPC round-trip	Zero (in-process)
Concurrent queries	Limited by Python GIL	61,000 QPS †
Dependencies	Python framework stack	Single JAR

† Measured. See Benchmarks.

Tip

See the MCP Server Guide to connect Claude Desktop, Cursor, or any MCP client in minutes.
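
For orientation, an MCP client invokes a tool by sending a JSON-RPC 2.0 request with the method `tools/call`. The sketch below builds such a message by hand. The tool name `rag_query` appears elsewhere on this page, but the argument names `query` and `topK` are hypothetical, as is the class `McpRequestSketch`; check the MCP Server Guide for the actual tool schemas.

```java
// Illustrative only: builds an MCP "tools/call" JSON-RPC 2.0 request as a string.
final class McpRequestSketch {

    /** Minimal JSON escaping: quotes, backslashes, and common whitespace escapes only. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** The "query" and "topK" argument names are assumptions, not Spector's schema. */
    static String toolsCall(int id, String toolName, String query, int topK) {
        return "{\"jsonrpc\":\"2.0\",\"id\":" + id
             + ",\"method\":\"tools/call\",\"params\":{\"name\":\"" + jsonEscape(toolName)
             + "\",\"arguments\":{\"query\":\"" + jsonEscape(query)
             + "\",\"topK\":" + topK + "}}}";
    }

    public static void main(String[] args) {
        System.out.println(toolsCall(1, "rag_query", "How does hybrid ranking work?", 5));
    }
}
```

In practice an MCP client library produces this message for you. The sketch only shows the shape of what travels over the JSON-RPC channel.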

📦 Pure Java, Zero Dependencies

Unlike most vector databases that rely on C++, Rust, or Python bindings, Spector is 100% Java. It uses the JDK's own Vector API for SIMD acceleration — no JNI, no native libraries, no external infrastructure.

Tip

Add the JAR to your classpath and you're done. No Docker, no clusters, no ops.

🚀 Modern JVM Technologies

Technology	Purpose
Java Vector API	SIMD-accelerated math (AVX2/AVX-512/NEON)
Panama FFM	Zero-copy memory-mapped storage, GPU interop
Virtual Threads	Millions of concurrent operations without thread pools
Structured Concurrency	Safe parallel task management
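
To make the virtual-threads row concrete, here is a minimal sketch using the standard `Executors.newVirtualThreadPerTaskExecutor()` API (final since Java 21). It does not use any Spector class: the sleeping task is a stand-in for a blocking search or I/O call, and `VirtualThreadSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative only: one virtual thread per task, no hand-sized thread pool.
final class VirtualThreadSketch {

    /** Runs taskCount blocking tasks concurrently and returns the sum of their ids. */
    static long runConcurrently(int taskCount) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int id = i;
                futures.add(executor.submit(() -> {
                    Thread.sleep(10);   // stand-in for a blocking search or I/O call
                    return id;
                }));
            }
            long sum = 0;
            for (Future<Integer> f : futures) {
                sum += f.get();
            }
            return sum;
        } catch (Exception e) {
            throw new IllegalStateException("task failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(10_000));
    }
}
```

Because each blocked virtual thread releases its carrier thread, thousands of sleeping tasks complete in roughly the time of one sleep rather than queuing behind a fixed pool.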

⚡ Sub-Millisecond at Scale

HNSW at 100K documents (128 dimensions, top-10, M=16, efSearch=64):

Search Type	Average Latency	Throughput
Vector	0.13 ms	7,556 QPS
Keyword	0.98 ms	1,019 QPS
Hybrid	1.01 ms	994 QPS

SpectorIndex (IVF-HNSW-SVASQ) at 10K documents (4096-dim real Qwen3 embeddings):

Config	Average Latency	Throughput	Recall@10
nCentroids=128, nProbe=4	0.46 ms	2,173 QPS	1.0000
nCentroids=64, nProbe=4	0.62 ms	1,601 QPS	1.0000
nCentroids=128, nProbe=16	1.26 ms	792 QPS	1.0000
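
The Recall@10 column follows the usual definition: the share of the exact top-k neighbors that the approximate search also returned. A minimal sketch of that metric, with hypothetical names and integer document ids assumed:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: Recall@k of an approximate result list against exact ground truth.
final class RecallSketch {

    static double recallAtK(List<Integer> approx, List<Integer> exact, int k) {
        Set<Integer> truth = new HashSet<>(exact.subList(0, Math.min(k, exact.size())));
        int hits = 0;
        for (int id : approx.subList(0, Math.min(k, approx.size()))) {
            if (truth.contains(id)) {
                hits++;
            }
        }
        return truth.isEmpty() ? 0.0 : (double) hits / truth.size();
    }

    public static void main(String[] args) {
        // Three of the four true neighbors were found.
        System.out.println(recallAtK(List.of(1, 2, 3, 4), List.of(1, 2, 3, 5), 4));
    }
}
```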

Note

In these runs, SpectorIndex reaches a Recall@10 of 1.0000 while probing only 4 of 128 centroids (nProbe=4), which is about 3.1% of the data. Ingestion is 28–160× faster than standalone HNSW. Numbers were measured on a 24-core x86 machine with AVX2, Java 25, and ZGC, using real Qwen3 embedding vectors. For comprehensive multi-centroid sweeps and adaptive HNSW shard promotion benchmarks, see the dedicated Large-Scale Real-Embedding Benchmarks page.
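
The nProbe setting can be pictured as follows: an IVF-style index compares the query to every centroid, then searches only the clusters behind the nProbe closest ones. The sketch below shows just that selection step under generic IVF assumptions. It is not SpectorIndex code, and all names are hypothetical.

```java
import java.util.Arrays;

// Illustrative only: the centroid-selection step of a generic IVF index.
final class IvfProbeSketch {

    static double squaredL2(float[] a, float[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the indices of the nProbe centroids closest to the query, nearest first. */
    static int[] nearestCentroids(float[][] centroids, float[] query, int nProbe) {
        double[] dist = new double[centroids.length];
        Integer[] order = new Integer[centroids.length];
        for (int i = 0; i < centroids.length; i++) {
            dist[i] = squaredL2(centroids[i], query);
            order[i] = i;
        }
        Arrays.sort(order, (x, y) -> Double.compare(dist[x], dist[y]));
        int n = Math.min(nProbe, order.length);
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            out[i] = order[i];
        }
        return out;
    }

    /** Share of clusters visited; equals the share of data only if clusters are balanced. */
    static double probedFraction(int nProbe, int nCentroids) {
        return (double) nProbe / nCentroids;
    }

    public static void main(String[] args) {
        float[][] centroids = {{0f, 0f}, {10f, 10f}, {1f, 1f}, {-5f, -5f}};
        System.out.println(Arrays.toString(nearestCentroids(centroids, new float[] {0.9f, 0.9f}, 2)));
        System.out.println(probedFraction(4, 128));
    }
}
```

With nProbe=4 and 128 centroids the probed fraction is 4/128 = 0.03125, which is where the roughly 3.1% figure comes from.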

🏠 Dual Deployment Modes

Mode	Description	Best For
Embedded	In-process library, zero network overhead	Microservices, desktop apps, edge
Server	REST API with CORS, auth, and metrics	Teams, multi-language clients

🗜️ Advanced Quantization (SVASQ + IVF-PQ)

Spector offers two quantization paths:

SVASQ (Vectorized Affine Scalar Quantization): Uses the Fast Walsh-Hadamard Transform to spread variance before INT8 quantization, achieving 4× compression with near-lossless recall (~97–99.5%). Used inside SpectorIndex shards.
IVF-PQ (Product Quantization): Provides 32× memory compression for billion-scale datasets.
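
The rotate-then-quantize idea behind the SVASQ description can be sketched generically: apply an orthonormal Fast Walsh-Hadamard Transform so that energy concentrated in a few coordinates is spread across all of them, then map each float to one signed byte with an affine (min/scale) code. This is a simplified illustration under those assumptions, not the SVASQ algorithm itself. The class and record names are hypothetical.

```java
// Illustrative only: orthonormal FWHT followed by affine INT8 quantization.
final class SvasqStyleSketch {

    record Quantized(byte[] codes, float min, float scale) {}

    /** In-place orthonormal FWHT; the length must be a power of two. Self-inverse. */
    static void fwht(float[] x) {
        int n = x.length;
        if (Integer.bitCount(n) != 1) {
            throw new IllegalArgumentException("length must be a power of two");
        }
        for (int len = 1; len < n; len <<= 1) {
            for (int i = 0; i < n; i += len << 1) {
                for (int j = i; j < i + len; j++) {
                    float a = x[j], b = x[j + len];
                    x[j] = a + b;
                    x[j + len] = a - b;
                }
            }
        }
        float norm = (float) (1.0 / Math.sqrt(n));
        for (int i = 0; i < n; i++) {
            x[i] *= norm;
        }
    }

    /** Maps each float to one signed byte: 4 bytes become 1, hence 4x compression. */
    static Quantized quantize(float[] x) {
        float min = Float.POSITIVE_INFINITY, max = Float.NEGATIVE_INFINITY;
        for (float v : x) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        float range = max - min;
        float scale = range > 0 ? range / 255f : 1f;   // avoid division by zero
        byte[] codes = new byte[x.length];
        for (int i = 0; i < x.length; i++) {
            int level = Math.round((x[i] - min) / scale);
            level = Math.max(0, Math.min(255, level));
            codes[i] = (byte) (level - 128);
        }
        return new Quantized(codes, min, scale);
    }

    static float[] dequantize(Quantized q) {
        float[] out = new float[q.codes().length];
        for (int i = 0; i < out.length; i++) {
            out[i] = (q.codes()[i] + 128) * q.scale() + q.min();
        }
        return out;
    }

    public static void main(String[] args) {
        float[] spike = {8f, 0f, 0f, 0f};   // all energy in one coordinate
        fwht(spike);                        // energy is now spread evenly
        System.out.println(java.util.Arrays.toString(spike));
    }
}
```

After the transform, no single coordinate dominates the value range, so the 256 available levels are spent more evenly. That is the intuition for why rotation reduces quantization error.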

Important

SVASQ gives INT8 the precision of INT12–16 by rotating vectors before quantization. See the SVASQ Deep Dive for the full theory.

📊 How Spector Compares

Latency Comparison (100K docs, 128-dim, top-10)

Engine	Language	Vector Avg	Vector P99
⚡ Spector	Java 25	0.13 ms	0.26 ms
hnswlib	C++	0.1–0.5 ms	~1 ms
FAISS	C++	0.2–0.8 ms	1–2 ms
Lucene 9+	Java	1–5 ms	5–10 ms
Elasticsearch 8+	Java	2–10 ms	10–25 ms
Qdrant	Rust	2–5 ms	10–25 ms
Milvus	Go/C++	3–10 ms	10–35 ms

Note

Spector's vector search latency is competitive with native C++ implementations (hnswlib, FAISS) for in-process workloads. Numbers for external systems are from published benchmarks and ann-benchmarks.com. Hardware and configuration differences apply — these are directional comparisons, not controlled A/B tests.

Feature Comparison

Feature	Spector	Elasticsearch	Qdrant	Milvus	hnswlib
Deployment	Embedded + Server	Cluster only	Server only	Cluster only	Embedded only
MCP Server	✅ Built-in (6 tools)	❌	❌	❌	❌
Hybrid Search	✅ RRF built-in	✅ RRF	✅ Sparse+Dense	✅ RRF	❌
Zero Dependencies	✅ JDK only	❌ Heavy stack	❌ Tokio runtime	❌ etcd, MinIO, Pulsar	✅ Header-only
Virtual Threads	✅ Project Loom	❌ Platform threads	N/A (Rust async)	N/A (Go goroutines)	N/A
GPU Acceleration	✅ CUDA (Panama FFM)	❌	✅ Vulkan (indexing)	✅ CUDA (search + indexing)	❌
Quantization	✅ Scalar INT8 + IVF-PQ	✅ BBQ + Scalar + DiskBBQ (IVF)	✅ Scalar + Binary	✅ IVF-PQ + IVF-SQ	❌
Re-ranking	✅ LLM via Ollama	✅ Elastic Rerank + Inference API	✅ FastEmbed / ColBERT	✅ vLLM Ranker + Cross-encoder	❌
Distributed	✅ gRPC fan-out	✅ Built-in sharding	✅ Raft consensus	✅ gRPC + etcd	❌
SIMD Acceleration	✅ Java Vector API	✅ simdvec (Panama)	✅ Native SIMD	✅ AVX/NEON	✅ AVX/SSE

Note

This comparison reflects publicly available information as of May 2025. Feature availability may vary by version and deployment mode. All products are actively evolving.

🛠️ Use Cases

🤖 Agentic AI Memory

Connect AI agents (Claude, Cursor, custom) directly to Spector via the built-in MCP server. The agent autonomously ingests documents, searches for relevant context, and retrieves information, all with zero Python glue code. "Point your LLM at Spector's MCP port, and it instantly has mathematically perfect long-term memory."

🤖 Retrieval-Augmented Generation (RAG)

Ingest documents (PDF, HTML, Markdown), chunk them with token awareness, generate embeddings, and retrieve relevant context for LLM prompting — all through a single /api/v1/rag endpoint or the rag_query MCP tool.
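
From a JVM client, a call to the RAG endpoint could look roughly like the sketch below, which uses the JDK's built-in `java.net.http` API. Only the path `/api/v1/rag` comes from this page. The HTTP method, the `query` field in the JSON body, and the base URL are assumptions, so consult the REST API Reference for the actual contract.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative only: builds (but does not send) a request to the RAG endpoint.
final class RagRequestSketch {

    /** The POST method and the "query" body field are assumptions, not the documented API. */
    static HttpRequest buildRagRequest(String baseUrl, String question) {
        String escaped = question.replace("\\", "\\\\").replace("\"", "\\\"");
        String body = "{\"query\":\"" + escaped + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/rag"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // "http://localhost:8080" is a placeholder host and port.
        HttpRequest request = buildRagRequest("http://localhost:8080", "What is hybrid search?");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`.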

🔍 Semantic Search Applications

Power product search, documentation search, code search, or any application where meaning matters more than exact keywords.

💡 Recommendation Systems

Use vector similarity to find items similar to what users have engaged with. Sub-millisecond latency makes real-time recommendations practical.
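
As a reference point, "items similar to what users have engaged with" means ranking item vectors by their similarity to a query vector. The sketch below does this exactly, by brute force, using cosine similarity. An ANN index such as HNSW approximates the same ranking much faster. The names and the choice of cosine similarity are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: exact top-k by cosine similarity over a small item catalog.
final class SimilarItemsSketch {

    static double cosine(float[] a, float[] b) {
        double dot = 0.0, na = 0.0, nb = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / Math.sqrt(na * nb);
    }

    /** Returns the indices of the k items most similar to the query, best first. */
    static List<Integer> topK(float[][] items, float[] query, int k) {
        double[] sims = new double[items.length];
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < items.length; i++) {
            sims[i] = cosine(items[i], query);
            ids.add(i);
        }
        ids.sort((x, y) -> Double.compare(sims[y], sims[x]));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        float[][] items = {{1f, 0f}, {0.9f, 0.1f}, {0f, 1f}, {-1f, 0f}};
        // Query with the vector of an item the user engaged with.
        System.out.println(topK(items, new float[] {1f, 0f}, 2));
    }
}
```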

🏢 Hybrid Enterprise Search

Combine keyword precision (finding exact product SKUs, error codes) with semantic understanding (finding conceptually related documents).
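
Reciprocal Rank Fusion, the technique this page names for hybrid search, merges ranked lists using only rank positions: each document receives 1 / (k + rank) from every list it appears in, and the sums decide the final order. The sketch below uses the conventional constant k = 60, which is an assumption here and not a documented Spector setting. The class name and document ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: textbook Reciprocal Rank Fusion over ranked id lists.
final class RrfSketch {
    static final int K = 60;   // conventional RRF constant (assumed)

    static List<String> fuse(List<List<String>> rankings) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                // Ranks are 1-based, so position i contributes 1 / (K + i + 1).
                scores.merge(ranking.get(i), 1.0 / (K + i + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        fused.sort((a, b) -> {
            int c = Double.compare(scores.get(b), scores.get(a));
            return c != 0 ? c : a.compareTo(b);   // deterministic tie-break
        });
        return fused;
    }

    public static void main(String[] args) {
        List<String> keyword = List.of("sku-42", "doc-a", "doc-b");   // e.g. BM25 order
        List<String> vector = List.of("doc-a", "doc-c", "sku-42");    // e.g. ANN order
        System.out.println(fuse(List.of(keyword, vector)));
    }
}
```

Because only ranks are used, BM25 scores and vector distances never need to be normalized onto a common scale. A document ranked well by both lists rises above one ranked first by only a single list.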

📱 Embedded Analytics

Drop Spector into existing Java applications without infrastructure changes. Perfect for desktop applications, microservices, or edge deployments.

✅ When to Choose Spector

Note

Choose Spector when:

You want AI agents to autonomously search your data (MCP integration)
You want sub-millisecond hybrid search without infrastructure complexity
Your stack is Java/JVM and you want native integration
You need an embedded search library with a server-mode option
You want GPU acceleration without leaving the JVM
Zero external dependencies matter for your deployment

Warning

Consider alternatives when:

You need a managed cloud service with zero ops
Your team primarily works in Python/Rust/Go
You need built-in ML model serving

🚀 Next Steps

Getting Started — Build and run your first search in 5 minutes
MCP Server Guide — Connect an AI agent in 3 steps
Architecture Overview — Understand how it works under the hood
REST API Reference — Full API documentation
Core Concepts — Deep dive into the algorithms
