FAQ
Quick answers to the most common questions about Spector. Can't find what you're looking for? Check GitHub Discussions or the specific wiki pages linked throughout.
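Spector requires JDK 25 or later. It uses the Java Vector API (incubator module) for SIMD acceleration and Panama FFM for off-heap memory. OpenJDK builds ship both, but the incubating Vector API module must be enabled at launch with `--add-modules jdk.incubator.vector`.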
Spector works fully without a GPU; GPU support is optional. Without a GPU, Spector uses CPU SIMD acceleration (AVX2/AVX-512/NEON), which delivers sub-millisecond search at 100K documents. A GPU helps primarily with high-concurrency batch workloads.
Tip: See GPU Acceleration for details on when a GPU adds value (spoiler: batch sizes > 32).
Spector can be embedded directly in an application. It runs in two modes:
| Mode | Description | Overhead |
|---|---|---|
| Embedded | Add JAR to classpath, create `SpectorEngine` | Zero network overhead |
| Server | REST API with auth, CORS, metrics | HTTP overhead |

```java
try (var engine = new SpectorEngine(SpectorConfig.DEFAULT.withDimensions(384))) {
    engine.ingest("id", "content", vector);
    var results = engine.hybridSearch("query", queryVector, 10);
}
```

Spector does not lose data on restart: it supports persistence through memory-mapped files. The HNSW index uses a page-aligned binary format that loads instantly via mmap, with no deserialization needed.

| Aspect | ⚡ Spector | Elasticsearch |
|---|---|---|
| Vector search latency | 0.13 ms (100K, in-process) | 2–10 ms |
| Hybrid search latency | 1.01 ms (100K, in-process) | 10–30 ms |
| Deployment | Embedded JAR or server | Separate server (single node or cluster) |
| Dependencies | Zero (JDK only) | JVM + heavy stack |
| GPU support | ✅ CUDA | ❌ |
| IVF-PQ compression | ✅ 32× | ❌ |
Elasticsearch excels at distributed full-text search with a mature query language and ecosystem. Spector excels at raw in-process performance, embedded use, and modern JVM features. The latency advantage is largest for in-process embedded use; network-bound deployments narrow the gap.
Filtered search is supported. The Spring AI integration accepts filter expressions:

```java
vectorStore.similaritySearch(
    SearchRequest.query("search algorithms")
        .withFilterExpression("category == 'indexing' && version > 2")
);
```

Spector works with any embedding model that produces float32 vectors. Set dimensions to match the model's output:

| Model | Dimensions | Provider |
|---|---|---|
| all-MiniLM-L6-v2 | 384 | Sentence Transformers / Ollama |
| e5-base-v2 | 768 | Sentence Transformers |
| text-embedding-ada-002 | 1536 | OpenAI |
| nomic-embed-text | 768 | Ollama |
| mxbai-embed-large | 1024 | Ollama |
Note: Spector includes an Ollama embedding provider out of the box. Implement the `EmbeddingProvider` SPI for any other source.
| Function | Best For |
|---|---|
| COSINE (default) | Normalized embeddings (most models) |
| DOT_PRODUCT | Unnormalized embeddings, magnitude matters |
| EUCLIDEAN | Spatial/geometric data |
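
To make the difference between the three functions concrete, here is a minimal reference sketch in plain Java. The class and method names are hypothetical and are not Spector's API; Spector's own kernels are SIMD-accelerated, while these loops only illustrate the math.

```java
/** Illustrative reference implementations; names are hypothetical, not Spector's API. */
final class DistanceSketch {

    /** Dot product: larger means more similar; sensitive to vector magnitude. */
    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /** Cosine similarity: dot product divided by both norms, so magnitude cancels out. */
    static float cosine(float[] a, float[] b) {
        // Undefined (NaN) if either vector has zero length.
        double norms = Math.sqrt(dotProduct(a, a)) * Math.sqrt(dotProduct(b, b));
        return (float) (dotProduct(a, b) / norms);
    }

    /** Euclidean distance: straight-line distance; smaller means more similar. */
    static float euclidean(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return (float) Math.sqrt(sum);
    }

    public static void main(String[] args) {
        float[] a = {1f, 0f};
        float[] b = {2f, 0f}; // same direction as a, twice the length
        System.out.println(cosine(a, b));     // depends on direction only
        System.out.println(dotProduct(a, b)); // grows with magnitude
        System.out.println(euclidean(a, b));  // gap between the two points
    }
}
```

For embeddings that are already unit-normalized, cosine similarity and dot product rank results identically, which is why the choice mostly matters for unnormalized vectors.
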
| Mode | Scale |
|---|---|
| Single node | Up to 10 million documents |
| IVF-PQ mode | Billions of vectors (32× compression) |
| Distributed mode | Scale horizontally (2–256 shards) |
```mermaid
flowchart LR
    A["🔍 Search<br/>Top-N candidates"] --> B["🤖 LLM (Ollama)<br/>Listwise scoring"]
    B --> C["✨ Re-ranked<br/>Top-K results"]
```
- Vector/hybrid search retrieves top-N candidates (default: 20)
- Candidates sent to Ollama for listwise relevance scoring
- LLM reorders based on semantic relevance
- Final top-K results reflect LLM judgment
Warning: LLM re-ranking adds 100–500 ms of latency but significantly improves precision for ambiguous queries.
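
The retrieve-then-rerank flow above can be sketched in a few lines of Java. Everything here is an assumption for illustration: the `Candidate` record, the `rerank` method, and the `listwiseRanker` function (a stand-in for the Ollama call) are hypothetical names, not Spector's API.

```java
import java.util.List;
import java.util.function.UnaryOperator;

/** Sketch of two-stage retrieval; all names are hypothetical. */
final class RerankSketch {

    /** A retrieved document with its first-stage retrieval score. */
    record Candidate(String id, String text, double retrievalScore) {}

    /**
     * Keeps the top-N hits by retrieval score, hands that list to a listwise
     * ranker (in practice an LLM call), and returns the first K of its ordering.
     */
    static List<Candidate> rerank(List<Candidate> hits, int topN, int topK,
                                  UnaryOperator<List<Candidate>> listwiseRanker) {
        List<Candidate> candidates = hits.stream()
                .sorted((x, y) -> Double.compare(y.retrievalScore(), x.retrievalScore()))
                .limit(topN)
                .toList();
        // One call for the whole list: this is where the extra latency comes from.
        List<Candidate> reordered = listwiseRanker.apply(candidates);
        return reordered.stream().limit(topK).toList();
    }
}
```

Keeping N small bounds the cost of the slow stage: the ranker only ever sees N candidates, regardless of index size.
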
Virtual threads (Project Loom) are lightweight threads that don't map 1:1 to OS threads:
- ✅ Handle millions of concurrent requests without pool tuning
- ✅ No `synchronized` blocks that pin carrier threads
- ✅ Near-zero scheduling overhead
- ✅ Throughput scales with thread count (4.5× measured at 16 threads)
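
As a rough illustration of the thread-per-request style that virtual threads allow, the sketch below starts one virtual thread per simulated request using the standard `Executors.newVirtualThreadPerTaskExecutor()` (JDK 21+). The class name, the method name, and the 10 ms sleep standing in for I/O are assumptions; this is not how Spector's server is necessarily wired.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: one virtual thread per simulated request, no pool sizing. */
final class VirtualThreadSketch {

    /** Runs the given number of blocking tasks and returns how many completed. */
    static int handleAll(int requests) {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(executor.submit(() -> {
                    Thread.sleep(10); // blocking parks the virtual thread, not an OS thread
                    return 1;
                }));
            }
            int completed = 0;
            for (Future<Integer> f : futures) {
                completed += f.get();
            }
            return completed;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(handleAll(10_000));
    }
}
```
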
Vectors are stored in memory-mapped files using Panama's `MemorySegment`:
- The OS maps the file directly into the process address space
- SIMD kernels read vectors without copying to the Java heap
- Zero garbage collection pressure
- Instant startup (no deserialization)
- Supports datasets larger than available RAM
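
The idea of reading vectors straight from a mapped file can be sketched as follows. Note the swap: this example uses the older `MappedByteBuffer` API rather than Panama's `MemorySegment`, purely so it runs on older JDKs as well. The file layout (vectors stored back to back as little-endian float32) and all names are assumptions, not Spector's on-disk format.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch of a memory-mapped vector file; layout and names are hypothetical. */
final class MmapVectorSketch {

    /** Writes vectors back to back as little-endian float32 through a writable mapping. */
    static void write(Path file, float[][] vectors, int dim) {
        long bytes = (long) vectors.length * dim * Float.BYTES;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            FloatBuffer out = ch.map(FileChannel.MapMode.READ_WRITE, 0, bytes)
                    .order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer();
            for (float[] v : vectors) {
                out.put(v, 0, dim);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Maps the file read-only and returns vector number `index`; there is no parsing step. */
    static float[] read(Path file, int index, int dim) {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            FloatBuffer in = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size())
                    .order(ByteOrder.LITTLE_ENDIAN).asFloatBuffer();
            float[] v = new float[dim];
            in.position(index * dim);
            in.get(v, 0, dim); // copied out here for inspection; a kernel would read in place
            return v;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the vectors to a temp file, then reads one back through a fresh mapping. */
    static float[] roundTrip(float[][] vectors, int index) {
        try {
            Path file = Files.createTempFile("vectors", ".bin");
            file.toFile().deleteOnExit();
            int dim = vectors[0].length;
            write(file, vectors, dim);
            return read(file, index, dim);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because a vector's position is just `index * dim` floats into the file, opening the index costs one `map` call regardless of how many vectors it holds.
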
| Aspect | 🌐 HNSW | 🗜️ IVF-PQ |
|---|---|---|
| Speed | Fastest (0.05ms) | Fast (nprobe-dependent) |
| Memory | Full vectors (1.5KB/vec @ 384-dim) | 32× compressed (48 bytes/vec) |
| Recall | High (configurable) | Moderate (nprobe-dependent) |
| Scale | Up to millions | Up to billions |
| Use case | Default for most workloads | Memory-constrained, billion-scale |
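
The memory figures in the table follow from simple arithmetic, worked through below. The class and method names are hypothetical. The numbers cover raw vector payload only, on the assumption that graph links and codebooks are accounted for separately.

```java
/** Back-of-the-envelope vector storage math; names are hypothetical. */
final class IndexMemorySketch {

    /** Bytes for one uncompressed float32 vector. */
    static long fullVectorBytes(int dimensions) {
        return (long) dimensions * Float.BYTES;
    }

    /** Bytes for one vector after compression by the given ratio. */
    static long compressedBytes(int dimensions, int ratio) {
        return fullVectorBytes(dimensions) / ratio;
    }

    /** Total payload bytes for a collection. */
    static long totalBytes(long vectorCount, long bytesPerVector) {
        return vectorCount * bytesPerVector;
    }

    public static void main(String[] args) {
        long full = fullVectorBytes(384);      // 384 * 4 = 1536 bytes (1.5 KB)
        long pq = compressedBytes(384, 32);    // 1536 / 32 = 48 bytes
        long billion = 1_000_000_000L;
        System.out.println(totalBytes(billion, full)); // 1536000000000 bytes, about 1.5 TB
        System.out.println(totalBytes(billion, pq));   // 48000000000 bytes, 48 GB
    }
}
```
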
Benchmark runs support JSON output and baseline regression detection:

```shell
mvn -pl spector-bench exec:java -Dexec.args="-rf json -rff results.json"
```

| Port | Protocol | Purpose |
|---|---|---|
| 7070 | HTTP | REST API (configurable) |
| 9090 | gRPC | Cluster communication (distributed mode) |

```shell
curl http://localhost:7070/health          # Health check
curl http://localhost:7070/api/v1/status   # Engine status
curl http://localhost:7070/api/v1/metrics  # Request metrics
```

```shell
java \
  --add-modules jdk.incubator.vector \
  --enable-native-access=ALL-UNNAMED \
  -XX:+UseZGC \
  -Xmx4g -Xms4g \
  -jar spector-node.jar
```

Distributed mode:
- Drain one node (stop routing requests)
- Upgrade the node binary
- Restart and wait for replica sync
- Repeat for each node
Embedded mode: Redeploy the application with the new Spector version, as in any standard application deployment.
API key authentication is supported. Set an API key at server startup:

```shell
mvn exec:java -pl spector-node \
  -Dexec.args="7070 384 my-secret-key"
```

Clients include `X-API-Key: my-secret-key` in requests. Without a key configured, all requests are allowed.
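
From a Java client, the header can be attached with the JDK's built-in `java.net.http` API. This is a minimal sketch: the class and method names are hypothetical, and it only builds the request, using the status endpoint and header name given above.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Sketch of an authenticated request; class and method names are hypothetical. */
final class ApiKeyRequestSketch {

    /** Builds a GET request for the engine status endpoint with the API key attached. */
    static HttpRequest statusRequest(String baseUrl, String apiKey) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/status"))
                .header("X-API-Key", apiKey)
                .GET()
                .build();
    }
}
```

To send it, pass the request to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running server.
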
- Getting Started: Quick start guide
- What is Spector: Product overview
- Configuration Guide: All parameters
- Performance Tuning: Optimization strategies