Commit 821a08f

Refactor persistence around storage-owned HNSW and add Phase 1 write admission
Move canonical vector and metadata ownership into the single-file storage engine and refactor collection-backed HNSW to operate on stable internal ordinals through a provider boundary. HNSW nodes now own topology plus optional derived compressed vectors instead of raw vectors, string IDs, or metadata.

Add binary single-file persistence, reopen/rebuild coverage, memory accounting updates, acceptance tests, and OpenClaw memory benchmarks/profile hooks. Follow up with targeted memory reductions across storage, WAL, and HNSW allocation hot paths.

Add Phase 1 collection write admission control with bounded queued writers, conservative default write concurrency, queue-full and cancellation handling, and clamped batch/streaming worker concurrency for safer plugin and subagent usage.

Update README and design docs to reflect the new single-file persistence model and concurrency behavior.
1 parent fd3de13 commit 821a08f


65 files changed: +9079, -639 lines

README.md

Lines changed: 80 additions & 23 deletions
````diff
@@ -45,16 +45,17 @@
 
 ## 🎯 Overview
 
-LibraVDB is a high-performance, production-ready vector database library designed specifically for Go applications. Built from the ground up with performance, scalability, and developer experience in mind, it provides enterprise-grade vector similarity search capabilities with support for multiple indexing algorithms, advanced quantization techniques, and sophisticated metadata filtering.
+LibraVDB is a high-performance vector database library for Go applications. It provides similarity search, metadata-aware retrieval, batch and streaming ingestion, and persistent single-file storage with HNSW, IVF-PQ, and Flat indexing.
 
 ### Why LibraVDB?
 
 - **🚀 Performance First**: Optimized for high-throughput insertions and sub-millisecond search latency
 - **🔧 Go Native**: Designed specifically for Go with idiomatic APIs and zero external dependencies
-- **📈 Production Ready**: Comprehensive error handling, observability, and recovery mechanisms
+- **📈 Durable by Default**: Single-file binary persistence with reopen/rebuild support
 - **🧠 Memory Efficient**: Advanced quantization and memory mapping for large-scale deployments
 - **🔍 Feature Rich**: Complex filtering, streaming operations, and automatic optimization
 - **📊 Observable**: Built-in metrics, health checks, and performance monitoring
+- **🛡️ Safer Writes**: Bounded write admission for concurrent, batch, and streaming writers
 
 ## ✨ Key Features
 
````
````diff
@@ -64,16 +65,44 @@ LibraVDB is a high-performance, production-ready vector database library designe
 - **Rich Metadata Filtering**: Complex AND/OR/NOT operations with type-safe schemas
 - **Streaming Operations**: High-throughput batch processing with backpressure control
 - **Memory Management**: Configurable limits, memory mapping, and automatic optimization
-- **Persistent Storage**: LSM-tree architecture with Write-Ahead Log for durability
+- **Persistent Storage**: Single-file binary storage with WAL-backed durability
+- **Storage-Owned HNSW**: Canonical vectors and metadata live in storage; HNSW owns graph topology plus optional compressed artifacts
 
 ### Enterprise Features
 - **Observability**: Prometheus metrics, health checks, and distributed tracing
 - **Error Recovery**: Automatic recovery mechanisms and circuit breakers
 - **Performance Monitoring**: Real-time performance metrics and optimization suggestions
-- **Concurrent Access**: Thread-safe operations with fine-grained locking
+- **Concurrent Access**: Thread-safe operations with bounded write admission and queueing
 - **Configuration Management**: Extensive configuration options with validation
 - **Documentation**: Comprehensive API documentation and usage guides
 
+## 📦 Persistence Model
+
+LibraVDB persists databases as a single `.libravdb` file.
+
+- Importing the package does not create files.
+- A database file is created or opened when you call `libravdb.New(...)`.
+- If no path is provided, the default path resolves to `./data.libravdb`.
+- `WithStoragePath(...)` should point to a database file such as `./mydb.libravdb`.
+- The `.libravdb` file is the portable unit you can move or copy after closing the database.
+
+For HNSW-backed collections:
+- canonical raw vectors and metadata are stored once in canonical storage
+- HNSW uses internal ordinals and provider-backed vector access
+- HNSW nodes do not own raw vectors or metadata
+- optional compressed vectors remain index-owned derived data
+
+## 🛡️ Write Concurrency Safety
+
+LibraVDB now includes a Phase 1 write-admission layer intended to make local and plugin-style usage safer.
+
+- direct writes, batch writes, and streaming writes are admitted through a bounded per-collection write gate
+- queued writers are bounded instead of piling up indefinitely
+- waiting writers respect context cancellation
+- batch and streaming worker counts are clamped to collection write parallelism
+
+This improves safety under bursty or subagent-style write traffic, but it is not yet the full adaptive scheduler. If you expect very heavy write concurrency, keep batch and streaming concurrency conservative and prefer one coordinated writer path per collection.
+
 ## 📊 Performance Benchmarks
 
 LibraVDB delivers exceptional performance across various workloads and scales:
````
````diff
@@ -146,9 +175,9 @@ import (
 )
 
 func main() {
-    // Create database with optimized settings
+    // Create a single-file database
     db, err := libravdb.New(
-        libravdb.WithStoragePath("./vector_data"),
+        libravdb.WithStoragePath("./vector_data.libravdb"),
         libravdb.WithMetrics(true),
     )
     if err != nil {
````
````diff
@@ -254,11 +283,11 @@ collection, err := db.CreateCollection(ctx, "documents",
 ### High-Throughput Batch Processing
 
 ```go
-// Configure for maximum throughput
+// Configure for controlled throughput
 opts := &libravdb.StreamingOptions{
     BufferSize: 50000,
     ChunkSize: 5000,
-    MaxConcurrency: runtime.NumCPU(),
+    MaxConcurrency: 2,
     Timeout: 5 * time.Minute,
     ProgressCallback: func(stats *libravdb.StreamingStats) {
         fmt.Printf("Processed: %d/%d (%.1f%%), Rate: %.0f/sec\n",
````
````diff
@@ -271,7 +300,7 @@ opts := &libravdb.StreamingOptions{
 stream := collection.NewStreamingBatchInsert(opts)
 stream.Start()
 
-// Process millions of vectors efficiently
+// Process large numbers of vectors without unbounded writer fan-out
 for _, entry := range millionVectorDataset {
     stream.Send(entry)
 }
````
````diff
@@ -338,7 +367,7 @@ LibraVDB employs a layered architecture designed for performance, scalability, a
 │ HNSW │ IVF-PQ │ Flat │ Quantization │ Cache │ Monitoring │
 ├─────────────────────────────────────────────────────────────────┤
 │ Storage Layer │
-│ LSM Engine │ WAL │ Segments │
+│ Single-File Engine │ Canonical Records │ WAL / Snapshot │
 ├─────────────────────────────────────────────────────────────────┤
 │ Operating System │
 └─────────────────────────────────────────────────────────────────┘
````
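The storage row of this diagram and the commit message describe canonical records owned by storage, with HNSW reaching vectors through internal ordinals and a provider boundary. A minimal sketch of that split, assuming a simple in-memory store and Euclidean distance. `VectorProvider`, `canonicalStore`, `hnswNode`, and `distance` are hypothetical names and do not reflect LibraVDB's actual types or on-disk layout.

```go
package main

import (
	"errors"
	"fmt"
	"math"
)

// VectorProvider is a hypothetical boundary: the index asks storage for the
// canonical vector behind an internal ordinal instead of keeping its own copy.
type VectorProvider interface {
	Vector(ord uint32) ([]float32, error)
}

// canonicalStore is a hypothetical in-memory stand-in for canonical storage.
// It owns string IDs, raw vectors, and metadata; ordinals index its slices.
type canonicalStore struct {
	ids      []string
	vectors  [][]float32
	metadata []map[string]string
	ordOf    map[string]uint32
}

func newCanonicalStore() *canonicalStore {
	return &canonicalStore{ordOf: make(map[string]uint32)}
}

// Put appends a record and returns its stable ordinal.
func (s *canonicalStore) Put(id string, vec []float32, meta map[string]string) uint32 {
	ord := uint32(len(s.vectors))
	s.ids = append(s.ids, id)
	s.vectors = append(s.vectors, vec)
	s.metadata = append(s.metadata, meta)
	s.ordOf[id] = ord
	return ord
}

// Vector implements VectorProvider.
func (s *canonicalStore) Vector(ord uint32) ([]float32, error) {
	if int(ord) >= len(s.vectors) {
		return nil, fmt.Errorf("unknown ordinal %d", ord)
	}
	return s.vectors[ord], nil
}

// hnswNode holds graph topology plus optional derived data only:
// no raw vector, no string ID, no metadata.
type hnswNode struct {
	neighbors  [][]uint32 // neighbor ordinals per layer
	compressed []byte     // optional index-owned compressed vector; nil if unused
}

// distance resolves both vectors through the provider (Euclidean distance).
func distance(p VectorProvider, a, b uint32) (float64, error) {
	va, err := p.Vector(a)
	if err != nil {
		return 0, err
	}
	vb, err := p.Vector(b)
	if err != nil {
		return 0, err
	}
	if len(va) != len(vb) {
		return 0, errors.New("dimension mismatch")
	}
	var sum float64
	for i := range va {
		d := float64(va[i]) - float64(vb[i])
		sum += d * d
	}
	return math.Sqrt(sum), nil
}

func main() {
	store := newCanonicalStore()
	a := store.Put("doc-a", []float32{0, 0}, map[string]string{"sessionId": "s1"})
	b := store.Put("doc-b", []float32{3, 4}, map[string]string{"sessionId": "s1"})

	// The graph links ordinals; IDs and metadata stay in the store.
	graph := map[uint32]*hnswNode{
		a: {neighbors: [][]uint32{{b}}},
		b: {neighbors: [][]uint32{{a}}},
	}

	nb := graph[a].neighbors[0][0]
	d, err := distance(store, a, nb)
	if err != nil {
		panic(err)
	}
	fmt.Println(store.ids[nb], d) // doc-b 5
}
```

In a layout like this, each raw vector exists once, in storage; the graph can be persisted or rebuilt from ordinals plus the canonical records rather than from a second copy of every vector.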
````diff
@@ -347,9 +376,9 @@ LibraVDB employs a layered architecture designed for performance, scalability, a
 ### Key Components
 
 - **Database Layer**: Collection management, global configuration, health monitoring
-- **Collection Layer**: Vector operations, metadata management, index coordination
+- **Collection Layer**: Vector operations, metadata management, index coordination, write admission
 - **Index Layer**: HNSW, IVF-PQ, and Flat algorithms with automatic selection
-- **Storage Layer**: LSM-tree architecture with WAL for durability and performance
+- **Storage Layer**: Single-file canonical storage with WAL-backed durability and reopen/rebuild support
 - **Memory Layer**: Advanced memory management with limits, mapping, and optimization
 - **Observability Layer**: Metrics, tracing, health checks, and performance monitoring
 
````
````diff
@@ -372,7 +401,7 @@ Detailed architecture documentation: [docs/design/architecture.md](docs/design/a
 ### Advanced Topics
 - [**Architecture Design**](docs/design/architecture.md) - System architecture and component design
 - [**HNSW Implementation**](docs/design/hnsw.md) - Detailed HNSW algorithm implementation
-- [**Storage Design**](docs/design/storage.md) - LSM-tree storage architecture
+- [**Storage Design**](docs/design/storage.md) - Single-file storage architecture
 - [**API Design**](docs/design/api.md) - API design principles and patterns
 
 ### Examples & Tutorials
````
````diff
@@ -387,7 +416,7 @@ Detailed architecture documentation: [docs/design/architecture.md](docs/design/a
 - **Go 1.25+**: LibraVDB requires Go 1.25 or later
 - **Memory**: Minimum 1GB RAM (4GB+ recommended for production)
 - **Storage**: SSD recommended for optimal performance
-- **CPU**: Multi-core processor recommended for parallel operations
+- **CPU**: Multi-core processor recommended for search and controlled batch ingestion
 
 ### Install via Go Modules
 
````
````diff
@@ -433,10 +462,12 @@ LibraVDB provides extensive configuration options for optimal performance:
 
 ```go
 db, err := libravdb.New(
-    libravdb.WithStoragePath("/var/lib/libravdb"), // Production storage path
+    libravdb.WithStoragePath("/var/lib/libravdb/data.libravdb"), // Production database file
     libravdb.WithMetrics(true), // Enable Prometheus metrics
     libravdb.WithTracing(true), // Enable distributed tracing
     libravdb.WithMaxCollections(1000), // Maximum collections
+    libravdb.WithMaxConcurrentWrites(2), // Conservative write parallelism
+    libravdb.WithMaxWriteQueueDepth(32), // Bound queued writers
 )
 ```
 
````
````diff
@@ -464,35 +495,36 @@ collection, err := db.CreateCollection(ctx, "vectors",
     libravdb.WithMetadataSchema(schema),
     libravdb.WithIndexedFields("category", "timestamp"),
 
-    // Batch processing
-    libravdb.WithBatchChunkSize(5000),
-    libravdb.WithBatchConcurrency(16),
 )
 ```
 
 ### Environment-Specific Configurations
 
 **Development**:
 ```go
-libravdb.WithStoragePath("./dev_data")
+libravdb.WithStoragePath("./dev_data.libravdb")
 libravdb.WithMetrics(false)
 libravdb.WithMemoryLimit(1*1024*1024*1024) // 1GB
 ```
 
 **Production**:
 ```go
-libravdb.WithStoragePath("/var/lib/libravdb")
+libravdb.WithStoragePath("/var/lib/libravdb/data.libravdb")
 libravdb.WithMetrics(true)
 libravdb.WithTracing(true)
+libravdb.WithMaxConcurrentWrites(2)
+libravdb.WithMaxWriteQueueDepth(64)
 libravdb.WithMemoryLimit(32*1024*1024*1024) // 32GB
 ```
 
 **High-Scale**:
 ```go
+libravdb.WithStoragePath("/var/lib/libravdb/data.libravdb")
 libravdb.WithAutoIndexSelection(true)
 libravdb.WithMemoryMapping(true)
 libravdb.WithProductQuantization(16, 8, 0.05)
-libravdb.WithBatchConcurrency(32)
+libravdb.WithMaxConcurrentWrites(2)
+libravdb.WithMaxWriteQueueDepth(128)
 ```
 
 Complete configuration guide: [docs/configuration/configuration.md](docs/configuration/configuration.md)
````
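These configuration examples pass `WithMaxConcurrentWrites` and `WithMaxWriteQueueDepth` alongside `WithStoragePath`. Options like these are commonly implemented in Go with the functional-options pattern; the sketch below shows that pattern with validation and the `./data.libravdb` default path that the README's Persistence Model section documents. The `config` type, the lowercase option functions, and the default concurrency and queue values are hypothetical, chosen for illustration, and are not LibraVDB's actual implementation or defaults.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a hypothetical stand-in for database settings.
type config struct {
	storagePath         string
	maxConcurrentWrites int
	maxWriteQueueDepth  int
}

// option mutates config and may reject invalid values.
type option func(*config) error

func withStoragePath(path string) option {
	return func(c *config) error {
		if path == "" {
			return errors.New("storage path must not be empty")
		}
		c.storagePath = path
		return nil
	}
}

func withMaxConcurrentWrites(n int) option {
	return func(c *config) error {
		if n < 1 {
			return fmt.Errorf("max concurrent writes must be >= 1, got %d", n)
		}
		c.maxConcurrentWrites = n
		return nil
	}
}

func withMaxWriteQueueDepth(n int) option {
	return func(c *config) error {
		if n < 0 {
			return fmt.Errorf("max write queue depth must be >= 0, got %d", n)
		}
		c.maxWriteQueueDepth = n
		return nil
	}
}

// newConfig applies options over defaults, stopping at the first invalid one.
func newConfig(opts ...option) (*config, error) {
	c := &config{
		storagePath:         "./data.libravdb", // default path documented in the README
		maxConcurrentWrites: 1,                 // hypothetical conservative default
		maxWriteQueueDepth:  16,                // hypothetical default
	}
	for _, opt := range opts {
		if err := opt(c); err != nil {
			return nil, err
		}
	}
	return c, nil
}

func main() {
	c, err := newConfig(
		withStoragePath("/var/lib/libravdb/data.libravdb"),
		withMaxConcurrentWrites(2),
		withMaxWriteQueueDepth(32),
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.storagePath, c.maxConcurrentWrites, c.maxWriteQueueDepth)
	// /var/lib/libravdb/data.libravdb 2 32

	_, err = newConfig(withMaxConcurrentWrites(0))
	fmt.Println(err) // max concurrent writes must be >= 1, got 0
}
```

Validating inside each option keeps bad values (such as zero write slots) from ever reaching the write gate.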
````diff
@@ -553,6 +585,31 @@ results, err := collection.Query(ctx).
     Execute()
 ```
 
+### Lifecycle And Export
+
+```go
+collections := db.ListCollections()
+
+records, err := collection.Query(ctx).
+    Eq("sessionId", "s1").
+    Limit(100).
+    List()
+if err != nil {
+    log.Fatal(err)
+}
+
+count, err := collection.Count(ctx)
+if err != nil {
+    log.Fatal(err)
+}
+
+fmt.Printf("collection has %d records\n", count)
+
+if err := db.DeleteCollection(ctx, "session:old"); err != nil {
+    log.Fatal(err)
+}
+```
+
 ### Performance Monitoring
 
 ```go
````
````diff
@@ -842,7 +899,7 @@ LibraVDB builds upon decades of research and development in vector databases and
 
 ### Research Foundations
 - **HNSW Algorithm**: Based on research by Yu. A. Malkov and D. A. Yashunin
-- **LSM-Tree Architecture**: Inspired by Google's Bigtable and LevelDB
+- **Single-File Storage Design**: Informed by WAL, snapshot, and durable embedded database design patterns
 - **Product Quantization**: Based on work by Hervé Jégou, Matthijs Douze, and Cordelia Schmid
 - **Vector Database Concepts**: Building on research from Facebook AI, Google Research, and academic institutions
 
````
````diff
@@ -866,4 +923,4 @@ LibraVDB builds upon decades of research and development in vector databases and
 
 Made with ❤️ by the LibraVDB community
 
-</div>
+</div>
````
