-
-
Notifications
You must be signed in to change notification settings - Fork 0
Architecture Namespace Sharding
Problem: Spector's per-user file isolation stores each namespace as a separate directory. At >50K namespaces, flat directory listing becomes an O(N) bottleneck on all filesystems. This design adds hash-based directory sharding and a lazy migration pipeline to scale namespace discovery.
graph LR
subgraph "Flat Layout (≤50K)"
F1["namespaces/agent-alpha/"]
F2["namespaces/agent-beta/"]
F3["namespaces/agent-gamma/"]
end
subgraph "Sharded Layout (>50K)"
S1["namespaces/a3/f7/agent-alpha/"]
S2["namespaces/b2/1c/agent-beta/"]
S3["namespaces/d1/e8/agent-gamma/"]
end
F1 -.->|"FlatToShardedMigrator"| S1
Key decisions:
- SHA-256 hash of the namespace ID provides uniform distribution
- 2-level directory hierarchy (256 × 256 = 65,536 buckets)
- Each bucket holds ~15 entries even at 1M namespaces
- Migration is in-place and non-disruptive (flat directories remain accessible during transition)
The StorageLayout class provides two path resolvers:
| Method | Layout | Example |
|---|---|---|
namespaceDir(base, id) |
Flat | namespaces/agent-alpha/ |
namespaceDirSharded(base, id) |
Sharded | namespaces/a3/f7/agent-alpha/ |
tenantNamespaceDirSharded(base, tenant, ns) |
Tenant-scoped | namespaces/xx/yy/acme-corp/agent-alpha/ |
The shard path is derived from the first 4 hex characters of SHA-256(namespaceId):
SHA-256("agent-alpha") → "a3f7e2b1..."
↓ ↓
L1 L2
Path: namespaces/a3/f7/agent-alpha/
- L1 bucket: Characters 0-1 (256 possibilities)
- L2 bucket: Characters 2-3 (256 possibilities)
- Total buckets: 65,536
For tenant-scoped namespaces, the hash is derived from the tenant ID (not the namespace ID), ensuring all namespaces for a tenant are colocated:
SHA-256("acme-corp") → "xx yy ..."
Path: namespaces/xx/yy/acme-corp/agent-alpha/
Path: namespaces/xx/yy/acme-corp/user-alice/
| Namespaces | Flat Dir Entries | Sharded Max per Bucket |
|---|---|---|
| 1,000 | 1,000 | ~1 |
| 50,000 | 50,000 | ~1 |
| 500,000 | 500,000 | ~8 |
| 1,000,000 | 1,000,000 | ~15 |
The SpectorNamespaceManager discovers namespaces at startup using one of two strategies:
Scan: namespaces/*/namespace.json
Complexity: O(N) — single directory listing
Scan: namespaces/XX/YY/*/namespace.json
Complexity: O(N) — but each directory listing is O(1) since buckets are small
flowchart TD
START["SpectorNamespaceManager(basePath, sharded=true)"]
START --> SCAN_L1["Scan namespaces/ for 2-char hex dirs"]
SCAN_L1 --> SCAN_L2["For each L1: scan for 2-char hex subdirs"]
SCAN_L2 --> SCAN_NS["For each L2: scan for namespace dirs"]
SCAN_NS --> CHECK["Check: namespace.json exists?"]
CHECK -->|"Yes"| REGISTER["Register in ConcurrentHashMap"]
CHECK -->|"No"| SKIP["Skip directory"]
The MigrationPipeline runs schema migrations lazily — migrations execute on first access to a namespace, not at startup. This avoids holding up server startup when there are thousands of namespaces.
sequenceDiagram
participant Client
participant NSMgr as SpectorNamespaceManager
participant Pipeline as MigrationPipeline
participant Disk as Filesystem
Client->>NSMgr: getNamespace("agent-alpha")
NSMgr->>NSMgr: Lookup in ConcurrentHashMap
NSMgr->>Pipeline: migrateIfNeeded(nsDir)
Pipeline->>Disk: Read manifest.json
alt Manifest version < current
Pipeline->>Disk: Run pending migrations
Pipeline->>Disk: Update manifest.json
else Up to date
Pipeline->>NSMgr: No-op
end
NSMgr->>Client: NamespaceContext
Migrates existing flat namespace directories to the sharded layout in-place:
-
Scan flat
namespaces/directory for namespace dirs - Compute SHA-256 shard path for each namespace
-
Create target shard directories (
namespaces/XX/YY/) -
Move namespace directory atomically via
Files.move() - Verify all namespaces are accessible at new paths
warning: Migration Safety The migrator is designed to be resumable — if interrupted mid-migration, re-running it will pick up where it left off. Namespaces already at their shard path are skipped.
Each namespace has a namespace.json with:
{
"id": "agent-alpha",
"display_name": "Agent Alpha",
"max_memories": -1,
"max_partitions": -1,
"max_storage_bytes": -1,
"read_only": false,
"created_at": "2026-06-17T00:00:00Z"
}Groups configuration, quota enforcement, and resolved paths:
| Field | Type | Description |
|---|---|---|
config |
NamespaceConfig |
ID, display name, quotas |
quotas |
NamespaceQuotas |
Runtime quota tracker |
directory |
Path |
Resolved path (flat or sharded) |
Each namespace directory contains the standard memory layout:
agent-alpha/
├── namespace.json ← Configuration & quotas
├── global/
│ ├── working.mem ← Working memory (TTL-based)
│ ├── checkpoint.meta ← WAL high-water mark
│ ├── coactivation.tracker
│ └── wal/
│ ├── wal-000000.bin ← WAL chunk files
│ └── wal-000001.bin
├── partitions/
│ └── 000_1781460634/ ← Colocated partition
│ ├── semantic.mem ← Semantic tier (mmap'd)
│ ├── episodic.mem ← Episodic tier
│ ├── procedural.mem ← Procedural tier
│ ├── text.dat ← Raw text (V2 mmap format)
│ ├── index.midx ← Memory index
│ ├── hebbian.graph ← Co-activation edges
│ ├── temporal.chain ← Temporal links
│ └── entity.graph ← Entity knowledge graph
└── cross/
├── hebbian-cross.graph ← Cross-partition edges
└── entity-cross.graph ← Cross-partition entities
| Property | Default | Description |
|---|---|---|
spector.namespaces.sharded |
false |
Enable hash-based directory sharding |
spector.namespaces.migration.enabled |
true |
Enable lazy schema migration on access |
spector.namespaces.migration.flat-to-sharded |
false |
Auto-migrate flat layouts to sharded |
- Persistence & Replication — WAL and CRDT merge
- Off-Heap Panama Design — mmap'd storage
- Distributed Mode — search-layer sharding (different from namespace sharding)
- Home
- Getting Started
-
Cognitive Memory
- Overview
- Getting Started
- Use Cases & Configuration
- API Reference
- Architecture
- The 6-Phase Scoring Pipeline
- Cognitive Profiles
-
Biological Systems
- Overview
- Cortex — Tier Stores
- Hippocampus — Sleep Consolidation
- Synapse — Tags & Scoring
- Dopamine — Surprise Detection
- Amygdala — Emotional Valence
- 3-Layer Cognitive Graph
- Habituation — Anti-Filter Bubble
- Inhibition — Suppression
- Interference — Deduplication
- Prospective — Future Intents
- Metamemory — Self-Reflection
- Sync — Persistence & Replication
- Performance & Internals
- Cognitive Evaluation
-
Architecture
- System Overview
- Core Concepts
- Ingestion Pipeline
- RAG Pipeline
- MCP Integration
- Distributed Mode
- Namespace Sharding
- GPU Acceleration
- Performance Tuning
- Test Framework & LLM Judge
- JDK API Status
- Security & Data
- Deep Dives
- Cortex Dashboard
-
Community
- Contributing
- FAQ
- Roadmap
- 🔬 Labs