Explore Bank-Level Physical Isolation for Large Memory Banks #2123
Sanderhoff-alt started this conversation in Ideas
Summary
Hindsight currently has a clear logical isolation model: tenant isolation is handled at the schema level, and multiple banks inside the same tenant share the same tables while being separated by `bank_id`. This works well for many small or medium banks, and the current implementation already includes important optimizations such as `bank_id` filters, bank/fact-type indexes, and per-bank vector indexes on PostgreSQL for non-ScaNN vector backends.
However, for single-tenant deployments or workloads where one bank grows very large, the physical storage layout may become a bottleneck. It may be worth exploring a roadmap for bank-level physical isolation or "large bank promotion", where large banks can receive dedicated physical storage/indexing layouts while small banks continue to share tables.
Current Behavior
From the current architecture:
- [...] `banks`, `documents`, `chunks`, `memory_units`, `mental_models`, `directives`, and related link tables.
- [...] `bank_id`.
- [...] `bank_id`.
- [...] `bank_id`-aware indexes and per-bank/per-fact-type vector index support for non-ScaNN vector backends.
- [...] `memory_units.bank_id`, allowing partition pruning for bank-scoped queries.
This design is simple and effective for shared multi-bank deployments, but it leaves the largest bank in a tenant competing inside the same physical table family as every other bank.
Problem
For very large banks, logical `bank_id` filtering may not be enough. The main concerns are:
- [...] `memory_units` tables can make vector, text, temporal, tag, and maintenance indexes harder to keep efficient.
- [...] `DELETE` or index maintenance operations instead of dropping or rebuilding isolated physical structures.
This is not necessarily a bug in the current design. It is more of a scalability and operations question: should the database layout reflect Hindsight's logical hierarchy more strongly for large-bank workloads?
Proposed Direction
Consider adding a bank-level physical isolation strategy with a tiered layout:
The key design goal is:
Possible Implementations
Option A: PostgreSQL list partitions by `bank_id`
Use partitioning so each large bank has its own partition, while smaller banks may remain in a default/shared partition.
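As a rough illustration of Option A, the following Python sketch generates PostgreSQL DDL for one list partition per large bank plus a default partition. It assumes the parent `memory_units` table has already been declared with `PARTITION BY LIST (bank_id)` and that `bank_id` is a text-like column; the partition naming scheme and the helper names are hypothetical and not taken from Hindsight.

```python
import re


def partition_name(bank_id: str, parent: str = "memory_units") -> str:
    """Derive a table name for a bank's partition (hypothetical naming scheme)."""
    suffix = re.sub(r"[^a-z0-9_]", "_", bank_id.lower())
    return f"{parent}_bank_{suffix}"


def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def list_partition_ddl(large_bank_ids: list[str], parent: str = "memory_units") -> list[str]:
    """Return DDL for a default partition plus one partition per large bank.

    Assumes `parent` was created with PARTITION BY LIST (bank_id).
    """
    statements = [f"CREATE TABLE {parent}_default PARTITION OF {parent} DEFAULT;"]
    for bank_id in large_bank_ids:
        statements.append(
            f"CREATE TABLE {partition_name(bank_id, parent)} "
            f"PARTITION OF {parent} FOR VALUES IN ({quote_literal(bank_id)});"
        )
    return statements


if __name__ == "__main__":
    for statement in list_partition_ddl(["Bank-A", "bank_b"]):
        print(statement)
```

With this layout, a query that filters on a single `bank_id` can be pruned to that bank's partition, while banks without a dedicated partition fall through to the default one.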
Pros:
- [...] `bank_id`.
- [...] `memory_units` bank partitioning approach.
Cons:
- [...] `documents`, `chunks`, `memory_links`, `unit_entities`, `entity_cooccurrences`, and `async_operations` need a consistent strategy.
Option B: Default shared partition plus dedicated large-bank partitions
Keep all small banks in a default shared partition. Promote only large banks to dedicated partitions.
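A minimal sketch of how promotion might work, assuming a row-count threshold decides which banks are "large". The `BankStats` type, the threshold, and the partition names are hypothetical; the `shared` and `dedicated_partition` values mirror the `storage_layout` metadata discussed in this proposal. The generated SQL moves a bank's rows out of the default partition before attaching the new partition, because PostgreSQL rejects a new partition whose values are still present in the default partition.

```python
from dataclasses import dataclass


@dataclass
class BankStats:
    bank_id: str
    row_count: int                    # rows in memory_units for this bank
    storage_layout: str = "shared"    # "shared" or "dedicated_partition"


def banks_to_promote(stats: list[BankStats], row_threshold: int) -> list[str]:
    """Pick shared banks whose row count has reached the (assumed) threshold."""
    return [
        s.bank_id
        for s in stats
        if s.storage_layout == "shared" and s.row_count >= row_threshold
    ]


def promotion_sql(bank_id: str, partition: str, parent: str = "memory_units") -> list[str]:
    """Return statements that move one bank from the default partition to its own."""
    literal = "'" + bank_id.replace("'", "''") + "'"
    return [
        "BEGIN;",
        # Standalone table with the parent's column layout.
        f"CREATE TABLE {partition} (LIKE {parent} INCLUDING DEFAULTS INCLUDING CONSTRAINTS);",
        # Move the bank's rows out of the default partition.
        f"WITH moved AS (DELETE FROM {parent}_default WHERE bank_id = {literal} RETURNING *) "
        f"INSERT INTO {partition} SELECT * FROM moved;",
        # Attach the filled table as the bank's dedicated partition.
        f"ALTER TABLE {parent} ATTACH PARTITION {partition} FOR VALUES IN ({literal});",
        "COMMIT;",
    ]
```

A real promotion path would likely also need to account for locking during the attach and for updating the bank's `storage_layout` metadata in the same transaction.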
Pros:
Cons:
- [...] `storage_layout = shared | dedicated_partition`.
Option C: Dedicated table/schema/database for very large banks
For extremely large banks, move the bank into a dedicated table set, schema, or even database.
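One way to picture Option C is a small routing layer that maps a bank to its storage location, falling back to the shared layout when no dedicated location is registered. Everything in this sketch is hypothetical: the `BankLocation` type, the connection strings, and the schema names are placeholders, not Hindsight configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BankLocation:
    dsn: str                      # database connection string (placeholder)
    schema: str                   # schema holding the bank's tables
    table: str = "memory_units"

    def qualified_table(self) -> str:
        """Return the schema-qualified table name."""
        return f"{self.schema}.{self.table}"


# Shared location used by every bank without a dedicated entry (placeholder values).
SHARED = BankLocation(dsn="postgresql://localhost/hindsight", schema="tenant_default")


def resolve_location(bank_id: str, dedicated: dict[str, BankLocation]) -> BankLocation:
    """Return the dedicated location for a bank, or the shared one."""
    return dedicated.get(bank_id, SHARED)


if __name__ == "__main__":
    dedicated = {
        "big-bank": BankLocation(
            dsn="postgresql://localhost/hindsight_big", schema="bank_big"
        )
    }
    print(resolve_location("big-bank", dedicated).qualified_table())
    print(resolve_location("small-bank", dedicated).qualified_table())
```

Every query path would have to go through such a lookup, which is part of the added complexity this option carries.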
Pros:
Cons:
Option D: Logical tiering without table partitioning
Keep the table layout but add hot/cold memory tiers:
Default recall would search active data first, while archived data would be searched only with a higher budget or explicit option.
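The recall behavior described above can be sketched as follows, assuming each memory unit carries a tier label and a precomputed relevance score. The `MemoryUnit` type, the `archive_budget_threshold` parameter, and the `include_archived` flag are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MemoryUnit:
    text: str
    tier: str      # "active" or "archived"
    score: float   # relevance to the query, assumed precomputed


def recall(
    units: list[MemoryUnit],
    budget: int,
    archive_budget_threshold: int = 100,
    include_archived: bool = False,
    limit: int = 5,
) -> list[MemoryUnit]:
    """Search active units by default; include archived units only on request
    or when the budget reaches the (assumed) threshold."""
    search_archive = include_archived or budget >= archive_budget_threshold
    candidates = [u for u in units if u.tier == "active" or search_archive]
    return sorted(candidates, key=lambda u: u.score, reverse=True)[:limit]
```

A default call touches only the active tier, so archived data adds no cost to ordinary recall in this sketch.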
Pros:
Cons:
Suggested First Step
A balanced first step could be:
- [...] `memory_units`, since it is the core growth table and the primary retrieval/consolidation target.
- [...] `bank_id` indexes.
Open Questions
- Should `memory_units` be partitioned first, or should `documents`, `chunks`, links, and entity tables be included from the beginning?
Expected Benefits
Non-Goals