Home
RAM Coffers is an industry-first approach to CPU-based LLM inference that indexes model weights by NUMA memory bank, enabling selective prefetch and non-bijunctive pruning before data ever reaches the compute pipeline.
This work was first published on December 16, 2025, predating DeepSeek's Engram paper (arXiv:2601.07372, January 12, 2026) by 27 days. RAM Coffers introduces 15 features that DeepSeek Engram does not implement, including NUMA topology routing, cognitive hemisphere mapping, tetranary logic, and vec_perm single-cycle collapse.
Standard LLM inference treats all RAM equally. RAM Coffers partitions model weights across NUMA nodes with domain-specific routing, so that a query about language activates different physical memory banks than a query about spatial reasoning.
The system runs on an IBM POWER8 S824 with 512 GB RAM across 4 NUMA nodes, achieving 147 tokens/second on TinyLlama 1.1B -- a 9x improvement over stock llama.cpp on the same hardware.
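The domain-routing idea above can be sketched in a few lines of Python. Everything here is an illustrative assumption, not the project's actual values: the domain names, the toy 3-dimensional centroids, and the node assignments are invented for the example; a real coffer would use the model's query-embedding dimensionality.

```python
import math

# Hypothetical coffers: each cognitive domain is pinned to one NUMA node
# and represented by a centroid in embedding space (toy 3-D vectors here).
COFFERS = {
    "language": {"node": 0, "centroid": [1.0, 0.0, 0.0]},
    "spatial":  {"node": 1, "centroid": [0.0, 1.0, 0.0]},
    "logic":    {"node": 2, "centroid": [0.0, 0.0, 1.0]},
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def route(query_embedding):
    """Pick the coffer whose centroid best matches the query embedding.

    With a fixed number of coffers this is O(1) in model size: cost scales
    with the coffer count, not with the number of weights.
    """
    name, info = max(COFFERS.items(),
                     key=lambda kv: cosine(query_embedding, kv[1]["centroid"]))
    return name, info["node"]

domain, node = route([0.9, 0.1, 0.0])  # a mostly language-like query
```

A language-leaning query thus resolves to node 0 and touches only the memory banks holding that coffer's weights, which is the selective-prefetch behavior described above.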
- O(1) coffer routing via cosine similarity on query embeddings
- NUMA-pinned execution using `numactl` for memory locality
- DCBT resident prefetch keeping hot weights in L2/L3 cache
- `vec_perm` non-bijunctive collapse pruning weak attention paths in a single POWER8 cycle
- PSE burst entropy from hardware timebase for behavioral divergence
- Neuromorphic cognitive routing mapping Brodmann brain areas to NUMA topology
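The `vec_perm` collapse item can be mimicked in scalar Python to show the structure: build a permute index selecting the surviving lanes, then gather through it. On POWER8 the gather step would be a single `vec_perm` VMX instruction over a 16-byte vector; this sketch is an ordinary Python gather, and the threshold and vector width are invented for illustration.

```python
# Illustrative pruning threshold (assumption, not the project's value).
THRESHOLD = 0.1

def build_perm_index(scores):
    """Indices of surviving lanes, analogous to a vec_perm control vector."""
    return [i for i, s in enumerate(scores) if s >= THRESHOLD]

def collapse(scores):
    """Prune weak attention paths by gathering through the permute index.

    Hardware would apply the whole permutation at once; here we gather
    element by element to keep the sketch portable.
    """
    perm = build_perm_index(scores)
    return [scores[i] for i in perm]

pruned = collapse([0.42, 0.03, 0.25, 0.01, 0.18])  # -> [0.42, 0.25, 0.18]
```

The point of doing this before the compute pipeline is that pruned lanes never generate memory traffic, which is what makes the collapse complementary to the NUMA-aware prefetch.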
- Repository: github.com/Scottcjn/ram-coffers
- RustChain Project: rustchain.org
- Elyan Labs: technicianrental.com
- BoTTube AI Platform: bottube.ai
- Architecture - NUMA coffer layout, cognitive routing, and vec_perm collapse details