Scottcjn edited this page Mar 7, 2026 · 2 revisions

RAM Coffers - NUMA-Aware Weight Banking for LLM Inference

RAM Coffers is an industry-first approach to CPU-based LLM inference that indexes model weights by NUMA memory bank, enabling selective prefetch and non-bijunctive pruning before data ever reaches the compute pipeline.

Priority Claim

This work was first published on December 16, 2025, predating DeepSeek's Engram paper (arXiv:2601.07372, January 12, 2026) by 27 days. RAM Coffers introduces 15 features that DeepSeek Engram does not implement, including NUMA topology routing, cognitive hemisphere mapping, tetranary logic, and vec_perm single-cycle collapse.

Core Innovation

Standard LLM inference treats all RAM equally. RAM Coffers partitions model weights across NUMA nodes with domain-specific routing, so that a query about language activates different physical memory banks than a query about spatial reasoning.
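As a minimal sketch of the partitioning idea, the following groups weight blocks into one per-NUMA-node bank keyed by semantic domain. The domain names, node mapping, and data layout here are illustrative assumptions, not the project's actual data structures:

```python
# Hypothetical sketch: partition model weight blocks into per-NUMA-node
# "coffers" keyed by semantic domain. Domain names and the node mapping
# are illustrative, not the project's real layout.
from collections import defaultdict

# Assumed mapping of semantic domains to the 4 NUMA nodes of the S824.
DOMAIN_TO_NODE = {"language": 0, "spatial": 1, "logic": 2, "memory": 3}

def build_coffers(weight_blocks):
    """Group (domain, tensor) pairs into one coffer per NUMA node."""
    coffers = defaultdict(list)
    for domain, tensor in weight_blocks:
        coffers[DOMAIN_TO_NODE[domain]].append(tensor)
    return dict(coffers)

blocks = [("language", "wq.0"), ("spatial", "wk.3"), ("language", "wv.1")]
coffers = build_coffers(blocks)
# Node 0 now holds both language tensors; node 1 holds the spatial one.
```

With weights banked this way, a language query only needs to touch the physical memory behind node 0's coffer.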

The system runs on an IBM POWER8 S824 with 512 GB of RAM across 4 NUMA nodes, achieving 147 tokens/second on TinyLlama 1.1B, a 9x improvement over stock llama.cpp on the same hardware.
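The kind of numactl invocation used for NUMA pinning can be built like this. This is a sketch, assuming a llama.cpp-style binary; the binary path, model file, and node choice are hypothetical:

```python
# Sketch: wrap an inference binary with numactl so both CPU scheduling
# and memory allocation are bound to a single NUMA node.
# The binary and model paths below are hypothetical placeholders.
import shlex

def pin_to_node(cmd, node):
    """Prefix a command with numactl flags binding CPUs and memory to `node`."""
    return ["numactl", f"--cpunodebind={node}", f"--membind={node}"] + cmd

argv = pin_to_node(["./llama-cli", "-m", "model.gguf"], node=2)
print(shlex.join(argv))
# prints: numactl --cpunodebind=2 --membind=2 ./llama-cli -m model.gguf
```

Binding both `--cpunodebind` and `--membind` to the same node is what keeps a coffer's weights local to the cores that read them.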

Key Capabilities

  • O(1) coffer routing via cosine similarity on query embeddings
  • NUMA-pinned execution using numactl for memory locality
  • DCBT resident prefetch keeping hot weights in L2/L3 cache
  • vec_perm non-bijunctive collapse pruning weak attention paths in a single POWER8 cycle
  • PSE burst entropy from hardware timebase for behavioral divergence
  • Neuromorphic cognitive routing mapping Brodmann brain areas to NUMA topology
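The first capability above, O(1) coffer routing by cosine similarity, can be sketched as follows. The centroid vectors are toy values and the node labels are assumptions; the real system would derive centroids from query embeddings:

```python
# Sketch of O(1) coffer routing: compare a query embedding against a
# fixed, small set of per-coffer centroids, so the work per query is
# constant. Centroid values here are toy data, not model embeddings.
import math

CENTROIDS = {             # one centroid per NUMA-node coffer (hypothetical)
    0: [1.0, 0.0, 0.0],   # e.g. language
    1: [0.0, 1.0, 0.0],   # e.g. spatial
    2: [0.0, 0.0, 1.0],   # e.g. logic
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def route(query_embedding):
    """Pick the coffer whose centroid is most similar to the query."""
    return max(CENTROIDS, key=lambda n: cosine(query_embedding, CENTROIDS[n]))

# A query embedding close to the "spatial" centroid routes to node 1.
assert route([0.1, 0.9, 0.0]) == 1
```

Because the number of coffers is fixed by the NUMA topology (4 nodes on the S824), the argmax over centroids is constant-time per query.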

Wiki Contents

  • Architecture - NUMA coffer layout, cognitive routing, and vec_perm collapse details
