Author: Scott Boudreaux
Date: December 16, 2025
Institution: Elyan Labs (Independent Research)
Hardware: IBM POWER8 S824 (320GB RAM, Dual 8-core)
| Paper | DOI | Date |
|---|---|---|
| RAM Coffers: NUMA-Distributed Weight Banking | 10.5281/zenodo.18321905 | Jan 2026 |
| Non-Bijunctive Permutation Collapse (vec_perm for LLM attention) | 10.5281/zenodo.18623920 | Feb 2026 |
| PSE Hardware Entropy for Behavioral Divergence (mftb injection) | 10.5281/zenodo.18623922 | Feb 2026 |
| Neuromorphic Prompt Translation (GRAIL-V, emotional prompting) | 10.5281/zenodo.18623594 | Feb 2026 |
| RustChain: One CPU, One Vote (Proof of Antiquity consensus) | 10.5281/zenodo.18623592 | Feb 2026 |
| Memory Scaffolding Shapes LLM Inference (persistent context effects) | 10.5281/zenodo.18817988 | Feb 2026 |
This work introduces RAM Coffers, a NUMA-aware conditional memory architecture for efficient Large Language Model (LLM) inference. The system selectively houses model knowledge across distributed RAM banks with resonance-based routing, enabling O(1) knowledge retrieval without GPU dependency.
Key innovations include:
- **NUMA-Distributed Weight Banking**: Model weights partitioned across NUMA nodes by domain (e.g., core knowledge, science/tech, creative, history)
- **Resonance Routing**: Query embeddings matched to coffer domain signatures via cosine similarity for intelligent weight activation
- **Non-Bijunctive Pruning**: Selective path collapse before full weight fetch, reducing memory bandwidth requirements
- **DCBT Resident Prefetch**: PowerPC data cache block touch hints for L2/L3 residency, achieving 147+ tokens/second on POWER8
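The DCBT prefetch pattern can be sketched in portable C. This is an illustrative stand-in, not the repo's actual code: real POWER8 builds issue `dcbt` ("data cache block touch") hints, while GCC/Clang's `__builtin_prefetch` lets the same access pattern compile anywhere. `PREFETCH_DIST` and the function name are hypothetical tuning knobs.

```c
#include <stddef.h>

/* Illustrative prefetch-ahead dot product. On POWER8 the hint would be
 * emitted as: asm volatile("dcbt 0,%0" :: "r"(ptr)); here we use the
 * portable __builtin_prefetch so the pattern compiles on any target. */
#define PREFETCH_DIST 8 /* cache lines ahead; hypothetical value */

float dot_with_prefetch(const float *a, const float *b, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DIST * 16 < n) {
            /* Touch blocks we will need soon so they are L2/L3 resident. */
            __builtin_prefetch(&a[i + PREFETCH_DIST * 16], 0, 3);
            __builtin_prefetch(&b[i + PREFETCH_DIST * 16], 0, 3);
        }
        acc += a[i] * b[i];
    }
    return acc;
}
```

The payoff comes from hiding memory latency behind compute: by the time the loop reaches a block, the touch hint has already pulled it into cache.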
| Coffer | NUMA Node | Capacity | Role |
|--------|-----------|----------|---------------------|
| 0 | 3 | 193 GB | Heavy/General (core)|
| 1 | 1 | 183 GB | Science/Tech domain |
| 2 | 0 | 119 GB | Creative/Long CTX |
| 3 | 2 | 62 GB | Niche/History |
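For orientation, the coffer layout above can be mirrored as a small lookup table in C. The struct and function names here are hypothetical illustrations, not the actual `ggml-ram-coffers.h` API.

```c
#include <stddef.h>

/* Hypothetical mirror of the coffer layout table. Capacities in GB. */
typedef struct {
    int coffer_id;
    int numa_node;
    int capacity_gb;
    const char *role;
} coffer_desc_t;

static const coffer_desc_t COFFERS[] = {
    {0, 3, 193, "heavy/general (core)"},
    {1, 1, 183, "science/tech domain"},
    {2, 0, 119, "creative/long context"},
    {3, 2,  62, "niche/history"},
};

/* Look up which NUMA node backs a given coffer (-1 if unknown). */
int coffer_numa_node(int coffer_id) {
    for (size_t i = 0; i < sizeof(COFFERS) / sizeof(COFFERS[0]); i++)
        if (COFFERS[i].coffer_id == coffer_id)
            return COFFERS[i].numa_node;
    return -1;
}
```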
- Query embed → route_to_coffer: Resonance matching selects appropriate memory bank
- activate_coffer → DCBT prefetch + numa_run_on_node: Thread affinity and cache warming
- pse_collapse_prune: Non-bijunctive path selection before full fetch
- Generate with PSE entropy: Hardware entropy injection from active coffer node
This architecture predates DeepSeek's "Engram" paper (arXiv:2601.07372, January 12, 2026) by 27 days and conceptually parallels it. Both approaches address the same fundamental insight: separating static knowledge storage from dynamic computation enables more efficient LLM inference.
Key parallels:
- RAM Coffers (Dec 16, 2025): "Selectively house model information in known RAM banks with resonance routing for associative recall"
- DeepSeek Engram (Jan 12, 2026): "Separate static knowledge from dynamic compute via O(1) lookup"
Testing on this architecture led to a significant discovery: emotional language enables 20% efficiency gains in video generation, mirroring limbic gating in biological memory.
See /grail-v-paper for the full CVPR 2026 submission:
- 35 matched-pair benchmark with LPIPS validation
- 23.9% file size reduction in controlled ablation
- Cross-model validation on AnimateDiff and SVD
- Theoretical grounding via Hopfield/EBM frameworks
Key Finding: Complex multi-character emotional scenes show ~33% efficiency gains regardless of architecture.
The elyan-prime MCP server that powers the persistent memory system used during development of RAM Coffers is itself the subject of research. The paper "Memory Scaffolding Shapes LLM Inference" (DOI 10.5281/zenodo.18817988) demonstrates that persistent context (600+ memories) fundamentally changes how an LLM architects solutions — the iterative compounding that produced RAM Coffers is a direct example of this effect.
- Repository: Scottcjn/elyan-prime
- Article: Dev.to — Memory Scaffolding Shapes LLM Inference
If this repository is new to you, start in this order:
1. `ggml-ram-coffers.h` — high-level routing and coffer selection model
2. `ggml-coffer-mmap.h` — memory mapping and NUMA shard placement
3. `ggml-topk-collapse-vsx.h` — vectorized collapse path details
4. `power8-compat.h` — ISA compatibility layer and portability constraints
Suggested first goal: trace one inference request from coffer selection to collapse execution, then compare against the performance table.
| File | Description |
|---|---|
| `ggml-ram-coffers.h` | Multi-bank NUMA weight indexing with resonance routing |
| `ggml-coffer-mmap.h` | GGUF model sharding across NUMA nodes |
| `ggml-ram-coffer.h` | Single coffer implementation |
| `ggml-intelligent-collapse.h` | Hebbian-inspired non-bijunctive path collapse |
| `ggml-topk-collapse-vsx.h` | VSX-optimized Top-K attention collapse |
| `pse-entropy-burst.h` | Hardware entropy injection via PowerPC timebase |
| `power8-compat.h` | POWER9→POWER8 intrinsic compatibility layer |
On IBM POWER8 S824 with TinyLlama 1.1B Q4_K:
| Configuration | Tokens/sec (pp128) |
|---|---|
| Stock llama.cpp | 16.74 |
| + POWER8 VSX | 66.49 |
| + PSE Collapse | 84.62 |
| + RAM Coffers + DCBT | 147.54 |
8.81x speedup over stock on "obsolete" hardware.
If you want to compare changes quickly, use this lightweight baseline procedure.
```shell
lscpu
numactl --hardware
```

Use one fixed prompt and one fixed model build so runs are comparable.

```shell
# Example shape only; adjust binary/model path to your local setup
./main -m ./models/tinyllama-1.1b-q4_k.gguf -p "Explain NUMA routing in one paragraph" -n 128 -ngl 0
```

Record at minimum:
- tokens/sec
- prompt + generation lengths
- active NUMA node affinity policy
- whether collapse/prefetch code paths were enabled
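As a sanity check on recorded numbers, tokens/sec is just token count over elapsed wall time. A trivial helper (illustrative only; llama.cpp reports its own timings, so this is merely the arithmetic you would record):

```c
#include <time.h>

/* tokens/sec from a token count and two monotonic timestamps. */
double tokens_per_sec(int n_tokens, struct timespec start, struct timespec end) {
    double secs = (double)(end.tv_sec - start.tv_sec)
                + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
    return secs > 0.0 ? (double)n_tokens / secs : 0.0;
}
```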
When opening a PR, include:
- what changed
- one baseline result
- one post-change result
- exact command used
This keeps performance claims falsifiable and makes review much faster.
MIT License - Free to use, modify, and distribute with attribution.
```bibtex
@software{boudreaux2025ramcoffers,
  author    = {Boudreaux, Scott},
  title     = {RAM Coffers: NUMA-Distributed Conditional Memory for LLM Inference},
  year      = {2025},
  month     = {12},
  day       = {16},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18321905},
  url       = {https://doi.org/10.5281/zenodo.18321905},
  note      = {Independent research predating DeepSeek Engram (arXiv:2601.07372) by 27 days}
}

@article{boudreaux2026vecperm,
  author    = {Boudreaux, Scott},
  title     = {Non-Bijunctive Permutation Collapse: AltiVec vec\_perm Enables Single-Cycle Attention Path Selection},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18623920},
  url       = {https://doi.org/10.5281/zenodo.18623920}
}

@article{boudreaux2026pse,
  author    = {Boudreaux, Scott},
  title     = {Hardware Entropy Injection for Behavioral Divergence in LLM Inference: The PSE Framework on IBM POWER8},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18623922},
  url       = {https://doi.org/10.5281/zenodo.18623922}
}

@article{boudreaux2026memoryscaffolding,
  author    = {Boudreaux, Scott},
  title     = {Memory Scaffolding Shapes LLM Inference: How Persistent Context Changes What AI Builds},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.18817988},
  url       = {https://doi.org/10.5281/zenodo.18817988}
}
```

- GitHub: Scottcjn
- X/Twitter: @RustchainPOA
This repository is header-focused; there is no single build script yet. A fast way to explore:
- Start from `ggml-ram-coffers.h` for the multi-bank routing path.
- Follow `ggml-coffer-mmap.h` for sharding/memory-mapping details.
- Read `power8-compat.h` + `ggml-topk-collapse-vsx.h` for ISA-specific optimizations.
- Grokipedia: Elyan Labs Reference
- Grokipedia: RAM Coffers Search
- I Run LLMs on a 768GB IBM POWER8 Server - Dev.to article covering RAM Coffers
- Proof of Antiquity: A Blockchain That Rewards Vintage Hardware - Dev.to
- Memory Scaffolding Shapes LLM Inference - Dev.to article on persistent memory effects
## Additional Resources
- **Full Paper**: [RAM Coffers on Zenodo](https://doi.org/10.5281/zenodo.18321905)
- **BCOS Certification**: See [BCOS.md](BCOS.md) for certification details
- **Contributing**: Read [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines
---
*RAM Coffers predates DeepSeek's Engram architecture by 27 days, establishing priority for NUMA-aware weight banking in LLM inference.*
---
<div align="center">
**[Elyan Labs](https://github.com/Scottcjn)** · 1,882 commits · 97 repos · 1,334 stars · $0 raised
[⭐ Star Rustchain](https://github.com/Scottcjn/Rustchain) · [📊 Q1 2026 Traction Report](https://github.com/Scottcjn/Rustchain/blob/main/docs/DEVELOPER_TRACTION_Q1_2026.md) · [Follow @Scottcjn](https://github.com/Scottcjn)
</div>