RAM Coffers: NUMA-Distributed Conditional Memory for LLM Inference

BCOS Certified

Author: Scott Boudreaux
Date: December 16, 2025
Institution: Elyan Labs (Independent Research)
Hardware: IBM POWER8 S824 (320GB RAM, Dual 8-core)


Publications

| Paper | DOI | Date |
|-------|-----|------|
| RAM Coffers: NUMA-Distributed Weight Banking | 10.5281/zenodo.18321905 | Jan 2026 |
| Non-Bijunctive Permutation Collapse (vec_perm for LLM attention) | 10.5281/zenodo.18623920 | Feb 2026 |
| PSE Hardware Entropy for Behavioral Divergence (mftb injection) | 10.5281/zenodo.18623922 | Feb 2026 |
| Neuromorphic Prompt Translation (GRAIL-V, emotional prompting) | 10.5281/zenodo.18623594 | Feb 2026 |
| RustChain: One CPU, One Vote (Proof of Antiquity consensus) | 10.5281/zenodo.18623592 | Feb 2026 |
| Memory Scaffolding Shapes LLM Inference (persistent context effects) | 10.5281/zenodo.18817988 | Feb 2026 |

Abstract

This work introduces RAM Coffers, a NUMA-aware conditional memory architecture for efficient Large Language Model (LLM) inference. The system selectively houses model knowledge across distributed RAM banks with resonance-based routing, enabling O(1) knowledge retrieval without GPU dependency.

Key innovations include:

  1. NUMA-Distributed Weight Banking: Model weights partitioned across NUMA nodes by domain (e.g., core knowledge, science/tech, creative, history)

  2. Resonance Routing: Query embeddings matched to coffer domain signatures via cosine similarity for intelligent weight activation

  3. Non-Bijunctive Pruning: Selective path collapse before full weight fetch, reducing memory bandwidth requirements

  4. DCBT Resident Prefetch: PowerPC data cache block touch hints for L2/L3 residency, achieving 147+ tokens/second on POWER8

Architecture

| Coffer | NUMA Node | Capacity | Role                |
|--------|-----------|----------|---------------------|
| 0      | 3         | 193 GB   | Heavy/General (core)|
| 1      | 1         | 183 GB   | Science/Tech domain |
| 2      | 0         | 119 GB   | Creative/Long CTX   |
| 3      | 2         | 62 GB    | Niche/History       |

Processing Flow

  1. Query embed → route_to_coffer: Resonance matching selects appropriate memory bank
  2. activate_coffer → DCBT prefetch + numa_run_on_node: Thread affinity and cache warming
  3. pse_collapse_prune: Non-bijunctive path selection before full fetch
  4. Generate with PSE entropy: Hardware entropy injection from active coffer node

Relation to Subsequent Work

This architecture predates and conceptually parallels DeepSeek's "Engram" paper (arXiv:2601.07372, January 12, 2026) by 27 days. Both approaches address the same fundamental insight: separating static knowledge storage from dynamic computation enables more efficient LLM inference.

Key parallels:

  • RAM Coffers (Dec 16, 2025): "Selectively house model information in known RAM banks with resonance routing for associative recall"
  • DeepSeek Engram (Jan 12, 2026): "Separate static knowledge from dynamic compute via O(1) lookup"

GRAIL-V Paper: Emotional Prompting Discovery

Testing on this architecture led to a significant discovery: emotional language enables 20% efficiency gains in video generation, mirroring limbic gating in biological memory.

See /grail-v-paper for the full CVPR 2026 submission:

  • 35 matched-pair benchmark with LPIPS validation
  • 23.9% file size reduction in controlled ablation
  • Cross-model validation on AnimateDiff and SVD
  • Theoretical grounding via Hopfield/EBM frameworks

Key Finding: Complex multi-character emotional scenes benefit ~33% efficiency regardless of architecture.

Memory Scaffolding

The elyan-prime MCP server that powers the persistent memory system used during development of RAM Coffers is itself the subject of research. The paper "Memory Scaffolding Shapes LLM Inference" (DOI 10.5281/zenodo.18817988) demonstrates that persistent context (600+ memories) fundamentally changes how an LLM architects solutions — the iterative compounding that produced RAM Coffers is a direct example of this effect.


New Reader Path (5-minute orientation)

If this repository is new to you, start in this order:

  1. ggml-ram-coffers.h — high-level routing and coffer selection model
  2. ggml-coffer-mmap.h — memory mapping and NUMA shard placement
  3. ggml-topk-collapse-vsx.h — vectorized collapse path details
  4. power8-compat.h — ISA compatibility layer and portability constraints

Suggested first goal: trace one inference request from coffer selection to collapse execution, then compare against the performance table.

Files Included

| File | Description |
|------|-------------|
| ggml-ram-coffers.h | Multi-bank NUMA weight indexing with resonance routing |
| ggml-coffer-mmap.h | GGUF model sharding across NUMA nodes |
| ggml-ram-coffer.h | Single coffer implementation |
| ggml-intelligent-collapse.h | Hebbian-inspired non-bijunctive path collapse |
| ggml-topk-collapse-vsx.h | VSX-optimized Top-K attention collapse |
| pse-entropy-burst.h | Hardware entropy injection via PowerPC timebase |
| power8-compat.h | POWER9→POWER8 intrinsic compatibility layer |

Performance Results

On IBM POWER8 S824 with TinyLlama 1.1B Q4_K:

| Configuration | Tokens/sec (pp128) |
|---------------|--------------------|
| Stock llama.cpp | 16.74 |
| + POWER8 VSX | 66.49 |
| + PSE Collapse | 84.62 |
| + RAM Coffers + DCBT | 147.54 |

8.81x speedup over stock on "obsolete" hardware.

Benchmark Harness (Contributor Starter)

If you want to compare changes quickly, use this lightweight baseline procedure.

1) Capture machine topology

lscpu
numactl --hardware

2) Record a repeatable inference baseline

Use one fixed prompt and one fixed model build so runs are comparable.

# Example shape only; adjust binary/model path to your local setup
./main -m ./models/tinyllama-1.1b-q4_k.gguf -p "Explain NUMA routing in one paragraph" -n 128 -ngl 0

Record at minimum:

  • tokens/sec
  • prompt + generation lengths
  • active NUMA node affinity policy
  • whether collapse/prefetch code paths were enabled

3) Compare before/after changes

When opening a PR, include:

  • what changed
  • one baseline result
  • one post-change result
  • exact command used

This keeps performance claims falsifiable and makes review much faster.

License

MIT License - Free to use, modify, and distribute with attribution.

Citation

@software{boudreaux2025ramcoffers,
  author = {Boudreaux, Scott},
  title = {RAM Coffers: NUMA-Distributed Conditional Memory for LLM Inference},
  year = {2025},
  month = {12},
  day = {16},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.18321905},
  url = {https://doi.org/10.5281/zenodo.18321905},
  note = {Independent research predating DeepSeek Engram (arXiv:2601.07372) by 27 days}
}

@article{boudreaux2026vecperm,
  author = {Boudreaux, Scott},
  title = {Non-Bijunctive Permutation Collapse: AltiVec vec\_perm Enables Single-Cycle Attention Path Selection},
  year = {2026},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.18623920},
  url = {https://doi.org/10.5281/zenodo.18623920}
}

@article{boudreaux2026pse,
  author = {Boudreaux, Scott},
  title = {Hardware Entropy Injection for Behavioral Divergence in LLM Inference: The PSE Framework on IBM POWER8},
  year = {2026},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.18623922},
  url = {https://doi.org/10.5281/zenodo.18623922}
}

@article{boudreaux2026memoryscaffolding,
  author = {Boudreaux, Scott},
  title = {Memory Scaffolding Shapes LLM Inference: How Persistent Context Changes What AI Builds},
  year = {2026},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.18817988},
  url = {https://doi.org/10.5281/zenodo.18817988}
}

Contact

  • GitHub: Scottcjn
  • X/Twitter: @RustchainPOA



RAM Coffers Architecture

┌─────────────────────────────────────────────────────────────┐
│                     RAM Coffers System                      │
└─────────────────────────────────────────────────────────────┘

                ┌─────────────────────┐
 Query embed ──▶│  Resonance Router   │  cosine similarity vs.
                │  (route_to_coffer)  │  coffer domain signatures
                └──────────┬──────────┘
                           ▼
┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐
│  Coffer 0  │ │  Coffer 1  │ │  Coffer 2  │ │  Coffer 3  │
│  NUMA 3    │ │  NUMA 1    │ │  NUMA 0    │ │  NUMA 2    │
│  Core      │ │  Sci/Tech  │ │  Creative  │ │  History   │
└─────┬──────┘ └────────────┘ └────────────┘ └────────────┘
      │ (active coffer)
      ▼
 DCBT prefetch + numa_run_on_node → pse_collapse_prune → generate
 with PSE entropy from the active node

## Components

### Resonance Router
- Embeds the query and matches it against coffer domain signatures
- Cosine similarity selects which memory bank to activate

### Coffers (NUMA weight banks)
- Model weights partitioned by domain across four NUMA nodes
- Capacities per the table above (193/183/119/62 GB)

### Collapse Path
- Non-bijunctive pruning drops low-resonance paths before full weight fetch
- VSX-optimized Top-K implementation in ggml-topk-collapse-vsx.h

### Entropy Injection
- PSE hardware entropy from the PowerPC timebase on the active coffer node

## Additional Resources

- **Full Paper**: [RAM Coffers on Zenodo](https://doi.org/10.5281/zenodo.18321905)
- **BCOS Certification**: See [BCOS.md](BCOS.md) for certification details
- **Contributing**: Read [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines

---

*RAM Coffers predates DeepSeek's Engram architecture by 27 days, establishing priority for NUMA-aware weight banking in LLM inference.*

---

<div align="center">

**[Elyan Labs](https://github.com/Scottcjn)** · 1,882 commits · 97 repos · 1,334 stars · $0 raised

[⭐ Star Rustchain](https://github.com/Scottcjn/Rustchain) · [📊 Q1 2026 Traction Report](https://github.com/Scottcjn/Rustchain/blob/main/docs/DEVELOPER_TRACTION_Q1_2026.md) · [Follow @Scottcjn](https://github.com/Scottcjn)

</div>
