Hypercore

A local AI that learns from your documents, remembers your decisions, and helps you discover patterns in your own thinking.

CPU-first LLM inference runtime + personal intelligence system, written in Rust.

Quickstart · Personal Intelligence · Inference Engine · Architecture · Contributing


The Problem

Every local AI tool today does the same thing: store your documents, let you ask questions about them.

That's retrieval. It's useful. But it's not intelligence.

Hypercore goes further. It reads your journals, meeting notes, architecture decisions, and project logs, then tells you things about yourself you hadn't noticed.

"Across 8 projects, you repeatedly chose simpler local deployments over more scalable architectures. You consistently optimize for independence rather than maximum scale."

That's not search. That's synthesis. That's the difference.


🚀 Quickstart

Install from Source

# Prerequisites: Rust 1.80+, CMake, Clang
git clone https://github.com/SBALAVIGNESH123/hypercore-rs.git
cd hypercore-rs
cargo build --release

Your First Memory Graph

Three commands. That's all it takes.

# 1. Feed it your documents
hypercore ingest --path ./my-notes

# 2. See what it learned about you
hypercore memory show

# 3. Ask "Why am I like this?"
hypercore memory --model ./model.gguf explain

Hypercore ships with sample documents in examples/ so you can try it immediately:

hypercore ingest --path ./examples
hypercore memory show
hypercore memory timeline

Run as an API Server

Hypercore is also a full OpenAI-compatible inference server:

hypercore serve --model ./model.gguf --port 8080

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"hypercore","messages":[{"role":"user","content":"Hello"}]}'

Drop-in replacement for the OpenAI API: point an existing OpenAI SDK client at http://localhost:8080/v1 and your chat-completion code should work without changes.


🧠 Personal Intelligence

How It Works

Most AI tools stop at retrieval. Hypercore builds a personal memory graph from your documents and uses it to generate insights about your decision-making patterns.

Your Documents
    ↓
Ingestion (streaming, batched embeddings)
    ↓
SQLite Knowledge Store (FTS5 + vector search + dedup)
    ↓
Memory Extraction (natural language only; code files skipped)
    ↓
Memory Graph (Decisions · Preferences · Projects · Relationships)
    ↓
LLM Synthesis (compressed memories → local model → insight)
    ↓
Feedback ("Was this surprising?" → persisted 1–4 rating)

The entire pipeline runs locally. Your documents never leave your machine.
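
The memory extraction step can be pictured with a small sketch. It assumes a keyword heuristic (the project status table describes extraction as keyword-based) and a file-extension filter. The cue words, the extension list, and the names `MemoryKind`, `is_natural_language`, and `classify` are hypothetical and chosen for illustration; only the four category names come from the pipeline above.

```rust
use std::path::Path;

/// The four memory categories named in the pipeline above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MemoryKind {
    Decision,
    Preference,
    Project,
    Relationship,
}

/// Hypothetical filter: only natural-language files reach memory extraction.
/// The extension list is an assumption, not Hypercore's actual rule.
pub fn is_natural_language(path: &Path) -> bool {
    matches!(
        path.extension().and_then(|e| e.to_str()),
        Some("md") | Some("txt")
    )
}

/// Hypothetical keyword heuristic: the first matching cue word decides the category.
pub fn classify(sentence: &str) -> Option<MemoryKind> {
    const RULES: [(&str, MemoryKind); 4] = [
        ("decided", MemoryKind::Decision),
        ("prefer", MemoryKind::Preference),
        ("project", MemoryKind::Project),
        ("met with", MemoryKind::Relationship),
    ];
    let lowered = sentence.to_lowercase();
    RULES
        .iter()
        .find(|(cue, _)| lowered.contains(*cue))
        .map(|(_, kind)| *kind)
}
```

With these rules, `classify("We decided to ship on SQLite.")` yields `Some(MemoryKind::Decision)`, and a path such as `src/main.rs` is filtered out before classification.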

Commands

| Command | What It Does |
| --- | --- |
| memory show | Display your extracted memory graph by category |
| memory timeline | Chronological view showing how your work evolved |
| memory recall <topic> | Find evidence and context for past decisions |
| memory patterns | Theme distribution, recurring words, source analysis |
| memory explain | "Why am I like this?" Synthesizes your decision-making DNA |
| memory insight | Generate a weekly personal observation report |
| memory sync <path> | Sync a directory into the memory bank |

With LLM Synthesis

Add --model <path.gguf> to unlock AI-powered synthesis:

# Without model → raw data + statistics
hypercore memory patterns

# With model → compressed memories → LLM → synthesized insight
hypercore memory --model ./qwen-3b.gguf patterns

The system automatically compresses your memories to fit within the model's context window, prints token budget instrumentation, and streams the response in real time.
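
One way to think about fitting memories into a context window is greedy packing against a token budget. The sketch below is an assumption about the general technique, not Hypercore's actual compression: it estimates tokens at roughly four characters each instead of using a real tokenizer, and the names `estimate_tokens`, `pack_memories`, and `reserved_for_output` are hypothetical.

```rust
/// Rough token estimate: about 4 characters per token, rounded up.
/// This is an assumption for illustration, not a real tokenizer.
pub fn estimate_tokens(text: &str) -> usize {
    (text.chars().count() + 3) / 4
}

/// Greedily keeps memories, in order, while they fit in the budget.
/// Returns the kept memories and the estimated tokens they use.
pub fn pack_memories<'a>(
    memories: &[&'a str],
    context_size: usize,
    reserved_for_output: usize,
) -> (Vec<&'a str>, usize) {
    // Leave room for the model's answer inside the same context window.
    let budget = context_size.saturating_sub(reserved_for_output);
    let mut used = 0;
    let mut kept = Vec::new();
    for memory in memories {
        let cost = estimate_tokens(memory);
        if used + cost > budget {
            continue; // too large for the remaining budget; a shorter one may still fit
        }
        used += cost;
        kept.push(*memory);
    }
    (kept, used)
}
```

Returning the token count alongside the selection makes it easy to print the kind of budget instrumentation described above.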

Feedback Loop

After every insight, Hypercore asks one question:

How valuable was this insight?
  1) Obvious: I already knew this
  2) Somewhat useful: mild interest
  3) Surprising: I hadn't noticed this
  4) Changed how I think: genuinely new perspective

Every rating is persisted to SQLite. This is how we measure whether the system produces real value: not retrieval accuracy, not benchmark scores, but genuine user reactions.

If nobody ever picks 3 or 4, we have work to do. If several users consistently pick 4, we've found something rare.
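
The success criterion above (how often users pick 3 or 4) is easy to state in code. This is a minimal sketch: the `Rating` enum and `surprise_rate` function are hypothetical names, and the SQLite persistence is left out.

```rust
/// The four answers offered after each insight.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Rating {
    Obvious = 1,
    SomewhatUseful = 2,
    Surprising = 3,
    ChangedThinking = 4,
}

impl Rating {
    /// Maps the user's numeric choice to a rating; anything outside 1..=4 is rejected.
    pub fn from_choice(choice: u8) -> Option<Rating> {
        match choice {
            1 => Some(Rating::Obvious),
            2 => Some(Rating::SomewhatUseful),
            3 => Some(Rating::Surprising),
            4 => Some(Rating::ChangedThinking),
            _ => None,
        }
    }
}

/// Share of ratings that were 3 or 4, or None when there is no feedback yet.
pub fn surprise_rate(ratings: &[Rating]) -> Option<f64> {
    if ratings.is_empty() {
        return None;
    }
    let high = ratings.iter().filter(|r| (**r as u8) >= 3).count();
    Some(high as f64 / ratings.len() as f64)
}
```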


⚡ Inference Engine

Hypercore's inference runtime is production-grade, not a wrapper. It handles multi-session batching, memory pressure, and graceful degradation: the boring reliability that makes local AI actually usable.

Core Capabilities

| Feature | Description |
| --- | --- |
| Continuous Batching | Round-robin chunked prefill with up to 4 concurrent sessions |
| Memory Pressure Rejection | Explicit AdmissionRejected under pressure, never silent OOMs |
| Request Timeouts | 300s deadline with auto-eviction of stuck sessions |
| Temperature Sampling | Greedy (T=0) or temperature-scaled stochastic sampling |
| EOS Detection | Auto-stop on end-of-generation tokens |
| LoRA Support | Adapter path configurable (loading not yet implemented; we say so honestly) |
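
The session limit, the memory-pressure rejection, and the request deadline fit together as an admission check. The following is a minimal sketch under stated assumptions, not the actual scheduler: only the name `AdmissionRejected`, the limit of 4 sessions, and the 300s deadline come from the table above. The `Scheduler` and `Session` types, the variant names, and the per-session byte estimate are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Limits taken from the table above.
pub const MAX_SESSIONS: usize = 4;
pub const REQUEST_DEADLINE: Duration = Duration::from_secs(300);

/// Variant names are hypothetical; only `AdmissionRejected` appears in this README.
#[derive(Debug, PartialEq, Eq)]
pub enum AdmissionRejected {
    TooManySessions,
    MemoryPressure,
}

struct Session {
    id: u64,
    admitted_at: Instant,
    bytes: u64,
}

pub struct Scheduler {
    sessions: Vec<Session>,
    memory_limit_bytes: u64,
}

impl Scheduler {
    pub fn new(memory_limit_bytes: u64) -> Self {
        Scheduler { sessions: Vec::new(), memory_limit_bytes }
    }

    fn memory_used(&self) -> u64 {
        self.sessions.iter().map(|s| s.bytes).sum()
    }

    /// Drops sessions that have outlived the deadline and returns their ids.
    pub fn evict_expired(&mut self, now: Instant) -> Vec<u64> {
        let (expired, live): (Vec<Session>, Vec<Session>) = std::mem::take(&mut self.sessions)
            .into_iter()
            .partition(|s| now.duration_since(s.admitted_at) >= REQUEST_DEADLINE);
        self.sessions = live;
        expired.into_iter().map(|s| s.id).collect()
    }

    /// Admits a session or rejects it explicitly; it never over-commits memory.
    pub fn admit(&mut self, id: u64, bytes: u64, now: Instant) -> Result<(), AdmissionRejected> {
        self.evict_expired(now);
        if self.sessions.len() >= MAX_SESSIONS {
            return Err(AdmissionRejected::TooManySessions);
        }
        if self.memory_used().saturating_add(bytes) > self.memory_limit_bytes {
            return Err(AdmissionRejected::MemoryPressure);
        }
        self.sessions.push(Session { id, admitted_at: now, bytes });
        Ok(())
    }
}
```

Passing `now` in as a parameter keeps the deadline logic testable without sleeping.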

TitanMem

Our adaptive KV-cache congestion controller, built on real control theory:

  • Dual-signal EMA: tracks both utilization and memory pressure
  • Hysteresis mode transitions: Calm → Cautious → Critical with deadband to prevent flapping
  • Dynamic threshold tuning: adapts to workload patterns over time
  • Per-session byte tracking: fine-grained memory accounting
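
The first two ideas in the list above can be sketched as a small state machine. This is an illustration under assumptions, not TitanMem's code: the thresholds (0.70/0.60 and 0.90/0.80), the choice to act on the larger of the two smoothed signals, and the names `Controller` and `observe` are hypothetical. Dynamic threshold tuning and per-session byte tracking are not modeled.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Calm,
    Cautious,
    Critical,
}

pub struct Controller {
    alpha: f64,           // EMA smoothing factor in (0, 1]
    utilization_ema: f64, // smoothed KV-cache utilization, 0.0..=1.0
    pressure_ema: f64,    // smoothed memory pressure, 0.0..=1.0
    mode: Mode,
}

impl Controller {
    pub fn new(alpha: f64) -> Self {
        Controller { alpha, utilization_ema: 0.0, pressure_ema: 0.0, mode: Mode::Calm }
    }

    /// Feeds one sample of both signals and returns the resulting mode.
    pub fn observe(&mut self, utilization: f64, pressure: f64) -> Mode {
        let a = self.alpha;
        self.utilization_ema = a * utilization + (1.0 - a) * self.utilization_ema;
        self.pressure_ema = a * pressure + (1.0 - a) * self.pressure_ema;
        let signal = self.utilization_ema.max(self.pressure_ema);

        // Each "enter" threshold sits above its "exit" threshold.
        // The gap between them is the deadband that prevents flapping.
        self.mode = match self.mode {
            Mode::Calm if signal > 0.70 => Mode::Cautious,
            Mode::Cautious if signal > 0.90 => Mode::Critical,
            Mode::Cautious if signal < 0.60 => Mode::Calm,
            Mode::Critical if signal < 0.80 => Mode::Cautious,
            unchanged => unchanged,
        };
        self.mode
    }
}
```

A signal hovering at 0.65 leaves the controller in whichever mode it is already in, which is the point of the deadband.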

We built TitanMem, benchmarked it rigorously, and discovered it didn't outperform the OS page cache. We published the data anyway because honest engineering matters more than marketing. Read the benchmarks →

API Server

| Endpoint | Method | Description |
| --- | --- | --- |
| /health | GET | Health check |
| /metrics | GET | Prometheus-format metrics |
| /v1/models | GET | List available models |
| /v1/chat/completions | POST | Chat completions (streaming + non-streaming) |

OpenAI SDK compatible. Set HYPERCORE_API_KEY for bearer token authentication.
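
For clients that do not use an SDK, the request can be assembled by hand. The sketch below uses only the Rust standard library and mirrors the curl example from the Quickstart, adding the bearer token when a key is configured. The helper names `json_escape`, `build_chat_request`, and `send` are hypothetical, and the response is read as raw text with no HTTP parsing.

```rust
use std::io::{Read, Write};
use std::net::TcpStream;

/// Escapes characters that must not appear raw inside a JSON string.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a raw HTTP/1.1 request for POST /v1/chat/completions.
/// `api_key` should hold the same value as the server's HYPERCORE_API_KEY, if one is set.
pub fn build_chat_request(host: &str, api_key: Option<&str>, user_message: &str) -> String {
    let body = format!(
        r#"{{"model":"hypercore","messages":[{{"role":"user","content":"{}"}}]}}"#,
        json_escape(user_message)
    );
    let auth = api_key
        .map(|key| format!("Authorization: Bearer {key}\r\n"))
        .unwrap_or_default();
    format!(
        "POST /v1/chat/completions HTTP/1.1\r\nHost: {host}\r\nContent-Type: application/json\r\n{auth}Content-Length: {}\r\nConnection: close\r\n\r\n{body}",
        body.len()
    )
}

/// Sends the request and returns the raw response (status line, headers, and body).
pub fn send(addr: &str, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}
```

With a server running on port 8080, `send("localhost:8080", &build_chat_request("localhost:8080", None, "Hello"))` performs the same call as the curl command.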


📊 Knowledge Store

The knowledge layer combines multiple retrieval strategies in a single SQLite database:

  • Hybrid search: FTS5 full-text search + cosine vector similarity, automatically routed
  • Content-hash deduplication: re-ingesting the same file is a no-op
  • Tree-sitter parsing: function-level chunking for C/C++ source files
  • Streaming ingestion: batch embedding (64 chunks per batch) with real-time progress
  • Smart filtering: memory extraction only processes natural language files; source code is skipped entirely
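
Two of the ideas in the list above, content-hash deduplication and cosine similarity, are small enough to sketch in memory. This is an illustration only: the real store is SQLite, the hash function it uses is not stated here (this sketch uses the standard library's `DefaultHasher`, which is not a cryptographic hash), and the `Store` type is hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hashes chunk content. A real store would more likely use a cryptographic hash.
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

#[derive(Default)]
pub struct Store {
    seen: HashSet<u64>,
    chunks: Vec<String>,
}

impl Store {
    /// Returns true if the content was stored, false if identical content
    /// was already ingested (re-ingesting is a no-op).
    pub fn ingest(&mut self, content: &str) -> bool {
        if !self.seen.insert(content_hash(content.as_bytes())) {
            return false;
        }
        self.chunks.push(content.to_owned());
        true
    }

    pub fn len(&self) -> usize {
        self.chunks.len()
    }
}

/// Cosine similarity of two embeddings; 0.0 for mismatched lengths or zero vectors.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}
```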

📈 Evaluation

We don't hardcode benchmark scores. Every evaluation runs real retrieval:

hypercore studio eval my_assistant.yaml

Eval Results: my_assistant.yaml
  Questions:          4
  Retrieval Hits:     1 / 4
  Retrieval Accuracy: 25.0%
  Avg Top Score:      0.2318
  ✓ What are my most important projects? (score: 0.196, themes: ["HyperCore"])
  ✗ How do I typically write technical documents? (score: 0.240, themes: [])
  ✗ What decisions have I repeatedly made? (score: 0.172, themes: [])
  ✗ What technologies do I consistently prefer? (score: 0.318, themes: [])

25% accuracy. Not 92%. Because that's the real number.
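
The summary numbers in that output can be recomputed from the per-question lines. The sketch below assumes a question counts as a hit when at least one theme was matched, which is consistent with the output shown but is not confirmed by it. `EvalResult` and `summarize` are hypothetical names.

```rust
pub struct EvalResult<'a> {
    pub question: &'a str,
    pub top_score: f64,
    pub themes: Vec<&'a str>,
}

/// Returns (hits, accuracy in percent, average top score).
/// Assumption: a hit is a question with at least one matched theme.
pub fn summarize(results: &[EvalResult<'_>]) -> (usize, f64, f64) {
    let hits = results.iter().filter(|r| !r.themes.is_empty()).count();
    let n = results.len().max(1) as f64; // avoid dividing by zero on an empty run
    let accuracy = 100.0 * hits as f64 / n;
    let avg_top_score = results.iter().map(|r| r.top_score).sum::<f64>() / n;
    (hits, accuracy, avg_top_score)
}
```

Fed the four rows above, this gives 1 hit and 25.0% accuracy. The average of the displayed scores is 0.2315 rather than the reported 0.2318, presumably because the per-question scores are shown rounded to three decimals.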


🏗️ Architecture

Design Principles

1. Boring is what users trust. Every component is designed to be predictable under load. We choose explicit error handling over silent fallbacks, deterministic scheduling over probabilistic heuristics, and clear failure modes over optimistic retries.

2. No silent mutations. If Hypercore can't fulfill a request exactly as specified, it rejects it with a clear error. It will never silently truncate your prompt, quietly reduce max_tokens, or drop requests without telling you.

3. Measure before you claim. Every performance claim is backed by reproducible benchmarks. If a subsystem doesn't demonstrate an advantage under rigorous testing, we say so. See: TitanMem.

4. Insights over retrieval. The goal isn't "chat with your PDFs." It's "discover patterns in your thinking." That requires synthesis, not search.
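
Principle 2 can be made concrete with a small validation sketch: a request whose prompt plus requested output does not fit the context window is rejected with the numbers that explain why, instead of being trimmed. The `RequestError` type and `validate` function are hypothetical names; 8192 in the usage note is the `context_size` from the sample configuration.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RequestError {
    /// The prompt plus the requested output does not fit in the context window.
    ContextOverflow { requested: usize, context_size: usize },
}

/// Rejects oversized requests instead of truncating the prompt or lowering max_tokens.
pub fn validate(
    prompt_tokens: usize,
    max_tokens: usize,
    context_size: usize,
) -> Result<(), RequestError> {
    let requested = prompt_tokens.saturating_add(max_tokens);
    if requested > context_size {
        return Err(RequestError::ContextOverflow { requested, context_size });
    }
    Ok(())
}
```

With `context_size` set to 8192, `validate(8000, 192, 8192)` succeeds, while `validate(8000, 500, 8192)` returns `ContextOverflow { requested: 8500, context_size: 8192 }`.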

Benchmarks

Measured on AMD Ryzen 9 7900X, DDR5, with a 0.5B Q5_K_M GGUF model:

| Metric | Value |
| --- | --- |
| Binary Size | 15.8 MB |
| Idle RAM | ~45 MB |
| Cold Start | < 2.5 s |
| Time to First Token | 55–120 ms |
| Throughput (1 session) | ~45 tok/s |
| Throughput (4 sessions) | ~110 tok/s |

⚙️ Configuration

# hypercore.yaml
host: "0.0.0.0"
port: 8080
model_path: "model.gguf"
context_size: 8192
max_threads: 4
memory_limit_mb: 6000
safe_mode: true

| Variable | Description |
| --- | --- |
| HYPERCORE_API_KEY | Bearer token for API authentication (optional) |
| RUST_LOG | Log level: info, debug, trace |

📋 Project Status

We believe in radical transparency. Here's exactly where every component stands:

| Component | Status | Notes |
| --- | --- | --- |
| Inference Engine | ✅ Production-ready | Continuous batching, pressure handling, timeouts |
| TitanMem | ✅ Working | Experimental; honest benchmarks published |
| Knowledge Store | ✅ Working | FTS5 + vector search + dedup |
| Ingestion Pipeline | ✅ Working | Streaming, tree-sitter, batch embedding |
| Memory Extraction | ✅ Working | Keyword heuristics, code files skipped |
| Memory Commands | ✅ 7 commands | show, timeline, recall, patterns, explain, insight, sync |
| Feedback Collection | ✅ Working | 1–4 ratings persisted to SQLite |
| LLM Synthesis | 🔌 Wired | Needs GGUF model to test |
| Eval Pipeline | ✅ Real retrieval | No hardcoded scores |
| LoRA Fine-tuning | ❌ Not yet | Honest warning in code |

🀝 Contributing

We welcome contributions. Here's how:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

If you're not sure where to start, check the issues.


📄 License

MIT License. See LICENSE for details.


Built with 🦀 Rust and ❤️ by Bala Vignesh S

⭐ Star us on GitHub · 📦 Latest Release · 🐛 Report a Bug
