
Trueno-DB

GPU-First Embedded Analytics with SIMD Fallback

Performance Comparison: GPU vs SIMD vs Scalar

GPU-first embedded analytics database with graceful degradation: GPU → SIMD → Scalar

Phase 1 MVP: Complete ✅

Status: 9/9 Core Tasks Complete | 156/156 Tests Passing | 92.64% Coverage ✅ (exceeds 90% target!)

Achievements:

  • ✅ Arrow/Parquet storage with morsel-based paging (CORE-001)
  • ✅ Cost-based backend dispatcher with 5x rule (CORE-002)
  • ✅ JIT WGSL compiler for kernel fusion (CORE-003)
  • ✅ GPU kernels: SUM, MIN, MAX, COUNT, AVG, fused filter+sum (CORE-004)
  • ✅ SIMD fallback via trueno (AVX-512/AVX2) (CORE-005)
  • ✅ Backend equivalence tests (GPU == SIMD == Scalar) (CORE-006)
  • ✅ SQL query interface: SELECT, WHERE, aggregations, ORDER BY, LIMIT (CORE-007)
  • ✅ PCIe transfer benchmarks (CORE-008)
  • ✅ Competitive benchmarking infrastructure (CORE-009)
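
The fused filter+sum kernel (CORE-004) evaluates the predicate and the aggregate in a single pass instead of materializing a filtered intermediate. The sketch below illustrates that idea on the CPU in plain Rust, assuming an `f32` column and a `value > threshold` predicate. `filter_then_sum` and `fused_filter_sum` are hypothetical names, and this is not the crate's WGSL kernel.

```rust
/// Two-pass version: materializes the filtered values, then sums them.
fn filter_then_sum(values: &[f32], threshold: f32) -> f32 {
    let filtered: Vec<f32> = values.iter().copied().filter(|&v| v > threshold).collect();
    filtered.iter().sum()
}

/// Fused version: predicate and aggregate in one pass, no intermediate buffer.
fn fused_filter_sum(values: &[f32], threshold: f32) -> f32 {
    values
        .iter()
        .fold(0.0, |acc, &v| if v > threshold { acc + v } else { acc })
}

fn main() {
    let values = [50.0_f32, 150.0, 200.0, 99.0, 101.0];
    let two_pass = filter_then_sum(&values, 100.0);
    let fused = fused_filter_sum(&values, 100.0);
    // Same elements summed in the same order, so the results match exactly.
    assert_eq!(two_pass, fused);
    println!("SUM(value) WHERE value > 100 = {fused}");
}
```

On a GPU, the benefit is larger than on the CPU because the skipped intermediate buffer would otherwise cost extra memory traffic.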

See docs/PHASE1_COMPLETE.md for full details.

Performance

SIMD Aggregation Benchmarks (1M rows, AMD Threadripper 7960X):

| Operation | SIMD (µs) | Scalar (µs) | Speedup | Status |
|---|---|---|---|---|
| SUM | 228 | 634 | 2.78x | ✅ Validated |
| MIN | 228 | 1,048 | 4.60x | ✅ Validated |
| MAX | 228 | 257 | 1.13x | ✅ Validated |
| AVG | 228 | 634 | 2.78x | ✅ Validated |
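
A general reason SIMD aggregation can beat a scalar loop is that a single floating-point accumulator forms a serial dependency chain, while several independent accumulators can be kept in vector lanes. A portable sketch of that idea, assuming an `f32` column and 8 lanes (8 × 32-bit = one 256-bit AVX2 register); `sum_scalar` and `sum_lanes` are hypothetical helpers and not trueno's implementation:

```rust
const LANES: usize = 8; // assumption: 8 x f32 fills one 256-bit AVX2 register

/// Scalar reference: one accumulator, so every add waits on the previous one.
fn sum_scalar(values: &[f32]) -> f32 {
    let mut acc = 0.0;
    for &v in values {
        acc += v;
    }
    acc
}

/// Lane-wise sum: 8 independent accumulators the compiler can map onto SIMD registers.
fn sum_lanes(values: &[f32]) -> f32 {
    let mut lanes = [0.0_f32; LANES];
    let chunks = values.chunks_exact(LANES);
    let tail = chunks.remainder(); // elements that do not fill a whole chunk
    for chunk in chunks {
        for (lane, &v) in lanes.iter_mut().zip(chunk) {
            *lane += v;
        }
    }
    // Horizontal reduction of the lanes, then the leftover tail.
    lanes.iter().sum::<f32>() + tail.iter().sum::<f32>()
}

fn main() {
    let values: Vec<f32> = (1..=1_000).map(|i| i as f32).collect();
    println!("scalar = {}, lanes = {}", sum_scalar(&values), sum_lanes(&values));
}
```

The two functions add in different orders, so on arbitrary data their results can differ in the last bits; here every partial sum is an exactly representable integer, so they agree.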

Top-K Query Benchmark (1M rows, Top-10 selection):

| Backend | Technology | Time | Speedup | Status |
|---|---|---|---|---|
| GPU | Vulkan/Metal/DX12 | 2.5ms | 50x | Phase 2 |
| SIMD | AVX-512/AVX2/SSE2 | 12.8ms | 10x | ✅ Phase 1 |
| Scalar | Portable fallback | 125ms | 1x | Baseline |
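
The O(n log k) bound for heap-based Top-K comes from scanning the column once while keeping only the best k values in a bounded min-heap. Each element costs at most one O(log k) heap operation. A minimal sketch with `std::collections::BinaryHeap` and `std::cmp::Reverse`, assuming an `i64` score column and descending order; `top_k` is a hypothetical helper, not trueno-db's API:

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Returns the k largest values in descending order in O(n log k) time.
fn top_k(values: &[i64], k: usize) -> Vec<i64> {
    if k == 0 {
        return Vec::new();
    }
    // Min-heap (via Reverse) holding the k largest values seen so far.
    let mut heap: BinaryHeap<Reverse<i64>> = BinaryHeap::with_capacity(k + 1);
    for &v in values {
        if heap.len() < k {
            heap.push(Reverse(v));
        } else if let Some(&Reverse(smallest)) = heap.peek() {
            // Replace the current minimum only if the new value beats it.
            if v > smallest {
                heap.pop();
                heap.push(Reverse(v));
            }
        }
    }
    // into_sorted_vec is ascending by Reverse<_>, i.e. descending by value.
    heap.into_sorted_vec().into_iter().map(|Reverse(v)| v).collect()
}

fn main() {
    let scores = [42, 7, 99, 15, 99, 3, 64, 88];
    println!("{:?}", top_k(&scores, 3)); // [99, 99, 88]
}
```

A full sort would cost O(n log n). The heap never grows beyond k + 1 entries, which keeps memory bounded for Top-10 over 1M rows.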

Verified Claims (Red Team Audit ✅):

  • 95.24% line coverage ✅ (exceeds 90% target, GPU included!)
  • 1,100 property test scenarios
  • O(n log k) complexity proven
  • SIMD speedups: 1.13x-4.6x (empirically validated)
  • GPU tests: All kernels validated with real GPU hardware
  • Zero benchmark gaming

Try the Examples

# SQL query interface (NEW in v0.3.0) - DuckDB-like API
cargo run --example sql_query_interface --release

# Technical performance scaling (1K to 1M rows)
cargo run --example benchmark_shootout --release

# Gaming leaderboards (1M matches, 500K players)
cargo run --example gaming_leaderboards --release

# Stock market crashes (95 years, peer-reviewed data)
cargo run --example market_crashes --release

# GPU examples (requires --features gpu)
cargo run --example gpu_aggregations --features gpu --release
cargo run --example gpu_sales_analytics --features gpu --release

Output: queries over 1M rows complete in under 12ms, with no external database server required

Installation

[dependencies]
# Default: SIMD-only (fast compile, small binary)
trueno-db = "0.3"

# With GPU support (opt-in, slower compile): use this line instead of the one above
# trueno-db = { version = "0.3", features = ["gpu"] }

Feature Flags

| Feature | Default | Dependencies | Compile Time | Binary Size | Use Case |
|---|---|---|---|---|---|
| simd | ✅ Yes | 12 | ~18s | -0.4 MB | CI, lightweight deployments |
| gpu | ❌ No | 95 | ~63s | +3.8 MB | Performance-critical production |

Why SIMD is the default: wgpu adds 67 transitive dependencies (+3.8 MB, +45s compile time). Most use cases don't need GPU acceleration.
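
The GPU → SIMD → Scalar degradation can be pictured as a compile-time gate (the `gpu` Cargo feature) followed by runtime checks. The sketch below is hypothetical. `Backend`, `cpu_has_simd`, and `pick_backend` are illustrative names rather than trueno-db's API, the `gpu_adapter_found` flag stands in for a real GPU adapter probe, and CPU detection uses the standard library's `is_x86_feature_detected!` macro.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Gpu,
    Simd,
    Scalar,
}

/// Runtime CPU check on x86/x86_64 (the macro only exists on these targets).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_simd() -> bool {
    is_x86_feature_detected!("avx2") || is_x86_feature_detected!("sse2")
}

/// Other targets (assumption): treat aarch64 as having NEON, everything else as scalar-only.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_simd() -> bool {
    cfg!(target_arch = "aarch64")
}

/// Hypothetical selection chain: GPU if compiled in and available, then SIMD, then scalar.
fn pick_backend(gpu_adapter_found: bool) -> Backend {
    // The GPU path only exists when the crate is built with the `gpu` feature.
    if cfg!(feature = "gpu") && gpu_adapter_found {
        Backend::Gpu
    } else if cpu_has_simd() {
        Backend::Simd
    } else {
        Backend::Scalar
    }
}

fn main() {
    println!("selected backend: {:?}", pick_backend(false));
}
```

Because `cfg!(feature = "gpu")` is resolved at compile time, a default (SIMD-only) build never takes the GPU branch, which matches the opt-in behavior described above.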

Quick Start

use trueno_db::query::{QueryEngine, QueryExecutor};
use trueno_db::storage::StorageEngine;

// #[tokio::main] needs the tokio crate with the "macros" and "rt-multi-thread" features
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load Parquet data into Arrow storage
    let storage = StorageEngine::load_parquet("data/events.parquet")?;

    // Initialize SQL query engine and executor
    let engine = QueryEngine::new();
    let executor = QueryExecutor::new();

    // Parse the SQL into a plan, then execute it against the loaded storage
    let plan = engine.parse(
        "SELECT COUNT(*), SUM(value), AVG(value) FROM events WHERE value > 100"
    )?;
    // Prefixed with `_` to avoid an unused-variable warning under `-D warnings`
    let _result = executor.execute(&plan, &storage)?;

    Ok(())
}

Design Principles (Toyota Way Aligned)

  • Muda elimination: Kernel fusion minimizes PCIe transfers
  • Poka-Yoke safety: Out-of-core execution prevents VRAM OOM
  • Genchi Genbutsu: Physics-based cost model (5x rule for GPU dispatch)
  • Jidoka: Backend equivalence tests (GPU == SIMD == Scalar)
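
A backend equivalence check runs the same aggregate through two implementations and compares the results. MIN and MAX are order-independent and should match exactly. A floating-point SUM evaluated in a different order (as a parallel backend does) can differ in the last bits, so this sketch compares sums with a relative tolerance. Both "backends" here are hypothetical stand-ins: a sequential fold and a chunked reduction that mimics per-workgroup partial sums. The `1e-9` tolerance is an assumption, not the project's setting.

```rust
/// Reference "scalar backend": sequential folds over the whole column.
fn scalar_aggs(values: &[f64]) -> (f64, f64, f64) {
    let sum = values.iter().fold(0.0, |acc, &v| acc + v);
    let min = values.iter().copied().fold(f64::INFINITY, f64::min);
    let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    (sum, min, max)
}

/// Stand-in for a parallel backend: per-chunk partials, then a final combine step.
fn chunked_aggs(values: &[f64], chunk: usize) -> (f64, f64, f64) {
    let partials: Vec<(f64, f64, f64)> = values.chunks(chunk).map(scalar_aggs).collect();
    let sum = partials.iter().map(|p| p.0).sum::<f64>();
    let min = partials.iter().map(|p| p.1).fold(f64::INFINITY, f64::min);
    let max = partials.iter().map(|p| p.2).fold(f64::NEG_INFINITY, f64::max);
    (sum, min, max)
}

/// Relative comparison, since reordering a floating-point SUM can change the last bits.
fn approx_eq(a: f64, b: f64, rel_tol: f64) -> bool {
    (a - b).abs() <= rel_tol * a.abs().max(b.abs()).max(1.0)
}

fn main() {
    // Deterministic, non-negative test column (100K rows).
    let values: Vec<f64> = (0..100_000u64).map(|i| ((i * 7_919) % 1_000) as f64 * 0.37).collect();
    let (sum_a, min_a, max_a) = scalar_aggs(&values);
    let (sum_b, min_b, max_b) = chunked_aggs(&values, 1_024);
    assert_eq!(min_a, min_b); // MIN/MAX do not depend on evaluation order
    assert_eq!(max_a, max_b);
    assert!(approx_eq(sum_a, sum_b, 1e-9), "SUM mismatch: {sum_a} vs {sum_b}");
    println!("backends agree: sum={sum_a:.2}, min={min_a}, max={max_a}");
}
```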

Features

  • Cost-based backend selection: Arithmetic intensity dispatch
  • Morsel-based paging: Out-of-core execution (128MB chunks)
  • JIT WGSL compiler: Kernel fusion for single-pass execution
  • GPU kernels: SUM, MIN, MAX, COUNT, AVG, fused filter+sum
  • SIMD fallback: Trueno integration (AVX-512/AVX2/SSE2)
  • SQL query interface: SELECT, WHERE, aggregations, ORDER BY, LIMIT
  • Async isolation: spawn_blocking for CPU-bound operations
  • 🚧 GROUP BY: Planned for Phase 2
  • 🚧 Hash JOIN: Planned for Phase 2
  • 🚧 WASM support: WebGPU + HTTP range requests (Phase 4)
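
Morsel-based paging splits a column into fixed-size chunks so that data larger than available VRAM can be processed one piece at a time. A sketch of computing morsel boundaries for a byte budget, assuming fixed-width `f32` values and reading the 128MB budget above as 128 × 1024 × 1024 bytes (an assumption). `morsel_ranges` and `MORSEL_BYTES` are hypothetical names, not trueno-db's API.

```rust
use std::ops::Range;

const MORSEL_BYTES: usize = 128 * 1024 * 1024; // assumed 128 MB budget per morsel

/// Splits `len` fixed-width values into row ranges that each fit in `budget_bytes`.
fn morsel_ranges(len: usize, value_size: usize, budget_bytes: usize) -> Vec<Range<usize>> {
    let rows_per_morsel = (budget_bytes / value_size).max(1);
    (0..len)
        .step_by(rows_per_morsel)
        .map(|start| start..(start + rows_per_morsel).min(len))
        .collect()
}

fn main() {
    // 100M f32 values = 400 MB, more than one morsel.
    let len = 100_000_000;
    let ranges = morsel_ranges(len, std::mem::size_of::<f32>(), MORSEL_BYTES);
    // Each range would be uploaded, processed, and reduced to a partial result in turn.
    for (i, r) in ranges.iter().enumerate() {
        println!("morsel {i}: rows {}..{} ({} rows)", r.start, r.end, r.len());
    }
}
```

With a 4-byte value type this yields 33,554,432 rows per morsel, so the 100M-row column above becomes three morsels, the last one partially filled.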

Documentation

📖 Read the Book - Comprehensive guide to Trueno-DB

# Build documentation
make book

# Serve locally at http://localhost:3000
make book-serve

# Watch and rebuild on changes
make book-watch

The book covers:

  • Architecture and design principles
  • EXTREME TDD methodology
  • Toyota Way principles
  • Case studies (CORE-001, CORE-002)
  • Academic research foundation
  • Performance benchmarking

Development

# Build
make build

# Run tests (EXTREME TDD)
make test

# Quality gates
make quality-gate  # lint + test + coverage

# Backend equivalence tests
make test-equivalence

# Benchmarks
make bench-comparison

# Update trueno dependency
make update-trueno

Quality Gates (EXTREME TDD)

Every commit must:

  • ✅ Pass 100% of tests (cargo test --all-features)
  • ✅ Zero clippy warnings (cargo clippy -- -D warnings)
  • ✅ >90% code coverage (cargo llvm-cov)
  • ✅ TDG score ≥B+ (85/100) (pmat analyze tdg)
  • ✅ Mutation testing ≥80% kill rate (cargo mutants)

Architecture

See docs/specifications/db-spec-v1.md for full specification.

Backend Selection Logic

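// Cost-based dispatch (Section 2.2 of spec)
// Rule: use the GPU only if compute_time > 5 * transfer_time
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Gpu,
    Simd,
}

fn select_backend(data_size: usize, estimated_flops: f64) -> Backend {
    // Assumed PCIe bandwidth: 32 GB/s = 32_000_000 bytes per millisecond
    let pcie_transfer_ms = data_size as f64 / (32_000_000_000.0 / 1000.0);
    // Cost-model estimate of kernel runtime in ms (helper not shown here)
    let gpu_compute_ms = estimate_gpu_compute(estimated_flops);

    if gpu_compute_ms > pcie_transfer_ms * 5.0 {
        Backend::Gpu
    } else {
        Backend::Simd // Trueno fallback
    }
}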
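
A worked example of the 5x rule: at the assumed 32 GB/s, moving a 1M-row `f32` column (4,000,000 bytes) takes 4,000,000 / 32,000,000 = 0.125 ms. The GPU is therefore chosen only when its estimated compute time exceeds 5 × 0.125 = 0.625 ms. A runnable sketch of that arithmetic; `transfer_ms` and `should_use_gpu` are hypothetical helpers, and the compute-time inputs are illustrative values, not measurements.

```rust
/// Assumed PCIe bandwidth: 32 GB/s = 32_000_000 bytes per millisecond.
const PCIE_BYTES_PER_MS: f64 = 32_000_000_000.0 / 1000.0;

/// Time to move `bytes` across PCIe, in milliseconds.
fn transfer_ms(bytes: usize) -> f64 {
    bytes as f64 / PCIE_BYTES_PER_MS
}

/// 5x rule: dispatch to the GPU only if compute time exceeds 5x the transfer time.
fn should_use_gpu(bytes: usize, gpu_compute_ms: f64) -> bool {
    gpu_compute_ms > 5.0 * transfer_ms(bytes)
}

fn main() {
    let bytes = 1_000_000 * std::mem::size_of::<f32>(); // 1M rows of f32 = 4 MB
    println!("transfer: {} ms", transfer_ms(bytes)); // 0.125 ms
    // Illustrative compute estimates (not measurements):
    for compute_ms in [0.05, 0.625, 2.0] {
        println!("compute {compute_ms} ms -> GPU? {}", should_use_gpu(bytes, compute_ms));
    }
}
```

Only the 2.0 ms case crosses the threshold; the comparison is strict, so a compute estimate of exactly 0.625 ms stays on the SIMD path.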

Academic Foundation

Built on peer-reviewed research:

  • Gregg & Hazelwood (2011): PCIe bus bottleneck analysis
  • Wu et al. (2012): Kernel fusion execution model
  • Funke et al. (2018): GPU paging for out-of-core workloads
  • Neumann (2011): JIT compilation for query execution
  • Abadi et al. (2008): Late materialization for column stores

See Section 8 of the specification (docs/specifications/db-spec-v1.md) for complete references.

Roadmap

Phase 1: Core Engine ✅ COMPLETE

  • ✅ Arrow storage backend (Parquet/CSV readers)
  • ✅ Morsel-based paging (128MB chunks)
  • ✅ Cost-based backend dispatcher (5x rule)
  • ✅ JIT WGSL compiler for kernel fusion
  • ✅ GPU kernels (SUM, AVG, COUNT, MIN/MAX, fused filter+sum)
  • ✅ SIMD fallback via Trueno (AVX-512/AVX2/SSE2)
  • ✅ SQL query interface (SELECT, WHERE, aggregations, ORDER BY, LIMIT)
  • ✅ Backend equivalence tests (GPU == SIMD == Scalar)
  • ✅ PCIe transfer benchmarks
  • ✅ Top-K selection (O(n log k) heap-based)

Phase 2: Multi-GPU

  • Local multi-GPU data partitioning
  • Work-stealing scheduler
  • Multi-GPU aggregation + reduce

Phase 3: Distribution

  • gRPC worker protocol
  • Distributed query execution
  • Fault tolerance

Phase 4: WASM

  • WASM build target
  • WebGPU backend
  • HTTP range request Parquet reader
  • Late materialization

License

MIT - Same as Trueno

Contributing

Follow EXTREME TDD:

  • All PRs require benchmarks + tests
  • Backend equivalence tests mandatory
  • Update CHANGELOG.md (keep-a-changelog format)

Contact

Authors: Pragmatic AI Labs
Repository: https://github.com/paiml/trueno-db

About

GPU Database that supports SIMD and WASM built on top of trueno
