|
1 |
| -# meerkat-dbm |
| 1 | +# @devrev/meerkat-dbm |
2 | 2 |
|
3 |
| -This library was generated with [Nx](https://nx.dev). |
| 3 | +`@devrev/meerkat-dbm` is a browser-first database management layer built on [duckdb-wasm](https://github.com/duckdb/duckdb-wasm). It orchestrates query execution, manages DuckDB instances, caches files, persists data in browser storage, and optimizes memory usage to enable robust, high-performance data processing in web applications. |
4 | 4 |
|
5 |
| -## Building |
| 5 | +It's designed to bring the power of analytical SQL to the browser without compromising application stability or user experience. Whether you're building a data-intensive dashboard, an interactive reporting tool, or an offline-first application, Meerkat DBM provides the foundation you need. |
6 | 6 |
|
7 |
| -Run `nx build meerkat-dbm` to build the library. |
| 7 | +## Architecture |
8 | 8 |
|
9 |
| -## Running unit tests |
| 9 | +Meerkat DBM is composed of several key components that work together to manage data and execute queries in the browser: |
10 | 10 |
|
11 |
| -Run `nx test meerkat-dbm` to execute the unit tests via [Jest](https://jestjs.io). |
| 11 | +- **DBM (Database Manager)**: The central orchestrator. It receives queries, manages the execution lifecycle, and coordinates with other components. |
| 12 | +- **FileManager**: Handles all aspects of data storage and retrieval. It can manage data in-memory or persist it to IndexedDB. |
| 13 | +- **InstanceManager**: A user-implemented component responsible for creating, managing, and terminating `duckdb-wasm` instances. |
| 14 | +- **DuckDB Instances**: The underlying `duckdb-wasm` engines where queries are executed, running either in the main thread or in iframes for parallelism. |
| 15 | + |
| 16 | +This modular design provides a clear separation of concerns for managing complex data workflows in the browser. |
| 17 | + |
| 18 | +## Why Meerkat DBM? |
| 19 | + |
| 20 | +While `duckdb-wasm` is incredibly powerful, using it directly in a complex web application can be challenging. Meerkat DBM provides a structured, production-ready layer that solves common problems: |
| 21 | + |
| 22 | +- **🧠 Memory Safety**: Prevents Out-Of-Memory (OOM) errors by managing query queues and memory swapping, ensuring your app remains stable even with large datasets. |
| 23 | +- **💾 Persistence**: Offers seamless IndexedDB storage, allowing data to persist across browser sessions. |
| 24 | +- **🗂️ Advanced File Management**: Simplifies handling of multiple file formats and sources (Parquet, JSON, remote URLs) with caching and partitioning. |
| 25 | +- **⚡ Parallel Processing**: Unlocks high-performance analytics with an optional iframe-based architecture for parallel query execution, preventing UI freezes. |
| 26 | + |
| 27 | +## Key Features |
| 28 | + |
| 29 | +### 🚀 Database Management |
| 30 | + |
| 31 | +- **Instance Management**: Automated lifecycle management for DuckDB instances. |
| 32 | +- **Connection Pooling**: Efficient management of database connections. |
| 33 | +- **Query Queueing**: Intelligent scheduling of queries for sequential or parallel execution. |
| 34 | +- **Table Locking**: Ensures thread-safe table operations during concurrent access. |
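
To make the idea of sequential query queueing concrete, here is a minimal sketch of a promise-chained queue. It is an illustration only and not Meerkat DBM's actual implementation: `SequentialQueue`, `fakeQuery`, and `demo` are hypothetical names, and the "queries" are simulated with timers.

```typescript
// Hypothetical sketch of sequential queueing; not Meerkat DBM's internal code.
type Task<T> = () => Promise<T>;

class SequentialQueue {
  // Tail of the promise chain. It never rejects, so the chain keeps going after failures.
  private tail: Promise<unknown> = Promise.resolve();

  enqueue<T>(task: Task<T>): Promise<T> {
    // Start this task only after every previously enqueued task has settled.
    const result = this.tail.then(() => task());
    // Swallow the error on the internal chain only; the caller still sees it via `result`.
    this.tail = result.catch(() => undefined);
    return result;
  }
}

// Simulated query: records its name when it completes after `ms` milliseconds.
function fakeQuery(name: string, ms: number, log: string[]): Task<string> {
  return () =>
    new Promise<string>((resolve) =>
      setTimeout(() => {
        log.push(name);
        resolve(name);
      }, ms),
    );
}

async function demo(): Promise<string[]> {
  const log: string[] = [];
  const queue = new SequentialQueue();
  await Promise.all([
    queue.enqueue(fakeQuery('slow', 30, log)),
    queue.enqueue(fakeQuery('fast', 1, log)),
  ]);
  return log; // 'slow' completes before 'fast' even starts
}
```

Because each task waits for the previous one, only one simulated query is in flight at a time, which is the property that keeps peak memory bounded in a sequential setup.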
| 35 | + |
| 36 | +### 📂 File Management |
| 37 | + |
| 38 | +- **Multiple Formats**: Native support for Parquet and JSON files. |
| 39 | +- **Bulk Operations**: High-performance APIs for registering and processing files in bulk. |
| 40 | +- **Partitioning**: Support for table partitioning to efficiently manage and query large datasets. |
| 41 | +- **Metadata Handling**: Rich metadata support for tables and files. |
| 42 | +- **Multiple Storage Modes**: Flexible storage options, including in-memory and IndexedDB. |
| 43 | + |
| 44 | +## Installation |
| 45 | + |
| 46 | +```bash |
| 47 | +npm install @devrev/meerkat-dbm @duckdb/duckdb-wasm |
| 48 | +``` |
| 49 | + |
| 50 | +## Usage |
| 51 | + |
| 52 | +### 1. Implement the InstanceManager |
| 53 | + |
| 54 | +Meerkat DBM requires you to provide an `InstanceManager`. This decouples the library from a specific `duckdb-wasm` version, giving you full control over its instantiation and configuration. |
| 55 | + |
| 56 | +```typescript |
| 57 | +// src/instance-manager.ts |
| 58 | +import * as duckdb from '@duckdb/duckdb-wasm'; |
| 59 | +import { InstanceManagerType } from '@devrev/meerkat-dbm'; |
| 60 | + |
| 61 | +// Select the desired DuckDB bundle |
| 62 | +const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles(); |
| 63 | + |
| 64 | +export class InstanceManager implements InstanceManagerType { |
| 65 | + private db: duckdb.AsyncDuckDB | null = null; |
| 66 | + |
| 67 | + private async initDB(): Promise<duckdb.AsyncDuckDB> { |
| 68 | + const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES); |
| 69 | + |
| 70 | + const worker_url = URL.createObjectURL(new Blob([`importScripts("${bundle.mainWorker!}");`], { type: 'text/javascript' })); |
| 71 | + |
| 72 | + const worker = new Worker(worker_url); |
| 73 | + const logger = { log: (msg: any) => console.log(msg) }; |
| 74 | + const db = new duckdb.AsyncDuckDB(logger, worker); |
| 75 | + |
| 76 | + await db.instantiate(bundle.mainModule, bundle.pthreadWorker); |
| 77 | + |
| 78 | + URL.revokeObjectURL(worker_url); |
| 79 | + return db; |
| 80 | + } |
| 81 | + |
| 82 | + async getDB(): Promise<duckdb.AsyncDuckDB> { |
| 83 | + if (!this.db) { |
| 84 | + this.db = await this.initDB(); |
| 85 | + } |
| 86 | + return this.db; |
| 87 | + } |
| 88 | + |
| 89 | + async terminateDB(): Promise<void> { |
| 90 | + if (this.db) { |
| 91 | + await this.db.terminate(); |
| 92 | + this.db = null; |
| 93 | + } |
| 94 | + } |
| 95 | +} |
| 96 | +``` |
| 97 | + |
| 98 | +### 2. Example: Sequential Queries with Persistent Storage |
| 99 | + |
| 100 | +This example uses the `DBM` with an `IndexedDBFileManager` for safe, sequential query execution and data persistence across browser sessions. |
| 101 | + |
| 102 | +```typescript |
| 103 | +import { DBM, IndexedDBFileManager } from '@devrev/meerkat-dbm'; |
| 104 | +import { InstanceManager } from './instance-manager'; |
| 105 | + |
| 106 | +// 1. Create the managers |
| 107 | +const instanceManager = new InstanceManager(); |
| 108 | +const fileManager = new IndexedDBFileManager({ |
| 109 | + instanceManager, |
| 110 | + // This function is called by Meerkat to fetch file data when needed |
| 111 | + fetchTableFileBuffers: async (tableName) => { |
| 112 | + // In a real app, you would fetch the table's file buffers here (for example, from IndexedDB) |
| 113 | + return []; |
| 114 | + }, |
| 115 | +}); |
| 116 | + |
| 117 | +// 2. Create the DBM instance |
| 118 | +const dbm = new DBM({ |
| 119 | + instanceManager, |
| 120 | + fileManager, |
| 121 | + onEvent: (event) => console.info('DBM Event:', event), |
| 122 | + options: { |
| 123 | + // Automatically shut down the DuckDB instance after 5s of inactivity |
| 124 | + shutdownInactiveTime: 5000, |
| 125 | + }, |
| 126 | +}); |
| 127 | + |
| 128 | +// 3. Register data |
| 129 | +await fileManager.registerJSON({ |
| 130 | + tableName: 'sales', |
| 131 | + fileName: 'sales.json', |
| 132 | + json: [ |
| 133 | + { id: 1, product: 'Laptop', amount: 1200 }, |
| 134 | + { id: 2, product: 'Mouse', amount: 25 }, |
| 135 | + { id: 3, product: 'Keyboard', amount: 75 }, |
| 136 | + ], |
| 137 | +}); |
| 138 | + |
| 139 | +// 4. Run a query |
| 140 | +const results = await dbm.query('SELECT * FROM sales WHERE amount > 50'); |
| 141 | +console.log(results); |
| 142 | +``` |
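
The `shutdownInactiveTime` option above describes shutting the instance down after a period of inactivity. One way to picture that behavior is a resettable idle timer, sketched below. This is an assumption about the general technique, not the library's code: `IdleTimer`, `touch`, and `cancel` are hypothetical names.

```typescript
// Hypothetical sketch of an inactivity timer; not Meerkat DBM's internal code.
class IdleTimer {
  private handle: ReturnType<typeof setTimeout> | null = null;
  private readonly idleMs: number;
  private readonly onIdle: () => void;

  constructor(idleMs: number, onIdle: () => void) {
    this.idleMs = idleMs;
    this.onIdle = onIdle;
  }

  // Call on every activity (for example, each query): restarts the countdown.
  touch(): void {
    this.cancel();
    this.handle = setTimeout(() => {
      this.handle = null;
      this.onIdle();
    }, this.idleMs);
  }

  // Stops the countdown without firing the idle callback.
  cancel(): void {
    if (this.handle !== null) {
      clearTimeout(this.handle);
      this.handle = null;
    }
  }
}
```

In this sketch, a caller would invoke `touch()` whenever a query runs and pass a callback that terminates the DuckDB instance, such as the `terminateDB()` method of the `InstanceManager` shown earlier.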
| 143 | + |
| 144 | +### 3. Example: Parallel Queries with IFrame Runners |
| 145 | + |
| 146 | +This setup uses `DBMParallel` and `ParallelIndexedDBFileManager` for maximum performance, executing queries in parallel across multiple iframe-based DuckDB instances. |
| 147 | + |
| 148 | +```typescript |
| 149 | +import { DBMParallel, IFrameRunnerManager, ParallelIndexedDBFileManager } from '@devrev/meerkat-dbm'; |
| 150 | +import log from 'loglevel'; |
| 151 | +import { InstanceManager } from './instance-manager'; |
| 152 | + |
| 153 | +// 1. Create instance and file managers |
| 154 | +const instanceManager = new InstanceManager(); |
| 155 | +const fileManager = new ParallelIndexedDBFileManager({ |
| 156 | + instanceManager, |
| 157 | + fetchTableFileBuffers: async (table) => [], |
| 158 | + logger: log, |
| 159 | +}); |
| 160 | + |
| 161 | +// 2. Set up the iframe runner manager for parallel execution |
| 162 | +const iframeManager = new IFrameRunnerManager({ |
| 163 | + // URL to the runner HTML file that hosts the DuckDB instance |
| 164 | + runnerURL: 'http://localhost:4204/runner/indexeddb-runner.html', |
| 165 | + origin: 'http://localhost:4204', |
| 166 | + totalRunners: 4, // Number of parallel iframes |
| 167 | + fetchTableFileBuffers: async (table) => [], |
| 168 | + logger: log, |
| 169 | +}); |
| 170 | + |
| 171 | +// 3. Create the parallel DBM instance |
| 172 | +const parallelDBM = new DBMParallel({ |
| 173 | + instanceManager, |
| 174 | + fileManager, |
| 175 | + iFrameRunnerManager: iframeManager, |
| 176 | + logger: log, |
| 177 | + options: { |
| 178 | + shutdownInactiveTime: 10000, |
| 179 | + }, |
| 180 | +}); |
| 181 | + |
| 182 | +// 4. Register data |
| 183 | +await fileManager.bulkRegisterJSON([ |
| 184 | + { |
| 185 | + tableName: 'transactions', |
| 186 | + fileName: 'transactions.json', |
| 187 | + json: [ |
| 188 | + { id: 1, product_id: 101, amount: 1200 }, |
| 189 | + { id: 2, product_id: 102, amount: 25 }, |
| 190 | + ], |
| 191 | + }, |
| 192 | + { |
| 193 | + tableName: 'products', |
| 194 | + fileName: 'products.json', |
| 195 | + json: [ |
| 196 | + { id: 101, name: 'Laptop', category: 'Electronics' }, |
| 197 | + { id: 102, name: 'Mouse', category: 'Accessories' }, |
| 198 | + ], |
| 199 | + }, |
| 200 | +]); |
| 201 | + |
| 202 | +// 5. Execute queries in parallel |
| 203 | +const results = await Promise.all([ |
| 204 | + parallelDBM.query('SELECT * FROM transactions WHERE amount > 100'), |
| 205 | + parallelDBM.query(` |
| 206 | + SELECT p.category, COUNT(*) as product_count |
| 207 | + FROM transactions t |
| 208 | + JOIN products p ON t.product_id = p.id |
| 209 | + GROUP BY p.category |
| 210 | + `), |
| 211 | +]); |
| 212 | + |
| 213 | +console.log('Query Results', results); |
| 214 | +``` |
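
As a rough mental model for how queries might be spread across `totalRunners` iframes, the sketch below cycles through a fixed set of runners in round-robin order. The scheduling strategy Meerkat DBM actually uses is not stated in this README, so treat this purely as an illustration: `RunnerPool` and `pick` are hypothetical names, and runners are represented by plain strings.

```typescript
// Hypothetical sketch of round-robin runner selection; the library's real scheduler may differ.
class RunnerPool<R> {
  private readonly runners: R[];
  private next = 0;

  constructor(runners: R[]) {
    if (runners.length === 0) {
      throw new Error('at least one runner is required');
    }
    this.runners = runners;
  }

  // Returns the next runner in order, wrapping around at the end of the list.
  pick(): R {
    const runner = this.runners[this.next];
    this.next = (this.next + 1) % this.runners.length;
    return runner;
  }
}

// Four stand-in runners, mirroring `totalRunners: 4` in the example above.
const pool = new RunnerPool(['runner-0', 'runner-1', 'runner-2', 'runner-3']);
const assignments = [pool.pick(), pool.pick(), pool.pick(), pool.pick(), pool.pick()];
console.log(assignments); // the fifth query wraps back around to 'runner-0'
```

Round-robin is the simplest policy to reason about. A load-aware policy (for example, picking the runner with the fewest pending queries) is a common alternative when query durations vary widely.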