Commit f85bedf

feat: add packages readme (#146)
* add: readme * update: content * update: package * update: readme
1 parent bbc41f1 commit f85bedf

6 files changed, +564 -48 lines changed

meerkat-browser/README.md

Lines changed: 80 additions & 6 deletions
@@ -1,11 +1,85 @@
-# meerkat-browser
+# @devrev/meerkat-browser
 
-This library was generated with [Nx](https://nx.dev).
+`@devrev/meerkat-browser` is a library for converting cube queries into SQL and executing them in a browser environment using [@duckdb/duckdb-wasm](https://github.com/duckdb/duckdb-wasm). It serves as a client-side query engine within the Meerkat ecosystem.
 
-## Building
+This package uses `@devrev/meerkat-core` to generate a DuckDB-compatible AST and `@duckdb/duckdb-wasm` to execute the resulting query against data sources available to the browser.
 
-Run `nx build meerkat-browser` to build the library.
+## Key Features
 
-## Running unit tests
+- **Cube to SQL Execution**: Translates cube queries into SQL and executes them in the browser.
+- **Browser Optimized**: Built to work seamlessly with `@duckdb/duckdb-wasm`.
+- **Client-Side Analytics**: Enables powerful, in-browser data analysis without a server round-trip.
 
-Run `nx test meerkat-browser` to execute the unit tests via [Jest](https://jestjs.io).
+## Installation
+
+```bash
+npm install @devrev/meerkat-browser @devrev/meerkat-core @duckdb/duckdb-wasm
+```
+
+`@duckdb/duckdb-wasm` is a peer dependency and should be configured according to its documentation.
+
+## Usage
+
+Here's an example of how to convert a cube query into SQL and execute it on the client side with duckdb-wasm.
+
+```typescript
+import * as duckdb from '@duckdb/duckdb-wasm';
+import { cubeQueryToSQL } from '@devrev/meerkat-browser';
+import { Query, TableSchema } from '@devrev/meerkat-core';
+
+async function main() {
+  // 1. Initialize DuckDB-WASM
+  const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
+  // Wrap the CDN-hosted worker script in a Blob URL so the browser can load it
+  const workerUrl = URL.createObjectURL(
+    new Blob([`importScripts("${bundle.mainWorker!}");`], { type: 'text/javascript' })
+  );
+  const worker = new Worker(workerUrl);
+  const logger = new duckdb.ConsoleLogger();
+  const db = new duckdb.AsyncDuckDB(logger, worker);
+  await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
+  URL.revokeObjectURL(workerUrl);
+  const connection = await db.connect();
+
+  // 2. Define your table schemas
+  const tableSchemas: TableSchema[] = [
+    {
+      name: 'users',
+      // The SQL could point to a registered file or another data source
+      sql: 'SELECT * FROM users',
+      columns: [
+        { name: 'id', type: 'INTEGER' },
+        { name: 'name', type: 'VARCHAR' },
+        { name: 'city', type: 'VARCHAR' },
+        { name: 'signed_up_at', type: 'TIMESTAMP' },
+      ],
+    },
+  ];
+
+  // 3. Define your Cube query
+  const query: Query = {
+    measures: ['users.count'],
+    dimensions: ['users.city'],
+    filters: [
+      {
+        member: 'users.city',
+        operator: 'equals',
+        values: ['New York'],
+      },
+    ],
+    limit: 100,
+  };
+
+  // 4. Convert the query to SQL
+  const sqlQuery = await cubeQueryToSQL({
+    connection,
+    query,
+    tableSchemas,
+  });
+
+  // 5. You can now execute the generated SQL query with DuckDB
+  const result = await connection.query(sqlQuery);
+
+  console.log(
+    'Query Results:',
+    result.toArray().map((row) => row.toJSON())
+  );
+}
+
+// Surface initialization or query errors instead of leaving the promise unhandled
+main().catch(console.error);
+```

meerkat-core/README.md

Lines changed: 70 additions & 6 deletions
@@ -1,11 +1,75 @@
-# meerkat-core
+# @devrev/meerkat-core
 
-This library was generated with [Nx](https://nx.dev).
+`@devrev/meerkat-core` is the foundational library for the Meerkat ecosystem, a TypeScript SDK that seamlessly translates Cube-like queries into DuckDB Abstract Syntax Trees (AST). It provides the core logic for query transformation, designed to be environment-agnostic, running in both Node.js and browser environments.
 
-## Building
+This package focuses exclusively on generating a DuckDB-compatible AST from a JSON-based query object. It does not handle query execution, which is the responsibility of environment-specific packages like `@devrev/meerkat-node` and `@devrev/meerkat-browser`.
 
-Run `nx build meerkat-core` to build the library.
+## Key Features
 
-## Running unit tests
+- **Cube-to-AST Transformation**: Converts Cube-style JSON queries into DuckDB-compatible SQL ASTs.
+- **Environment Agnostic**: Runs in both Node.js and browser environments.
+- **Type-Safe**: Provides strong TypeScript definitions for queries, schemas, and filters.
+- **Advanced Filtering and Joins**: Supports complex filters, logical operators, and multi-table joins.
+- **Extensible by Design**: Leverages DuckDB's native JSON serialization, avoiding the limitations of traditional query builders.
 
-Run `nx test meerkat-core` to execute the unit tests via [Jest](https://jestjs.io).
+## Installation
+
+```bash
+npm install @devrev/meerkat-core
+```
+
+## Core Concepts
+
+`meerkat-core` revolves around two main objects:
+
+1. **`Query`**: A JSON object that defines your analytics request. It specifies measures, dimensions, filters, and ordering.
+2. **`TableSchema`**: Defines the structure of your data tables, including columns, measures, dimensions, and joins.
+
+The library uses these objects to generate a DuckDB AST. This AST can then be passed to an execution engine.
+
+## Usage
+
+Here's how to transform a Cube-style query into a DuckDB AST:
+
+```typescript
+import { cubeToDuckdbAST, Query, TableSchema } from '@devrev/meerkat-core';
+
+// 1. Define the schema for your table
+const schema: TableSchema = {
+  name: 'users',
+  sql: 'SELECT * FROM users',
+  columns: [
+    { name: 'id', type: 'INTEGER' },
+    { name: 'name', type: 'VARCHAR' },
+    { name: 'city', type: 'VARCHAR' },
+    { name: 'signed_up_at', type: 'TIMESTAMP' },
+  ],
+};
+
+// 2. Define your query
+const query: Query = {
+  measures: ['users.count'],
+  dimensions: ['users.city'],
+  filters: [
+    {
+      member: 'users.city',
+      operator: 'equals',
+      values: ['New York'],
+    },
+  ],
+  limit: 100,
+};
+
+// 3. Generate the DuckDB AST
+const ast = cubeToDuckdbAST(query, schema);
+
+// The `ast` can now be deserialized into a SQL string for execution.
+console.log(JSON.stringify(ast, null, 2));
+```
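
As a rough intuition for what the transformation does, the sketch below maps the parts of a Cube-style query onto SQL clauses. It is a simplified illustration and not Meerkat's implementation: `ToyQuery`, `ToyFilter`, `toySql`, and the rule that a `.count` measure means `COUNT(*)` are all assumptions made for this example, and the real library produces a DuckDB AST rather than concatenating strings.

```typescript
// Hypothetical, simplified types for illustration only (not the Meerkat API).
interface ToyFilter {
  member: string; // e.g. 'users.city'
  operator: 'equals';
  values: string[];
}

interface ToyQuery {
  measures: string[];
  dimensions: string[];
  filters?: ToyFilter[];
  limit?: number;
}

// Escape single quotes so values can be embedded as SQL string literals.
const quote = (value: string): string => `'${value.replace(/'/g, "''")}'`;

function toySql(query: ToyQuery, table: string): string {
  // Assumption: a measure named '<table>.count' means COUNT(*).
  const measures = query.measures.map((m) => {
    if (!m.endsWith('.count')) throw new Error(`unsupported measure: ${m}`);
    return `COUNT(*) AS "${m}"`;
  });
  const select = [...query.dimensions, ...measures];
  // Each 'equals' filter becomes an IN (...) predicate over its values.
  const where = (query.filters ?? []).map(
    (f) => `${f.member} IN (${f.values.map(quote).join(', ')})`
  );

  const parts = [`SELECT ${select.join(', ')}`, `FROM ${table}`];
  if (where.length > 0) parts.push(`WHERE ${where.join(' AND ')}`);
  if (query.dimensions.length > 0) parts.push(`GROUP BY ${query.dimensions.join(', ')}`);
  if (query.limit !== undefined) parts.push(`LIMIT ${query.limit}`);
  return parts.join(' ');
}

const sql = toySql(
  {
    measures: ['users.count'],
    dimensions: ['users.city'],
    filters: [{ member: 'users.city', operator: 'equals', values: ['New York'] }],
    limit: 100,
  },
  'users'
);
console.log(sql);
```

Dimensions end up in both `SELECT` and `GROUP BY`, measures become aggregates, filters become `WHERE` predicates, and `limit` becomes `LIMIT`. Working on an AST instead of strings, as the README describes, avoids the quoting and escaping concerns this toy version has to handle by hand.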
+
+## Ecosystem
+
+`meerkat-core` is the foundation for:
+
+- **`@devrev/meerkat-node`**: For server-side analytics in Node.js with `@duckdb/node-api`.
+- **`@devrev/meerkat-browser`**: For client-side analytics in the browser with `@duckdb/duckdb-wasm`.

meerkat-dbm/README.md

Lines changed: 209 additions & 6 deletions
@@ -1,11 +1,214 @@
-# meerkat-dbm
+# @devrev/meerkat-dbm
 
-This library was generated with [Nx](https://nx.dev).
+`@devrev/meerkat-dbm` is a browser-first database management layer built on [duckdb-wasm](https://github.com/duckdb/duckdb-wasm). It orchestrates query execution, manages DuckDB instances, caches files, persists data in browser storage, and optimizes memory usage to enable robust, high-performance data processing in web applications.
 
-## Building
+It's designed to bring the power of analytical SQL to the browser without compromising application stability or user experience. Whether you're building a data-intensive dashboard, an interactive reporting tool, or an offline-first application, Meerkat DBM provides the foundation you need.
 
-Run `nx build meerkat-dbm` to build the library.
+## Architecture
 
-## Running unit tests
+Meerkat DBM is composed of several key components that work together to manage data and execute queries in the browser:
 
-Run `nx test meerkat-dbm` to execute the unit tests via [Jest](https://jestjs.io).
+- **DBM (Database Manager)**: The central orchestrator. It receives queries, manages the execution lifecycle, and coordinates with other components.
+- **FileManager**: Handles all aspects of data storage and retrieval. It can manage data in-memory or persist it to IndexedDB.
+- **InstanceManager**: A user-implemented component responsible for creating, managing, and terminating `duckdb-wasm` instances.
+- **DuckDB Instances**: The underlying `duckdb-wasm` engines where queries are executed, running in the main thread or in iframes for parallelism.
+
+This modular design provides a clear separation of concerns for managing complex data workflows in the browser.
+
+## Why Meerkat DBM?
+
+While `duckdb-wasm` is incredibly powerful, using it directly in a complex web application can be challenging. Meerkat DBM provides a structured, production-ready layer that solves common problems:
+
+- **🧠 Memory Safety**: Prevents Out-Of-Memory (OOM) errors by managing query queues and memory swapping, ensuring your app remains stable even with large datasets.
+- **💾 Persistence**: Offers seamless IndexedDB storage, allowing data to persist across browser sessions.
+- **🗂️ Advanced File Management**: Simplifies handling of Parquet and JSON files, including files fetched from remote URLs, with intelligent caching and partitioning.
+- **⚡ Parallel Processing**: Unlocks high-performance analytics with an optional iframe-based architecture for parallel query execution, preventing UI freezes.
+
+## Key Features
+
+### 🚀 Database Management
+
+- **Instance Management**: Automated lifecycle management for DuckDB instances.
+- **Connection Pooling**: Efficient management of database connections.
+- **Query Queueing**: Intelligent scheduling of queries for sequential or parallel execution.
+- **Table Locking**: Ensures thread-safe table operations during concurrent access.
+
+### 📂 File Management
+
+- **Multiple Formats**: Native support for Parquet and JSON files.
+- **Bulk Operations**: High-performance APIs for registering and processing files in bulk.
+- **Partitioning**: Support for table partitioning to efficiently manage and query large datasets.
+- **Metadata Handling**: Rich metadata support for tables and files.
+- **Multiple Storage Modes**: Flexible storage options, including in-memory and IndexedDB.
+
+## Installation
+
+```bash
+npm install @devrev/meerkat-dbm @duckdb/duckdb-wasm
+```
+
+## Usage
+
+### 1. Implement the InstanceManager
+
+Meerkat DBM requires you to provide an `InstanceManager`. This decouples the library from a specific `duckdb-wasm` version, giving you full control over its instantiation and configuration.
+
+```typescript
+// src/instance-manager.ts
+import * as duckdb from '@duckdb/duckdb-wasm';
+import { InstanceManagerType } from '@devrev/meerkat-dbm';
+
+// Select the desired DuckDB bundle
+const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
+
+export class InstanceManager implements InstanceManagerType {
+  private db: duckdb.AsyncDuckDB | null = null;
+
+  private async initDB(): Promise<duckdb.AsyncDuckDB> {
+    const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
+
+    const worker_url = URL.createObjectURL(new Blob([`importScripts("${bundle.mainWorker!}");`], { type: 'text/javascript' }));
+
+    const worker = new Worker(worker_url);
+    const logger = { log: (msg: any) => console.log(msg) };
+    const db = new duckdb.AsyncDuckDB(logger, worker);
+
+    await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
+
+    URL.revokeObjectURL(worker_url);
+    return db;
+  }
+
+  async getDB(): Promise<duckdb.AsyncDuckDB> {
+    if (!this.db) {
+      this.db = await this.initDB();
+    }
+    return this.db;
+  }
+
+  async terminateDB(): Promise<void> {
+    if (this.db) {
+      await this.db.terminate();
+      this.db = null;
+    }
+  }
+}
+```
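
One detail worth noting in a lazy `getDB()` like the one above: if two callers invoke it before the first initialization has finished, both see `this.db` as `null` and each starts its own initialization. A possible refinement, sketched here on the assumption that all callers should share a single in-flight initialization, is to cache the promise rather than the resolved instance. `LazyResource` is a hypothetical helper written for this example; it is not part of Meerkat or duckdb-wasm.

```typescript
// Hypothetical helper: shares one in-flight initialization among all callers.
class LazyResource<T> {
  private pending: Promise<T> | null = null;
  private readonly create: () => Promise<T>;

  constructor(create: () => Promise<T>) {
    this.create = create;
  }

  // Returns the cached promise, starting initialization on first use.
  get(): Promise<T> {
    if (this.pending === null) {
      const attempt: Promise<T> = this.create().catch((error: unknown) => {
        // Forget a failed attempt so that a later call can retry.
        if (this.pending === attempt) this.pending = null;
        throw error;
      });
      this.pending = attempt;
    }
    return this.pending;
  }

  // Drops the cached promise and returns it so the caller can clean up.
  reset(): Promise<T> | null {
    const previous = this.pending;
    this.pending = null;
    return previous;
  }
}

// Stand-in factory; a real one would create and instantiate an AsyncDuckDB.
let created = 0;
const lazy = new LazyResource(async () => {
  created += 1;
  return { id: created };
});

const first = lazy.get();
const second = lazy.get();
console.log(first === second, created); // true 1
```

In an `InstanceManager`, `getDB()` could return `lazy.get()` and `terminateDB()` could await the promise returned by `reset()` before calling `terminate()` on the instance.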
+
+### 2. Example: Sequential Queries with Persistent Storage
+
+This example uses the `DBM` with an `IndexedDBFileManager` for safe, sequential query execution and data persistence across browser sessions.
+
+```typescript
+import { DBM, IndexedDBFileManager } from '@devrev/meerkat-dbm';
+import { InstanceManager } from './instance-manager';
+
+// 1. Create the managers
+const instanceManager = new InstanceManager();
+const fileManager = new IndexedDBFileManager({
+  instanceManager,
+  // This function is called by Meerkat to fetch file data when needed
+  fetchTableFileBuffers: async (tableName) => {
+    // In a real app, you would load this table's file buffers here (for example, from IndexedDB)
+    return [];
+  },
+});
+
+// 2. Create the DBM instance
+const dbm = new DBM({
+  instanceManager,
+  fileManager,
+  onEvent: (event) => console.info('DBM Event:', event),
+  options: {
+    // Automatically shut down the DuckDB instance after 5s of inactivity
+    shutdownInactiveTime: 5000,
+  },
+});
+
+// 3. Register data
+await fileManager.registerJSON({
+  tableName: 'sales',
+  fileName: 'sales.json',
+  json: [
+    { id: 1, product: 'Laptop', amount: 1200 },
+    { id: 2, product: 'Mouse', amount: 25 },
+    { id: 3, product: 'Keyboard', amount: 75 },
+  ],
+});
+
+// 4. Run a query
+const results = await dbm.query('SELECT * FROM sales WHERE amount > 50');
+console.log(results);
+```
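
For readers unfamiliar with the idea of sequential execution, the sketch below shows one common way to run asynchronous tasks strictly one after another by chaining promises. It is a generic pattern and not a description of how `DBM` schedules queries internally; `SequentialQueue` and the stand-in tasks are hypothetical names used only for this example.

```typescript
// Hypothetical helper: runs async tasks strictly one after another.
class SequentialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue<T>(task: () => Promise<T>): Promise<T> {
    // Start this task only after everything queued before it has settled.
    const result = this.tail.then(task);
    // Keep the chain alive even if this task rejects.
    this.tail = result.then(
      () => undefined,
      () => undefined
    );
    return result;
  }
}

const queue = new SequentialQueue();
const order: string[] = [];

// Both tasks are enqueued immediately, but the second waits for the first.
const done = Promise.all([
  queue.enqueue(async () => {
    await Promise.resolve(); // simulate asynchronous work
    order.push('first');
    return 1;
  }),
  queue.enqueue(async () => {
    order.push('second');
    return 2;
  }),
]).then((values) => {
  console.log(order, values);
  return values;
});
```

Because each task starts only when the previous one has settled, at most one task is in flight at a time, which is the property that keeps peak memory use predictable when each task is a heavy query.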
+
+### 3. Example: Parallel Queries with IFrame Runners
+
+This setup uses `DBMParallel` and `ParallelIndexedDBFileManager` for maximum performance, executing queries in parallel across multiple iframe-based DuckDB instances.
+
+```typescript
+import { DBMParallel, IFrameRunnerManager, ParallelIndexedDBFileManager } from '@devrev/meerkat-dbm';
+import log from 'loglevel';
+import { InstanceManager } from './instance-manager';
+
+// 1. Create instance and file managers
+const instanceManager = new InstanceManager();
+const fileManager = new ParallelIndexedDBFileManager({
+  instanceManager,
+  fetchTableFileBuffers: async (table) => [],
+  logger: log,
+});
+
+// 2. Set up the iframe runner manager for parallel execution
+const iframeManager = new IFrameRunnerManager({
+  // URL to the runner HTML file that hosts the DuckDB instance
+  runnerURL: 'http://localhost:4204/runner/indexeddb-runner.html',
+  origin: 'http://localhost:4204',
+  totalRunners: 4, // Number of parallel iframes
+  fetchTableFileBuffers: async (table) => [],
+  logger: log,
+});
+
+// 3. Create the parallel DBM instance
+const parallelDBM = new DBMParallel({
+  instanceManager,
+  fileManager,
+  iFrameRunnerManager: iframeManager,
+  logger: log,
+  options: {
+    shutdownInactiveTime: 10000,
+  },
+});
+
+// 4. Register data
+await fileManager.bulkRegisterJSON([
+  {
+    tableName: 'transactions',
+    fileName: 'transactions.json',
+    json: [
+      { id: 1, product_id: 101, amount: 1200 },
+      { id: 2, product_id: 102, amount: 25 },
+    ],
+  },
+  {
+    tableName: 'products',
+    fileName: 'products.json',
+    json: [
+      { id: 101, name: 'Laptop', category: 'Electronics' },
+      { id: 102, name: 'Mouse', category: 'Accessories' },
+    ],
+  },
+]);
+
+// 5. Execute queries in parallel
+const results = await Promise.all([
+  parallelDBM.query('SELECT * FROM transactions WHERE amount > 100'),
+  parallelDBM.query(`
+    SELECT p.category, COUNT(*) as product_count
+    FROM transactions t
+    JOIN products p ON t.product_id = p.id
+    GROUP BY p.category
+  `),
+]);
+
+console.log('Query Results', results);
+```
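
The example above configures a fixed pool of runners (`totalRunners: 4`). The README does not say how `DBMParallel` assigns queries to runners, so the following is only an illustration of one simple policy, round-robin, for spreading work over a fixed pool. `RoundRobin` and the runner ids are hypothetical and exist only in this sketch.

```typescript
// Hypothetical helper: cycles through a fixed pool of runner ids.
class RoundRobin<T> {
  private next = 0;
  private readonly items: readonly T[];

  constructor(items: readonly T[]) {
    if (items.length === 0) throw new Error('at least one runner is required');
    this.items = items;
  }

  // Returns the next item and advances the cursor, wrapping at the end.
  pick(): T {
    const item = this.items[this.next];
    this.next = (this.next + 1) % this.items.length;
    return item;
  }
}

const runners = new RoundRobin(['runner-0', 'runner-1', 'runner-2', 'runner-3']);

// Six incoming queries are spread over four runners; the fifth wraps around.
const assignments = Array.from({ length: 6 }, () => runners.pick());
console.log(assignments);
```

Round-robin ignores how long each query takes, so a real scheduler might instead prefer whichever runner is idle; the sketch is meant only to make the idea of a runner pool concrete.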
