Skip to content

Files

Latest commit

5ea0319 · Aug 15, 2024

History

History
160 lines (119 loc) · 4.72 KB

README.md

File metadata and controls

160 lines (119 loc) · 4.72 KB

tsellm: Use LLMs in SQLite and DuckDB

Github PyPI Documentation Status Linkedin Github Sponsors pip installs Tests codecov License

tsellm is the easiest way to access LLMs from SQLite or DuckDB.

pip install tsellm

Prompts

cat <<EOF | tee >(sqlite3 prompts.sqlite3) | duckdb prompts.duckdb
CREATE TABLE prompts ( p TEXT);
INSERT INTO prompts VALUES('how are you?');
INSERT INTO prompts VALUES('is this real life?');
EOF
llm install llm-gpt4all
tsellm prompts.duckdb "select prompt(p, 'orca-mini-3b-gguf2-q4_0') from prompts"
tsellm prompts.sqlite3 "select prompt(p, 'orca-2-7b') from prompts"

Behind the scenes, tsellm is based on the beautiful llm library, so you can use any of its plugins:

Multiple Prompts

With a single query, you can easily access get prompt responses from different LLMs:

tsellm prompts.sqlite3 "
        select p,
        prompt(p, 'orca-2-7b'),
        prompt(p, 'orca-mini-3b-gguf2-q4_0'),
        embed(p, 'sentence-transformers/all-MiniLM-L12-v2') 
        from prompts"

Embeddings

llm install llm-sentence-transformers
llm sentence-transformers register all-MiniLM-L12-v2
llm install llm-embed-hazo # dummy embedding model for demonstration purposes
tsellm prompts.sqlite3 "select embed(p, 'sentence-transformers/all-MiniLM-L12-v2')"

JSON Embeddings Recursively

If you have JSON columns, you can embed these object recursively. That is, an embedding vector of floats will replace each text occurrence in the object.

cat <<EOF | tee >(sqlite3 prompts.sqlite3) | duckdb prompts.duckdb
CREATE TABLE people(d JSON);
INSERT INTO people (d) VALUES 
('{"name": "John Doe", "age": 30, "hobbies": ["reading", "biking"]}'),
('{"name": "Jane Smith", "age": 25, "hobbies": ["painting", "traveling"]}')
EOF

SQLite

tsellm prompts.sqlite3 "select json_embed(d, 'hazo') from people"

Output

('{"name": [4.0, 3.0,..., 0.0], "age": 30, "hobbies": [[7.0, 0.0,..., 0.0], [6.0, 0.0, ..., 0.0]]}',)
('{"name": [4.0, 5.0, ,..., 0.0], "age": 25, "hobbies": [[8.0, 0.0,..., 0.0], [9.0, 0.0,..., 0.0]]}',)

DuckDB

tsellm prompts.duckdb "select json_embed(d, 'hazo') from people"

Output

('{"name": [4.0, 3.0,..., 0.0], "age": 30, "hobbies": [[7.0, 0.0,..., 0.0], [6.0, 0.0, ..., 0.0]]}',)
('{"name": [4.0, 5.0, ,..., 0.0], "age": 25, "hobbies": [[8.0, 0.0,..., 0.0], [9.0, 0.0,..., 0.0]]}',)

Binary (BLOB) Embeddings

wget https://tselai.com/img/flo.jpg
sqlite3 images.sqlite3 <<EOF
CREATE TABLE images(name TEXT, type TEXT, img BLOB);
INSERT INTO images(name,type,img) VALUES('flo','jpg',readfile('flo.jpg'));
EOF
llm install llm-clip
tsellm images.sqlite3 "select embed(img, 'clip') from images"

Interactive Shell

If you don't provide an SQL query, you'll enter an interactive shell instead.

tsellm prompts.db

til

Installation

pip install tsellm

How

tsellm relies on the following facts:

  • SQLite is bundled with the standard Python library (import sqlite3)
  • Python 3.12 ships with a SQLite interactive shell
  • one can create Python-written user-defined functions to be used in SQLite queries (see create_function)
  • Simon Willison has gone through the process of creating the beautiful llm Python library and CLI

Development

pip install -e '.[test]'
pytest