GitHub - mattismegevand/sqlgbm: SQL-based inference for Gradient Boosting Models

sqlgbm

⚠️ Warning: This library is in a very early development stage. The API and functionality will improve significantly over time. Not ready for production use yet.

sqlgbm is a Python library that converts tree-based machine learning models into SQL queries. This allows you to deploy your ML models directly in your database without any additional infrastructure.

documentation

You can find the documentation here.

installation

pip install sqlgbm

overview

sqlgbm takes your trained tree-based models and generates SQL code that reproduces the model's predictions. This enables you to:

Run predictions directly in your database
Eliminate latency from API calls between your database and ML serving infrastructure
Simplify your production architecture by removing additional serving components
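To make the idea concrete, here is a minimal sketch of how a single decision tree can be rendered as a nested SQL CASE expression and evaluated inside a database. It is not sqlgbm's actual implementation: the `TOY_TREE` dictionary layout, the `tree_to_sql` and `score_in_sqlite` helpers, and the `passengers` table are all hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical toy tree (NOT sqlgbm's internal representation):
# internal nodes split on "feature <= threshold", leaves carry a raw score.
TOY_TREE = {
    "feature": "age",
    "threshold": 10.0,
    "left": {"value": 0.8},
    "right": {
        "feature": "fare",
        "threshold": 50.0,
        "left": {"value": -0.4},
        "right": {"value": 0.3},
    },
}


def tree_to_sql(node):
    """Recursively render a tree as a nested SQL CASE expression."""
    if "value" in node:  # leaf: emit the score as a numeric literal
        return repr(float(node["value"]))
    left = tree_to_sql(node["left"])
    right = tree_to_sql(node["right"])
    return (
        f"CASE WHEN {node['feature']} <= {node['threshold']!r} "
        f"THEN {left} ELSE {right} END"
    )


def score_in_sqlite(rows):
    """Load rows into an in-memory table and score them with the tree's SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE passengers (id INTEGER, age REAL, fare REAL)")
    conn.executemany("INSERT INTO passengers VALUES (?, ?, ?)", rows)
    query = (
        f"SELECT id, {tree_to_sql(TOY_TREE)} AS raw "
        "FROM passengers ORDER BY id"
    )
    result = conn.execute(query).fetchall()
    conn.close()
    return result


rows = [(1, 5.0, 20.0), (2, 30.0, 20.0), (3, 30.0, 80.0)]
scores = score_in_sqlite(rows)
print(scores)
```

A boosted model is a sum of many such trees, so a full query would add one CASE expression per tree. Real models also define how missing values are routed at each split, which this sketch ignores.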

supported models

Currently supported models:

LightGBM
XGBoost

usage

basic example

from sqlgbm import SQLGBM
import lightgbm as lgb
import pandas as pd

# Load titanic dataset
titanic = pd.read_csv('titanic.csv')
features = ['pclass', 'sex', 'age', 'fare']
X = titanic[features].copy()  # copy so the column assignment below does not trigger SettingWithCopyWarning
X['sex'] = X['sex'].astype('category')
y = titanic['survived']

# Train model
clf = lgb.LGBMClassifier(n_estimators=3, max_depth=3)
clf.fit(X, y, categorical_feature=['sex'])

# Convert to SQL
sqlgbm = SQLGBM(clf, cat_features=['sex'])
sql = sqlgbm.generate_query('titanic', 'probability')

print(sql)

xgboost example

import xgboost as xgb
from sqlgbm import SQLGBM

# Prepare data (X, y)
# ...

# Train model
model = xgb.XGBClassifier(n_estimators=3, max_depth=3, base_score=0.5)
model.fit(X, y)

# Convert XGBoost model to SQL
sqlgbm = SQLGBM(model, X=X)  # X used to infer categorical features
sql = sqlgbm.generate_query('my_table', 'all')

output types

sqlgbm supports several output formats through the output_type parameter (the second argument to generate_query in the examples above):

raw: Returns the raw model output
probability: Returns the probability (after sigmoid transformation)
prediction: Returns the binary prediction (0 or 1) based on a 0.5 threshold
all: Returns all three outputs
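The relationship between the three outputs can be sketched in plain Python. This is an illustration of the math described above, not code from sqlgbm: the `sigmoid` and `outputs` helpers are hypothetical, and treating a probability of exactly 0.5 as class 1 is an assumption, since the source only says the threshold is 0.5.

```python
import math


def sigmoid(raw):
    """Numerically stable logistic function."""
    if raw >= 0:
        return 1.0 / (1.0 + math.exp(-raw))
    e = math.exp(raw)
    return e / (1.0 + e)


def outputs(raw, threshold=0.5):
    """Derive probability and binary prediction from a raw model score."""
    probability = sigmoid(raw)
    # Assumption: ties at the threshold are assigned to class 1.
    prediction = 1 if probability >= threshold else 0
    return {"raw": raw, "probability": probability, "prediction": prediction}


for raw in (-2.0, 0.0, 2.0):
    print(outputs(raw))
```

Because the sigmoid is monotonic and maps 0 to 0.5, thresholding the probability at 0.5 is equivalent to checking the sign of the raw score.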

Additional options:

fast_sigmoid: Use a faster approximation of the sigmoid function
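The source does not say which approximation fast_sigmoid uses. As an illustration only, the sketch below compares the exact sigmoid with one common cheap substitute, 0.5 * x / (1 + |x|) + 0.5, which avoids the exponential. The function names and the evaluation grid are hypothetical, and sqlgbm may use a different formula.

```python
import math


def exact_sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def fast_sigmoid(x):
    """One common exp-free approximation (an assumption, not sqlgbm's formula)."""
    return 0.5 * x / (1.0 + abs(x)) + 0.5


# Compare the two on a grid of raw scores in [-8, 8].
grid = [i / 10.0 for i in range(-80, 81)]
max_err = max(abs(exact_sigmoid(x) - fast_sigmoid(x)) for x in grid)
print(f"max absolute error on [-8, 8]: {max_err:.4f}")
```

This particular approximation stays in (0, 1), is monotonic, and crosses 0.5 at x = 0, so a 0.5-threshold prediction is unchanged. The probability values themselves differ noticeably from the exact sigmoid, which matters if the probabilities are consumed downstream.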

roadmap

Add support for CatBoost
Optimize SQL generation for large models
Add support for multiclass classification
Provide specialized optimizations for different database engines

license

MIT

