This repository contains the implementation of the paper CARTE: Pretraining and Transfer for Tabular Learning.
CARTE is a pretrained model for tabular data. It treats each table row as a star graph and trains a graph transformer on top of this representation.
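To make the star-graph idea concrete, here is a minimal sketch of how one row could be laid out as a graph: a central node for the row, one leaf per cell, and the column name as the edge label. The function `row_to_star_graph`, the `"<row>"` placeholder, the example values, and the choice to skip missing cells are all assumptions made for illustration; this is not CARTE's internal representation.

```python
import math
from typing import Any, Dict, List, Tuple


def row_to_star_graph(row: Dict[str, Any]) -> Dict[str, Any]:
    """Lay out one table row as a star graph (hypothetical helper, illustration only)."""
    nodes: List[Any] = ["<row>"]  # node 0 is the central node standing for the row
    edges: List[Tuple[int, int, str]] = []  # (center index, leaf index, column name)
    for column, value in row.items():
        # Assumption: missing cells (None or NaN) simply add no leaf.
        if value is None or (isinstance(value, float) and math.isnan(value)):
            continue
        nodes.append(value)
        edges.append((0, len(nodes) - 1, column))
    return {"nodes": nodes, "edges": edges}


# Made-up example row with one missing cell.
graph = row_to_star_graph({"name": "Chablis", "price": 25.0, "region": None})
print(graph)
```

Every leaf connects only to the center, so the number of edges equals the number of non-missing cells in the row.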
- CARTERegressor on Wine Poland dataset
- CARTEClassifier on Spotify dataset
Other datasets are available for testing: datasets
The library has been tested on Linux, macOS and Windows.
CARTE-AI can be installed from PyPI:
pip install carte-ai
After a correct installation, you should be able to import the module without errors:
import carte_ai
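If you prefer a check that reports a missing package instead of raising an error, a small sketch using only the standard library could look like this. The helper name `is_installed` is hypothetical; `importlib.util.find_spec` is the standard-library call doing the work.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("carte_ai"):
    print("carte_ai is available")
else:
    print("carte_ai not found; try: pip install carte-ai")
```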
import pandas as pd
from carte_ai.data.load_data import wina_pl  # import only the loader used below
num_train = 128 # Example: set the number of training groups/entities
random_state = 1 # Set a random seed for reproducibility
X_train, X_test, y_train, y_test = wina_pl(num_train, random_state)
print("Wina Poland dataset:", X_train.shape, X_test.shape)
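The loader's internals are not shown here. As a rough sketch of how a `num_train` and `random_state` pair typically determines a reproducible split, the hypothetical helper below shuffles row indices with a seeded generator and keeps the first `num_train` for training. The name `split_rows` and the row count of 1000 are assumptions for illustration, not part of CARTE.

```python
import random
from typing import List, Tuple


def split_rows(n_rows: int, num_train: int, random_state: int) -> Tuple[List[int], List[int]]:
    """Pick num_train row indices for training; the remaining rows form the test set."""
    if not 0 < num_train < n_rows:
        raise ValueError("num_train must be between 1 and n_rows - 1")
    rng = random.Random(random_state)  # seeded, so the same split is produced every time
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return sorted(indices[:num_train]), sorted(indices[num_train:])


train_idx, test_idx = split_rows(n_rows=1000, num_train=128, random_state=1)
print(len(train_idx), len(test_idx))
```

Calling the helper again with the same `random_state` returns the same indices, which is what makes results comparable across runs.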
The basic preparations are:
- preprocess raw data
- load the prepared data and configs; set train/test split
- generate a graph for each table entry (row) using the Table2GraphTransformer
- create an estimator and run inference
import fasttext
from huggingface_hub import hf_hub_download
from carte_ai import Table2GraphTransformer
model_path = hf_hub_download(repo_id="hi-paris/fastText", filename="cc.en.300.bin")
preprocessor = Table2GraphTransformer(fasttext_model_path=model_path)
# Fit and transform the training data
X_train = preprocessor.fit_transform(X_train, y=y_train)
# Transform the test data
X_test = preprocessor.transform(X_test)
For learning, CARTE currently uses the scikit-learn interface (fit/predict). The process is:
- Define parameters
- Set the estimator
- Run 'fit' to train the model and 'predict' to make predictions
from sklearn.metrics import r2_score
from carte_ai import CARTERegressor, CARTEClassifier
# Define some parameters
fixed_params = dict()
fixed_params["num_model"] = 10 # 10 models for the bagging strategy
fixed_params["disable_pbar"] = False # True if you want cleanness
fixed_params["random_state"] = 0
fixed_params["device"] = "cpu"
fixed_params["n_jobs"] = 10
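# config_directory is not defined in this snippet; the import path below is an assumption
from carte_ai.configs.directory import config_directory
fixed_params["pretrained_model_path"] = config_directory["pretrained_model"]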
# Define the estimator and run fit/predict
estimator = CARTERegressor(**fixed_params) # CARTERegressor for Regression
estimator.fit(X=X_train, y=y_train)
y_pred = estimator.predict(X_test)
# Obtain the r2 score on predictions
score = r2_score(y_test, y_pred)
print(f"\nThe R2 score for CARTE: {score:.4f}")
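The `num_model` parameter above refers to a bagging strategy. How CARTE combines its models is not described here; a common choice for regression is to average the predictions of the individual models, sketched below with plain Python callables standing in for trained models. The helper `bagged_predict` and the toy models are hypothetical.

```python
from statistics import fmean
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], float]


def bagged_predict(models: Sequence[Predictor], rows: Sequence[Sequence[float]]) -> List[float]:
    """Average the predictions of several models for each row (assumed combination rule)."""
    return [fmean(model(row) for model in models) for row in rows]


# Toy stand-ins for trained models: each sums the features and adds its own bias.
models = [lambda row, b=b: sum(row) + b for b in (-1.0, 0.0, 1.0)]
rows = [[1.0, 2.0], [3.0, 4.0]]

preds = bagged_predict(models, rows)
print(preds)  # [3.0, 7.0]
```

Averaging cancels out the individual biases of the toy models here; with real models, it usually reduces the variance coming from different seeds or data subsamples.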
➡️ installation instructions for the paper setup
➡️ read the contribution guidelines
@article{kim2024carte,
title={CARTE: pretraining and transfer for tabular learning},
author={Kim, Myung Jun and Grinsztajn, L{\'e}o and Varoquaux, Ga{\"e}l},
journal={arXiv preprint arXiv:2402.16785},
year={2024}
}