-
-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integration of Ludwig and TabPFN #648
Comments
Hi! At this time we're building towards a way for people to release integrations independently, so for now I want to hold off adding new integrations directly into this repository. However, that doesn't mean that you can't use the benchmark to evaluate said frameworks! You can write an integration of the framework locally and use that to evaluate the frameworks. Documentation of that is here. If any of that is unclear (or doesn't work), just let me know and I'll have a look. Using existing integrations in the I don't see anything in particular that would make either Ludwig AutoML or TabPFN unfit for use with the benchmark. |
Hello @PGijsbers :) I have spent the last few days trying to integrate TabPFN and debug my errors. However, at the moment, I am unable to make further progress. I would greatly appreciate it if you could review my log and the exec.py file of TabPFN. Thank you for your help! LOG:
exec.py of TabPFN: `import os
import logging
import tempfile as tmp
import setuptools
import pandas as pd
import numpy as np
from frameworks.shared.callee import result
log = logging.getLogger(__name__)
os.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
# Debug: Sicherstellen, dass richtige distutils verwendet wird
os.environ['SETUPTOOLS_USE_DISTUTILS'] = 'stdlib'
log = logging.getLogger(__name__)
os.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
# Globale Instanz des LabelEncoders
label_encoder = None
def preprocess_data(X, y=None, is_target=False):
from sklearn.preprocessing import LabelEncoder
global label_encoder
if is_target:
if label_encoder is None:
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(y)
else:
y = label_encoder.transform(y)
y = np.asarray(y).flatten()
if isinstance(X, pd.DataFrame):
for col in X.select_dtypes(include=["category", "object"]).columns:
encoder = LabelEncoder()
X[col] = encoder.fit_transform(X[col].astype(str))
X = X.fillna(0)
elif isinstance(X, np.ndarray):
X = np.nan_to_num(X, nan=0)
X = np.asarray(X).astype(np.float32)
return X, y
def run(dataset, config):
from tabpfn import TabPFNClassifier
log.info(f"\n**** Running TabPFN ****\n")
X_train, y_train = preprocess_data(dataset.train.X, dataset.train.y, is_target=True)
X_test, y_test = preprocess_data(dataset.test.X, dataset.test.y, is_target=True)
log.info(f"Shapes of data: X_train: {X_train.shape}, y_train: {y_train.shape}, X_test: {X_test.shape}, y_test: {y_test.shape}")
classifier = TabPFNClassifier(device="cpu")
log.info("Training TabPFN model...")
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)
probabilities = classifier.predict_proba(X_test)
# Rücktransformation der Vorhersagen
global label_encoder
if label_encoder is not None:
predictions = label_encoder.inverse_transform(predictions)
y_test = label_encoder.inverse_transform(y_test)
predictions_file = os.path.join(config.output_dir, "predictions.csv")
pd.DataFrame({"y_true": y_test, "y_pred": predictions}).to_csv(predictions_file, index=False)
return result(
output_file=predictions_file,
predictions=predictions,
truth=y_test,
probabilities=probabilities,
target_is_encoded=True,
models_count=1,
)
if __name__ == '__main__':
from frameworks.shared.callee import call_run
call_run(run)` |
It's probably the fact that you set |
Hello! 😊
For my research, I'd like to integrate the AutoML framework Ludwig and the transformer TabPFN into the automlbenchmark. According to the instructions on the website, I'm starting by opening an issue here.
Since this is my first time adding frameworks to automlbenchmark, I’d appreciate any guidance or tips you could offer. Specifically, I'd like to confirm if it’s feasible to add these two frameworks.
Here are the GitHub repositories:
Ludwig: https://github.com/ludwig-ai/ludwig
TabPFN: https://github.com/automl/TabPFN
Thank you in advance for your help!
Best regards,
Anna
The text was updated successfully, but these errors were encountered: