Integration of Ludwig and TabPFN #648

Open
annawiewer opened this issue Nov 14, 2024 · 3 comments
Labels
framework add For issues with a framework to be added

Comments

@annawiewer

Hello! 😊

For my research, I'd like to integrate the AutoML framework Ludwig and the transformer TabPFN into the automlbenchmark. According to the instructions on the website, I'm starting by opening an issue here.

Since this is my first time adding frameworks to automlbenchmark, I’d appreciate any guidance or tips you could offer. Specifically, I'd like to confirm if it’s feasible to add these two frameworks.

Here are the GitHub repositories:

Ludwig: https://github.com/ludwig-ai/ludwig
TabPFN: https://github.com/automl/TabPFN

Thank you in advance for your help!

Best regards,
Anna

@PGijsbers PGijsbers added the framework add For issues with a framework to be added label Nov 15, 2024
@PGijsbers
Collaborator

Hi! At this time we're building towards a way for people to release integrations independently, so for now I want to hold off adding new integrations directly into this repository. However, that doesn't mean that you can't use the benchmark to evaluate said frameworks! You can write an integration of the framework locally and use that to evaluate the frameworks. Documentation of that is here. If any of that is unclear (or doesn't work), just let me know and I'll have a look. Using existing integrations in the frameworks directory as a reference can also be useful.

I don't see anything in particular that would make either Ludwig AutoML or TabPFN unfit for use with the benchmark.

@annawiewer
Author

annawiewer commented Nov 19, 2024

Hello @PGijsbers :) I have spent the last few days trying to integrate TabPFN and debug my errors. However, at the moment, I am unable to make further progress. I would greatly appreciate it if you could review my log and the exec.py file of TabPFN. Thank you for your help!

LOG:

[INFO] [amlb:15:25:05.859] Running benchmark `TabPFN` on `example` framework in `local` mode.
[INFO] [amlb.frameworks.definitions:15:25:05.911] Loading frameworks definitions from ['/home/devcontainers/automlbenchmark/resources/frameworks.yaml'].
[INFO] [amlb.resources:15:25:06.020] Loading benchmark constraint definitions from ['/home/devcontainers/automlbenchmark/resources/constraints.yaml'].
[INFO] [amlb.benchmarks.file:15:25:06.033] Loading benchmark definitions from /home/devcontainers/automlbenchmark/resources/benchmarks/example.yaml.
[INFO] [amlb.benchmark:15:25:06.106] Running 10 jobs
[INFO] [amlb.job:15:25:06.108] 
---------------------------------------------------
Starting job local.example.10m8c.credit-g.0.TabPFN.
[INFO] [amlb.benchmark:15:25:06.108] Assigning 8 cores (total=8) for new task credit-g.
[INFO] [amlb.benchmark:15:25:06.109] Assigning 542 MB (total=7866 MB) for new credit-g task.
[INFO] [amlb.utils.process:15:25:06.109] [MONITORING] [python [90167]] CPU Utilization: 22.6%
[INFO] [amlb.utils.process:15:25:06.112] [MONITORING] [python [90167]] Memory Usage: 67.1%
[INFO] [amlb.utils.process:15:25:06.113] [MONITORING] [python [90167]] Disk Usage: 6.6%
[INFO] [amlb.benchmark:15:25:06.120] Running task credit-g on framework TabPFN with config:
TaskConfig({'framework': 'TabPFN', 'framework_params': {'device': 'cpu'}, 'framework_version': '0.1.9', 'type': 'classification', 'name': 'credit-g', 'openml_task_id': 31, 'test_server': False, 'fold': 0, 'metric': 'auc', 'metrics': ['auc', 'logloss', 'acc', 'balacc'], 'seed': 369961566, 'job_timeout_seconds': 1200, 'max_runtime_seconds': 600, 'cores': 8, 'max_mem_size_mb': 542, 'min_vol_size_mb': -1, 'input_dir': '/home/devcontainers/.cache/openml', 'output_dir': '/home/devcontainers/automlbenchmark/stable/tabpfn.example.10m8c.local.20241119T142505', 'output_predictions_file': '/home/devcontainers/automlbenchmark/stable/tabpfn.example.10m8c.local.20241119T142505/predictions/credit-g/0/predictions.csv', 'tag': None, 'command': 'runbenchmark.py TabPFN example 10m8c -m local -p 1 -u ~/dev/null -o ./stable -Xmax_parallel_jobs=12 -Xaws.use_docker=False -Xaws.query_frequency_seconds=300', 'git_info': {'repo': 'https://github.com/openml/automlbenchmark.git', 'branch': 'master', 'commit': 'c3d051c3f9b7db9632a5931beffad9682a2265ac', 'tags': [], 'status': ['## master...origin/master [ahead 1]', ' M frameworks/shared/callee.py', ' M requirements.txt', ' M resources/benchmarks/small.yaml', ' M resources/constraints.yaml', ' M resources/frameworks.yaml', ' M runstable.sh', '?? frameworks/EvaML/', '?? frameworks/Ludwig/', '?? frameworks/TabPFN/', '?? stable/sapientml.validation.10m8c.local.20241118T143755/']}, 'measure_inference_time': False, 'ext': {}, 'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], 'type_': 'binary', 'output_metadata_file': '/home/devcontainers/automlbenchmark/stable/tabpfn.example.10m8c.local.20241119T142505/predictions/credit-g/0/metadata.json'})
[INFO] [openml.datasets.dataset:15:25:06.168] pickle write credit-g
[INFO] [openml.datasets.dataset:15:25:06.171] pickle load data credit-g
[INFO] [amlb.utils.process:15:25:06.252] Running cmd `/home/devcontainers/automlbenchmark/frameworks/TabPFN/venv/bin/python -W ignore /home/devcontainers/automlbenchmark/frameworks/TabPFN/exec.py`
[INFO] [amlb.print:15:25:06.774] distutils in use: /home/devcontainers/automlbenchmark/frameworks/TabPFN/venv/lib/python3.9/site-packages/setuptools/_distutils/__init__.py
[INFO] [amlb.print:15:25:06.851] INFO:frameworks.shared.callee:Reading parameters from main process...
[INFO] [amlb.print:15:25:06.851] INFO:frameworks.shared.callee:Configuration loaded:
[INFO] [amlb.print:15:25:08.724] INFO:frameworks.shared.callee:  Framework Params: {'device': 'cpu'}
[INFO] [amlb.print:15:25:08.724] INFO:frameworks.shared.callee:  Job Timeout: 1200 seconds
[INFO] [amlb.print:15:25:08.725] INFO:frameworks.shared.callee:Starting model execution with timeout...
[INFO] [amlb.print:15:25:08.725] INFO:__main__:
[INFO] [amlb.print:15:25:08.742] **** Running TabPFN ****
[INFO] [amlb.print:15:25:08.742] 
[INFO] [amlb.print:15:25:08.743] INFO:__main__:Shapes of data: X_train: (900, 20), y_train: (900,), X_test: (100, 20), y_test: (100,)
[INFO] [amlb.print:15:25:09.055] INFO:__main__:Training TabPFN model...
[INFO] [amlb.print:15:25:09.540] /home/devcontainers/automlbenchmark/frameworks/TabPFN/venv/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py:632: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.5 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
[INFO] [amlb.print:15:25:13.397]   return fn(*args, **kwargs)
[INFO] [amlb.print:15:25:13.397] /home/devcontainers/automlbenchmark/frameworks/TabPFN/venv/lib/python3.9/site-packages/torch/_dynamo/eval_frame.py:632: UserWarning: torch.utils.checkpoint: the use_reentrant parameter should be passed explicitly. In version 2.5 we will raise an exception if use_reentrant is not passed. use_reentrant=False is recommended, but if you need to preserve the current default behavior, you can pass use_reentrant=True. Refer to docs for more details on the differences between the two variants.
[INFO] [amlb.print:15:25:16.889] INFO:frameworks.shared.callee:Ensuring no subprocesses are left running...
[INFO] [amlb.print:15:25:16.890] INFO:frameworks.shared.callee:Results saved successfully.
[INFO] [amlb.print:15:25:16.891]   return fn(*args, **kwargs)
[INFO] [amlb.print:15:25:16.891] /usr/lib/python3.9/tempfile.py:969: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmprw51x3x7'>
[INFO] [amlb.print:15:25:17.355] 
[INFO] [amlb.print:15:25:17.355]   _warnings.warn(warn_message, ResourceWarning)
[INFO] [amlb.print:15:25:17.355] 
[INFO] [amlb.print:15:25:17.355] 
[INFO] [amlb.print:15:25:17.355] 
[ERROR] [amlb.benchmark:15:25:17.361] invalid literal for int() with base 10: 'good'
Traceback (most recent call last):
  File "/home/devcontainers/automlbenchmark/amlb/benchmark.py", line 605, in run
    meta_result = self.benchmark.framework_module.run(self._dataset, task_config)
  File "/home/devcontainers/automlbenchmark/frameworks/TabPFN/__init__.py", line 42, in run
    return run_in_venv(
  File "/home/devcontainers/automlbenchmark/frameworks/shared/caller.py", line 168, in run_in_venv
    save_predictions(dataset=dataset,
  File "/home/devcontainers/automlbenchmark/amlb/results.py", line 327, in save_predictions
    preds = dataset.target.label_encoder.inverse_transform(predictions)
  File "/home/devcontainers/automlbenchmark/amlb/datautils.py", line 297, in inverse_transform
    vec = np.asarray(vec).astype(self.encoded_type, copy=False)
ValueError: invalid literal for int() with base 10: 'good'
[INFO] [amlb.results:15:25:17.363] Loading metadata from `/home/devcontainers/automlbenchmark/stable/tabpfn.example.10m8c.local.20241119T142505/predictions/credit-g/0/metadata.json`.
[INFO] [amlb.results:15:25:17.363] Metric scores: { 'acc': nan,
  'app_version': 'dev [master, c3d051c]',
  'auc': nan,
  'balacc': nan,
  'constraint': '10m8c',
  'duration': nan,
  'fold': 0,
  'framework': 'TabPFN',
  'id': 'openml.org/t/31',
  'info': "ValueError: invalid literal for int() with base 10: 'good'",
  'logloss': nan,
  'metric': 'auc',
  'mode': 'local',
  'models_count': nan,
  'params': "{'device': 'cpu'}",
  'predict_duration': nan,
  'result': nan,
  'seed': 369961566,
  'task': 'credit-g',
  'training_duration': nan,
  'type': 'binary',
  'utc': '2024-11-19T14:25:17',
  'version': '0.1.9'}
[INFO] [amlb.job:15:25:17.364] Job `local.example.10m8c.credit-g.0.TabPFN` executed in 11.248 seconds.
[INFO] [amlb.results:15:25:17.374] Scores saved to `/home/devcontainers/automlbenchmark/stable/tabpfn.example.10m8c.local.20241119T142505/scores/results.csv`.
[INFO] [amlb.results:15:25:17.411] Scores saved to `/home/devcontainers/automlbenchmark/stable/results.csv`.
[INFO] [amlb.job:15:25:17.411] 

exec.py of TabPFN:

import os
import logging
import tempfile as tmp
import setuptools
import pandas as pd
import numpy as np
from frameworks.shared.callee import result

log = logging.getLogger(__name__)


os.environ['JOBLIB_TEMP_FOLDER'] = tmp.gettempdir()
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'

# Debug: make sure the correct distutils is used
os.environ['SETUPTOOLS_USE_DISTUTILS'] = 'stdlib'

# Global LabelEncoder instance
label_encoder = None

def preprocess_data(X, y=None, is_target=False):
    from sklearn.preprocessing import LabelEncoder
    global label_encoder

    if is_target:
        if label_encoder is None:
            label_encoder = LabelEncoder()
            y = label_encoder.fit_transform(y)
        else:
            y = label_encoder.transform(y)
        y = np.asarray(y).flatten()

    if isinstance(X, pd.DataFrame):
        for col in X.select_dtypes(include=["category", "object"]).columns:
            encoder = LabelEncoder()
            X[col] = encoder.fit_transform(X[col].astype(str))
        X = X.fillna(0)
    elif isinstance(X, np.ndarray):
        X = np.nan_to_num(X, nan=0)

    X = np.asarray(X).astype(np.float32)
    return X, y


def run(dataset, config):
    from tabpfn import TabPFNClassifier

    log.info(f"\n**** Running TabPFN ****\n")
    X_train, y_train = preprocess_data(dataset.train.X, dataset.train.y, is_target=True)
    X_test, y_test = preprocess_data(dataset.test.X, dataset.test.y, is_target=True)

    log.info(f"Shapes of data: X_train: {X_train.shape}, y_train: {y_train.shape}, X_test: {X_test.shape}, y_test: {y_test.shape}")

    classifier = TabPFNClassifier(device="cpu")
    log.info("Training TabPFN model...")
    classifier.fit(X_train, y_train)

    predictions = classifier.predict(X_test)
    probabilities = classifier.predict_proba(X_test)

    # Inverse-transform the predictions back to the original labels
    global label_encoder
    if label_encoder is not None:
        predictions = label_encoder.inverse_transform(predictions)
        y_test = label_encoder.inverse_transform(y_test)

    predictions_file = os.path.join(config.output_dir, "predictions.csv")
    pd.DataFrame({"y_true": y_test, "y_pred": predictions}).to_csv(predictions_file, index=False)

    return result(
        output_file=predictions_file,
        predictions=predictions,
        truth=y_test,
        probabilities=probabilities,
        target_is_encoded=True,
        models_count=1,
    )

if __name__ == '__main__':
    from frameworks.shared.callee import call_run
    call_run(run)

@PGijsbers
Collaborator

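It's probably because you set target_is_encoded=True in your result() call, while the log suggests the labels you return are not encoded: 'info': "ValueError: invalid literal for int() with base 10: 'good'". In exec.py, the predictions are passed through label_encoder.inverse_transform before result() is called, so they are strings such as 'good', and the benchmark then fails when it tries to cast them to integers in order to decode them again.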
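
The mismatch can be reproduced outside the benchmark with a minimal sketch. `TinyLabelEncoder` below is a hypothetical stand-in written for illustration only; it is not the amlb encoder, and it only mimics the integer cast seen in the traceback (`datautils.py`, `inverse_transform`). The class names and sample labels are likewise assumptions.

```python
# Hypothetical stand-in for the benchmark's label encoder (NOT the amlb API).
# It only mimics the behaviour visible in the traceback: inverse_transform
# casts its input to int before mapping codes back to class names.
class TinyLabelEncoder:
    def __init__(self, classes):
        self.classes = sorted(classes)

    def transform(self, labels):
        # Map each class name to its index in the sorted class list.
        return [self.classes.index(label) for label in labels]

    def inverse_transform(self, codes):
        # The int() cast is what fails when the input is already decoded.
        return [self.classes[int(code)] for code in codes]


encoder = TinyLabelEncoder(["good", "bad"])

# What exec.py returns after its own inverse_transform: class names.
decoded_predictions = ["good", "bad", "good"]
# What a caller promising "target is encoded" would be expected to return.
encoded_predictions = encoder.transform(decoded_predictions)

# Case 1: strings passed where integer codes are expected -> ValueError.
try:
    encoder.inverse_transform(decoded_predictions)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Case 2: integer codes passed -> decoding round-trips cleanly.
round_trip = encoder.inverse_transform(encoded_predictions)

print(error_message)
print(round_trip)
```

Under these assumptions, two changes to exec.py would make the flag and the data agree: keep the inverse-transformed string labels and pass target_is_encoded=False, or skip the inverse transform and keep target_is_encoded=True. The second option only works if the integer codes in exec.py match the benchmark's own encoding, which this sketch does not verify.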
