Autosklearn Issue could not convert string to float #662
Comments
Can you let me know the command you used, which Python version you have, and which dependencies are installed in your environment, as well as in the virtual environment of auto-sklearn? If you are running locally, the auto-sklearn environment is located at
Thanks for the reply. Below you can find the specific configurations.
In general, I would like to understand whether I need to modify or preconfigure the dataset on OpenML so that every framework can run it (the target is already nominal). For example, I have uploaded several datasets, and OpenML task ID 362234 works for MLJAR Supervised but for neither Auto-sklearn nor LightAutoML. Since I am setting up an experiment, I want to understand on which end I need to adjust something. This is important for the discussion section of the results. Thanks for your support.
Command: runstable.sh:
Python Version:
(venv) devcontainers@DESKTOP-OC2G953:~/automlbenchmark$ pip list
boto3 1.26.98
(venv) devcontainers@DESKTOP-OC2G953:~/automlbenchmark/frameworks/autosklearn$ pip list
argon2-cffi 23.1.0
It looks like you might also be calling
The issue is that the original OpenML dataset has text features, and the automl benchmark was never tested for that (the original suites have only numerical and categorical data; at the time, not all frameworks supported text data). We do want to support text features, however, so we might be able to dedicate some time to resolving this systematically. In the meantime, I believe that removing the lines that ask for the encoded data for autosklearn (i.e., lines 17, 18, 23, 24 in the
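To illustrate why dropping the requests for encoded data could sidestep the error: the traceback suggests that `X_enc` is a cached, lazily computed property, so the float conversion only runs when something actually reads it. Below is a minimal, stdlib-only sketch of that behaviour. `ToyDatasplit` and its sample rows are hypothetical stand-ins, not amlb's real classes, and only the value 'IBRDB0050' is taken from the reported error.

```python
from functools import cached_property


class ToyDatasplit:
    """Hypothetical stand-in for a benchmark data split (not amlb's real class)."""

    def __init__(self, rows):
        self._rows = rows

    @property
    def X(self):
        # Raw rows: text values are returned untouched.
        return self._rows

    @cached_property
    def X_enc(self):
        # Computed only on first access, so the conversion error is raised
        # only if the caller actually asks for the encoded data.
        return [[float(value) for value in row] for row in self._rows]


# Hypothetical sample rows with one numeric and one text column.
split = ToyDatasplit([[1.5, "IBRDB0050"], [2.0, "IBRDB0051"]])

raw = split.X  # works: nothing is converted

try:
    split.X_enc  # triggers the float conversion on the text column
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

In this toy version, code that only ever reads `X` never hits the conversion. Whether the real integration still works without the encoded arrays is something to verify against the framework's own code.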
Hello,
I have tried to run autosklearn with task ID 362234, but I constantly receive the following error:
"ValueError: could not convert string to float: 'IBRDB0050'"
Could you help me fix this, please?
Thanks!
`Starting job local.medium.2m8c.Loan_Type.9.autosklearn.
Assigning 8 cores (total=8) for new task Loan_Type.
Assigning 542 MB (total=7866 MB) for new Loan_Type task.
Running task Loan_Type on framework autosklearn with config:
TaskConfig({'framework': 'autosklearn', 'framework_params': {'_save_artifacts': ['models', 'debug_as_files'], 'n_jobs': 1}, 'framework_version': 'stable', 'type': 'classification', 'name': 'Loan_Type', 'openml_task_id': 362234, 'test_server': False, 'fold': 9, 'metric': 'logloss', 'metrics': ['logloss', 'acc', 'balacc'], 'seed': 349662246, 'job_timeout_seconds': 1200, 'max_runtime_seconds': 600, 'cores': 8, 'max_mem_size_mb': 542, 'min_vol_size_mb': -1, 'input_dir': '/home/devcontainers/.cache/openml', 'output_dir': '/home/devcontainers/automlbenchmark/stable/autosklearn.medium.2m8c.local.20241202T130926', 'output_predictions_file': '/home/devcontainers/automlbenchmark/stable/autosklearn.medium.2m8c.local.20241202T130926/predictions/Loan_Type/9/predictions.csv', 'tag': None, 'command': 'runbenchmark.py autosklearn medium 2m8c -m local -p 1 -u ~/dev/null -o ./stable -Xmax_parallel_jobs=12 -Xaws.use_docker=False -Xaws.query_frequency_seconds=300', 'git_info': {'repo': 'https://github.com/openml/automlbenchmark.git', 'branch': 'master', 'commit': '500480923d8f85455958f3c5d620a98cbffb771f', 'tags': [], 'status': ['## master...origin/master [ahead 3, behind 5]', ' M resources/benchmarks/medium.yaml', ' M runstable.sh']}, 'measure_inference_time': False, 'ext': {}, 'quantile_levels': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], 'type': 'multiclass', 'output_metadata_file': '/home/devcontainers/automlbenchmark/stable/autosklearn.medium.2m8c.local.20241202T130926/predictions/Loan_Type/9/metadata.json'})
PyOpenML cannot handle string when returning numpy arrays. Use dataset_format="dataframe".
Traceback (most recent call last):
File "/home/devcontainers/automlbenchmark/venv/lib/python3.9/site-packages/openml/datasets/dataset.py", line 629, in _convert_array_format
return np.asarray(data, dtype=np.float32)
File "/home/devcontainers/automlbenchmark/venv/lib/python3.9/site-packages/pandas/core/generic.py", line 2070, in __array__
return np.asarray(self._values, dtype=dtype)
ValueError: could not convert string to float: 'IBRDB0050'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/devcontainers/automlbenchmark/amlb/benchmark.py", line 605, in run
meta_result = self.benchmark.framework_module.run(self._dataset, task_config)
File "/home/devcontainers/automlbenchmark/frameworks/autosklearn/__init__.py", line 17, in run
X_enc=dataset.train.X_enc,
File "/home/devcontainers/automlbenchmark/amlb/utils/cache.py", line 77, in decorator
return cache(self, prop_name, prop_fn)
File "/home/devcontainers/automlbenchmark/amlb/utils/cache.py", line 35, in cache
value = fn(self)
File "/home/devcontainers/automlbenchmark/amlb/utils/process.py", line 744, in profiler
return fn(*args, **kwargs)
File "/home/devcontainers/automlbenchmark/amlb/data.py", line 159, in X_enc
return self.data_enc[:, predictors_ind]
File "/home/devcontainers/automlbenchmark/amlb/utils/cache.py", line 77, in decorator
return cache(self, prop_name, prop_fn)
File "/home/devcontainers/automlbenchmark/amlb/utils/cache.py", line 35, in cache
value = fn(self)
File "/home/devcontainers/automlbenchmark/amlb/utils/process.py", line 744, in profiler
return fn(*args, **kwargs)
File "/home/devcontainers/automlbenchmark/amlb/datasets/openml.py", line 275, in data_enc
return self._get_data('array')
File "/home/devcontainers/automlbenchmark/amlb/datasets/openml.py", line 279, in _get_data
self.dataset._load_data(fmt)
File "/home/devcontainers/automlbenchmark/amlb/datasets/openml.py", line 236, in _load_data
train, test = splitter.split()
File "/home/devcontainers/automlbenchmark/amlb/utils/process.py", line 744, in profiler
return fn(*args, **kwargs)
File "/home/devcontainers/automlbenchmark/amlb/datasets/openml.py", line 309, in split
X = self.ds._load_full_data('array')
File "/home/devcontainers/automlbenchmark/amlb/datasets/openml.py", line 241, in _load_full_data
X, *_ = self._oml_dataset.get_data(dataset_format=fmt)
File "/home/devcontainers/automlbenchmark/venv/lib/python3.9/site-packages/openml/datasets/dataset.py", line 732, in get_data
data = self._convert_array_format(data, dataset_format, attribute_names)
File "/home/devcontainers/automlbenchmark/venv/lib/python3.9/site-packages/openml/datasets/dataset.py", line 631, in _convert_array_format
raise PyOpenMLError(`
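For anyone checking their own dataset before uploading it, here is a rough sketch of how one might find the columns that would trip this float conversion. It uses plain Python and assumes the data is available as a list of rows. The helper `find_text_columns`, the column names, and the sample values are hypothetical, apart from 'IBRDB0050', which comes from the traceback above.

```python
def find_text_columns(rows, column_names):
    """Return names of columns holding at least one value float() rejects."""
    text_columns = []
    for index, name in enumerate(column_names):
        for row in rows:
            try:
                float(row[index])
            except (TypeError, ValueError):
                # One non-convertible value is enough to flag the column.
                text_columns.append(name)
                break
    return text_columns


# Hypothetical sample; only the value 'IBRDB0050' comes from the traceback.
column_names = ["amount", "loan_number"]
rows = [[1000.0, "IBRDB0050"], [2500.0, "IBRDB0051"]]

print(find_text_columns(rows, column_names))
```

Note that numeric strings such as "2" pass this check because `float()` accepts them, so the helper flags only columns with genuinely non-numeric values.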