Pipeline.dataset() fails if ibis is not installed #2178

Open
trymzet opened this issue Dec 23, 2024 · 1 comment

Comments

@trymzet
Contributor

trymzet commented Dec 23, 2024

dlt version

1.5.0

Describe the problem

According to the docstring, when dataset_type is left at its default value ("auto"), the method should fall back to a dbapi interface if ibis is not available. However, further operations on the dataset, e.g. dataset.<my_table>.df(), dataset.table("my_table").df(), or dataset.table("my_table").arrow(), fail with an ImportError.

This is probably why the example in the first notebook of the dlt Fundamentals course explicitly sets this parameter to "default".
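To make the documented "auto" behavior concrete, here is a minimal sketch of how such a fallback could be resolved. The helpers ibis_is_installed, ibis_backend_loads and resolve_dataset_type are hypothetical and are not dlt's actual implementation. The point they illustrate: checking whether the ibis package exists is weaker than checking whether its duckdb backend can actually be imported, and the traceback in this issue fails at the backend import (missing pyarrow_hotfix) even though ibis itself is present.

```python
import importlib
import importlib.util


def ibis_is_installed() -> bool:
    # Shallow check: is the top-level "ibis" package importable at all?
    return importlib.util.find_spec("ibis") is not None


def ibis_backend_loads(backend: str) -> bool:
    # Deeper check: actually import the backend module. This is the step
    # that fails in the reported traceback, even though ibis is installed.
    try:
        importlib.import_module(f"ibis.backends.{backend}")
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return False
    return True


def resolve_dataset_type(dataset_type: str, backend: str) -> str:
    # Hypothetical resolver: explicit choices pass through unchanged;
    # "auto" picks ibis only if its backend can really be loaded,
    # otherwise it falls back to the dbapi-based "default" dataset.
    if dataset_type != "auto":
        return dataset_type
    return "ibis" if ibis_backend_loads(backend) else "default"


if __name__ == "__main__":
    print("ibis installed:", ibis_is_installed())
    print("resolved dataset_type:", resolve_dataset_type("auto", "duckdb"))
```

With a check like ibis_backend_loads, an environment such as the Colab one described here would resolve "auto" to "default" instead of failing later inside ibis.to_sql().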

Expected behavior

No response

Steps to reproduce

# !pip install "dlt[duckdb]"

import dlt

data = [
    {"id": "1", "name": "bulbasaur", "size": {"weight": 6.9, "height": 0.7}},
    {"id": "4", "name": "charmander", "size": {"weight": 8.5, "height": 0.6}},
    {"id": "25", "name": "pikachu", "size": {"weight": 6, "height": 0.4}},
]
pipeline = dlt.pipeline(
    pipeline_name="quick_start",
    destination="duckdb",
    dataset_name="mydata",
)
pipeline.run(data, table_name="pokemon")

dataset = pipeline.dataset()
df = dataset.pokemon.df()

Operating system

Linux

Runtime environment

Google Colab

Python version

3.11

dlt data source

No response

dlt destination

No response

Other deployment details

No response

Additional information

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/ibis/__init__.py in load_backend(name)
     81     try:
---> 82         module = entry_point.load()
     83     except ImportError as exc:

/usr/local/lib/python3.10/dist-packages/importlib_metadata/__init__.py in load(self)
    188         match = cast(Match, self.pattern.match(self.value))
--> 189         module = import_module(match.group('module'))
    190         attrs = filter(None, (match.group('attr') or '').split('.'))

/usr/lib/python3.10/importlib/__init__.py in import_module(name, package)
    125             level += 1
--> 126     return _bootstrap._gcd_import(name[level:], package, level)
    127 

/usr/lib/python3.10/importlib/_bootstrap.py in _gcd_import(name, package, level)

/usr/lib/python3.10/importlib/_bootstrap.py in _find_and_load(name, import_)

/usr/lib/python3.10/importlib/_bootstrap.py in _find_and_load_unlocked(name, import_)

/usr/lib/python3.10/importlib/_bootstrap.py in _load_unlocked(spec)

/usr/lib/python3.10/importlib/_bootstrap_external.py in exec_module(self, module)

/usr/lib/python3.10/importlib/_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)

/usr/local/lib/python3.10/dist-packages/ibis/backends/duckdb/__init__.py in <module>
     15 import pyarrow as pa
---> 16 import pyarrow_hotfix  # noqa: F401
     17 import sqlglot as sg

ModuleNotFoundError: No module named 'pyarrow_hotfix'

The above exception was the direct cause of the following exception:

ImportError                               Traceback (most recent call last)
<ipython-input-33-bbd954019e52> in <cell line: 2>()
      1 dataset = pipeline.dataset()
----> 2 df = dataset.pokemon.df()
      3 n_cols = len(df.columns)
      4 # This throws a random error due to some bug in dlt. Looks like the random
      5 # `dataset_type` param of the `dataset()` method must be used for things to work.

/usr/local/lib/python3.10/dist-packages/dlt/destinations/dataset/relation.py in _wrap(*args, **kwargs)
     83 
     84         def _wrap(*args: Any, **kwargs: Any) -> Any:
---> 85             with self.cursor() as cursor:
     86                 return getattr(cursor, func_name)(*args, **kwargs)
     87 

/usr/lib/python3.10/contextlib.py in __enter__(self)
    133         del self.args, self.kwds, self.func
    134         try:
--> 135             return next(self.gen)
    136         except StopIteration:
    137             raise RuntimeError("generator didn't yield") from None

/usr/local/lib/python3.10/dist-packages/dlt/destinations/dataset/relation.py in cursor(self)
     65             if hasattr(self.sql_client, "_conn") and hasattr(self.sql_client._conn, "autocommit"):
     66                 self.sql_client._conn.autocommit = False
---> 67             with client.execute_query(self.query) as cursor:
     68                 if columns_schema := self.columns_schema:
     69                     cursor.columns_schema = columns_schema

/usr/local/lib/python3.10/dist-packages/dlt/destinations/dataset/ibis_relation.py in query(self)
     72         # render sql directly if possible
     73         if target_dialect not in TRANSPILE_VIA_MAP:
---> 74             return ibis.to_sql(self._ibis_object, dialect=target_dialect)
     75 
     76         # here we need to transpile first

/usr/local/lib/python3.10/dist-packages/ibis/expr/sql.py in to_sql(expr, dialect, pretty, **kwargs)
    377     else:
    378         try:
--> 379             backend = getattr(ibis, dialect)
    380         except AttributeError:
    381             raise ValueError(f"Unknown dialect {dialect}")

/usr/local/lib/python3.10/dist-packages/ibis/__init__.py in __getattr__(name)
    141         return null()  # noqa: F405
    142     else:
--> 143         return load_backend(name)

/usr/local/lib/python3.10/dist-packages/ibis/__init__.py in load_backend(name)
     82         module = entry_point.load()
     83     except ImportError as exc:
---> 84         raise ImportError(
     85             f"Failed to import the {name} backend due to missing dependencies.\n\n"
     86             f"You can pip or conda install the {name} backend as follows:\n\n"

ImportError: Failed to import the duckdb backend due to missing dependencies.

You can pip or conda install the duckdb backend as follows:

  python -m pip install -U "ibis-framework[duckdb]"  # pip install
  conda install -c conda-forge ibis-duckdb           # or conda install

@sh-rp
Collaborator

sh-rp commented Jan 2, 2025

@trymzet thanks for reporting this. The problem here is that ibis is in fact installed by default in the Colab notebook, but it is an older version that needs the duckdb backend dependencies to work. If you run this locally without ibis, it should work without problems. So you can either update ibis in your Colab to v10 or later, or uninstall it.
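A small diagnostic along the lines of this advice, assuming ibis is installed under its PyPI distribution name ibis-framework (the name used in the install hint of the error message above). The helper names and the MIN_IBIS_MAJOR constant are illustrative and not part of dlt; the v10 threshold comes from this comment.

```python
from importlib import metadata
from typing import Optional

# Per the maintainer's comment: ibis older than v10 needs the duckdb
# backend dependencies; v10 or later (or no ibis at all) should work.
MIN_IBIS_MAJOR = 10


def installed_version(distribution: str) -> Optional[str]:
    # Return the installed version string, or None if the package is absent.
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def needs_ibis_upgrade(version: Optional[str]) -> bool:
    # No ibis installed: nothing to upgrade.
    if version is None:
        return False
    major = int(version.split(".")[0])
    return major < MIN_IBIS_MAJOR


if __name__ == "__main__":
    ver = installed_version("ibis-framework")
    print("ibis-framework:", ver or "not installed")
    if needs_ibis_upgrade(ver):
        print('Upgrade with: pip install -U "ibis-framework>=10"')
        print("or remove it with: pip uninstall ibis-framework")
```

If changing the installed ibis is not an option, passing dataset_type="default" to pipeline.dataset(), as the course notebook does, should sidestep the ibis code path entirely.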

Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

2 participants