-
Notifications
You must be signed in to change notification settings - Fork 3k
Closed
Description
Describe the bug
from_polars
dataset cannot cast column to Image/Audio/Video, while it works on from_pandas
and from_dict
Steps to reproduce the bug
import datasets
import pandas as pd
import polars as pl
image_path = "./sample.png"
# polars
df = pl.DataFrame({"image_path": [image_path]})
dataset = datasets.Dataset.from_polars(df)
dataset = dataset.cast_column("image_path", datasets.Image())
# # raises Error
pyarrow.lib.ArrowNotImplementedError: Unsupported cast from large_string to struct using function cast_struct
# pandas
df = pd.DataFrame({"image_path": [image_path]})
dataset = datasets.Dataset.from_pandas(df)
dataset = dataset.cast_column("image_path", datasets.Image())
# # pass
{'image_path': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=338x277 at 0x7FBA719D4050>}
# dict
dataset = datasets.Dataset.from_dict({"image_path": [image_path]})
dataset = dataset.cast_column("image_path", datasets.Image())
# # pass
{'image_path': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=338x277 at 0x7FBA719D4050>}
Expected behavior
from_polars
case shouldn't raise error and have the same outputs as from_pandas
and from_dict
Environment info
# Name Version Build Channel
datasets 4.0.0 pypi_0 pypi
pandas 2.3.1 pypi_0 pypi
polars 1.32.3 pypi_0 pypi
samuelstevens
Metadata
Metadata
Assignees
Labels
No labels