
polars dataset cannot cast column to Image/Audio/Video #7765

@ain-soph

Description


Describe the bug

A dataset created with Dataset.from_polars cannot cast a string column to Image, Audio, or Video. The same cast works on datasets created with Dataset.from_pandas and Dataset.from_dict.

Steps to reproduce the bug

import datasets
import pandas as pd
import polars as pl

image_path = "./sample.png"

# polars
df = pl.DataFrame({"image_path": [image_path]})
dataset = datasets.Dataset.from_polars(df)
dataset = dataset.cast_column("image_path", datasets.Image())

# raises:
# pyarrow.lib.ArrowNotImplementedError: Unsupported cast from large_string to struct using function cast_struct


# pandas
df = pd.DataFrame({"image_path": [image_path]})
dataset = datasets.Dataset.from_pandas(df)
dataset = dataset.cast_column("image_path", datasets.Image())
print(dataset[0])
# passes:
# {'image_path': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=338x277 at 0x7FBA719D4050>}


# dict
dataset = datasets.Dataset.from_dict({"image_path": [image_path]})
dataset = dataset.cast_column("image_path", datasets.Image())
print(dataset[0])
# passes:
# {'image_path': <PIL.PngImagePlugin.PngImageFile image mode=RGB size=338x277 at 0x7FBA719D4050>}

Expected behavior

The from_polars case should not raise an error, and it should produce the same output as the from_pandas and from_dict cases.

Environment info

# Name                    Version                   Build  Channel
datasets                  4.0.0                    pypi_0    pypi
pandas                    2.3.1                    pypi_0    pypi
polars                    1.32.3                   pypi_0    pypi

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
