[protocol] Add a ColumnSchema abstraction? #253

pitrou · 2023-09-05T10:12:32Z

Instead of having individual methods to query the DType, categorical description, null description and metadata (which I suspect might be replicated at the DataFrame level?), how about adding a first-class abstraction to tie them together? For example:

from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Tuple, TypedDict

# DType, CategoricalDescription and ColumnNullType are the types already defined by the spec

class ColumnSchema(TypedDict):
    # the underlying physical representation
    dtype: DType
    # if the column is categorical, describes how to interpret the contents
    categorical_encoding: Optional[CategoricalDescription]
    # if the column supports null values, describes how they are represented
    null_encoding: Optional[Tuple[ColumnNullType, Any]]
    # arbitrary metadata attached to the column, possibly empty
    metadata: Dict[str, Any]

class Column(ABC):
    ...
    @property
    @abstractmethod
    def schema(self) -> ColumnSchema: ...

(IMHO, "encoding" sounds more precise than "description")
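To make the idea concrete, here is a minimal, self-contained sketch of how a producer might fill in such a schema. The enums and `CategoricalDescription` below are simplified stand-ins for the spec's types so that the snippet runs on its own, and `FloatListColumn` is a purely hypothetical producer, not something any library provides.

```python
from abc import ABC, abstractmethod
from enum import IntEnum
from typing import Any, Dict, List, Optional, Tuple, TypedDict


# Simplified stand-ins for the spec's types (assumption: trimmed to what the sketch needs).
class DtypeKind(IntEnum):
    INT = 0
    FLOAT = 2


class ColumnNullType(IntEnum):
    NON_NULLABLE = 0
    USE_NAN = 1
    USE_SENTINEL = 2


# (kind, bit width, format string, byte order)
DType = Tuple[DtypeKind, int, str, str]


class CategoricalDescription(TypedDict):
    is_ordered: bool
    is_dictionary: bool
    categories: Optional[Any]


class ColumnSchema(TypedDict):
    dtype: DType
    categorical_encoding: Optional[CategoricalDescription]
    null_encoding: Optional[Tuple[ColumnNullType, Any]]
    metadata: Dict[str, Any]


class Column(ABC):
    @property
    @abstractmethod
    def schema(self) -> ColumnSchema: ...


class FloatListColumn(Column):
    """Hypothetical producer: a float64 column backed by a list, with NaN as null."""

    def __init__(self, values: List[float]) -> None:
        self._values = values

    @property
    def schema(self) -> ColumnSchema:
        # A TypedDict call builds a plain dict at runtime.
        return ColumnSchema(
            dtype=(DtypeKind.FLOAT, 64, "g", "="),
            categorical_encoding=None,  # not a categorical column
            null_encoding=(ColumnNullType.USE_NAN, None),
            metadata={},
        )


# A consumer gets everything it needs from a single property access.
schema = FloatListColumn([1.0, float("nan")]).schema
kind, bit_width, _, _ = schema["dtype"]
```

A consumer then has one object to inspect, cache, or compare, instead of four separate accessors.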

I'm also not sure why the spec uses a mix of Tuples and TypedDicts. Is it an attempt at optimizing Python object footprint?
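For reference, the footprint difference can be measured directly. The sketch below assumes CPython and compares only shallow container sizes; `NullEncoding` is a hypothetical TypedDict alternative to the tuple, not something in the spec.

```python
import sys
from typing import Any, Tuple, TypedDict


class NullEncoding(TypedDict):
    # Hypothetical TypedDict equivalent of Tuple[ColumnNullType, Any]
    kind: int
    value: Any


as_tuple: Tuple[int, Any] = (1, None)
as_dict: NullEncoding = {"kind": 1, "value": None}

# Shallow sizes only: the referenced objects are not counted.
tuple_size = sys.getsizeof(as_tuple)
dict_size = sys.getsizeof(as_dict)
print(f"tuple: {tuple_size} bytes, TypedDict (plain dict): {dict_size} bytes")
```

A TypedDict instance is an ordinary dict at runtime, so the tuple is the smaller of the two, though the saving is per object and may not matter for a handful of schema values per column.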

pitrou changed the title ~~Add a ColumnSchema abstraction?~~ [protocol] Add a ColumnSchema abstraction? Oct 3, 2023

jorisvandenbossche added the interchange-protocol label Oct 12, 2023
