Instead of having individual methods to query the DType, categorical description, null description and metadata (which I suspect might be replicated at the DataFrame level?), how about adding a first-class abstraction to tie them together? For example:
    class ColumnSchema(TypedDict):
        # the underlying physical representation
        dtype: DType
        # if the column is categorical, describes how to interpret the contents
        categorical_encoding: Optional[CategoricalDescription]
        # if the column supports null values, describes how they are represented
        null_encoding: Optional[Tuple[ColumnNullType, Any]]
        # arbitrary metadata attached to the column, possibly empty
        metadata: Dict[str, Any]

    class Column(ABC):
        ...

        @property
        @abstractmethod
        def schema(self) -> ColumnSchema: ...
(IMHO, "encoding" sounds more precise than "description")
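To make the idea concrete, here is a minimal, self-contained sketch of how a producer could expose a single schema and how a consumer could read it. The types `ColumnNullType`, `DType` and `CategoricalDescription` are simplified stand-ins I wrote for illustration, not the protocol's actual definitions, and `Float64Column` and `describe` are hypothetical names.

```python
from abc import ABC, abstractmethod
from enum import IntEnum
from typing import Any, Dict, Optional, Tuple, TypedDict


# Simplified stand-ins for the protocol's types (assumptions for illustration).
class ColumnNullType(IntEnum):
    NON_NULLABLE = 0
    USE_NAN = 1
    USE_SENTINEL = 2


# Assumed layout: (kind, bit width, format string, endianness).
DType = Tuple[int, int, str, str]


class CategoricalDescription(TypedDict):
    is_ordered: bool
    is_dictionary: bool
    categories: Optional[Any]


class ColumnSchema(TypedDict):
    dtype: DType
    categorical_encoding: Optional[CategoricalDescription]
    null_encoding: Optional[Tuple[ColumnNullType, Any]]
    metadata: Dict[str, Any]


class Column(ABC):
    @property
    @abstractmethod
    def schema(self) -> ColumnSchema: ...


class Float64Column(Column):
    """Hypothetical producer-side column: float64 values, NaN marks nulls."""

    @property
    def schema(self) -> ColumnSchema:
        return {
            "dtype": (2, 64, "g", "="),  # kind code 2 stands for "float" here
            "categorical_encoding": None,
            "null_encoding": (ColumnNullType.USE_NAN, None),
            "metadata": {},
        }


def describe(column: Column) -> str:
    """Hypothetical consumer helper: one property access yields everything."""
    schema = column.schema
    kind, bits, _, _ = schema["dtype"]
    nullable = schema["null_encoding"] is not None
    categorical = schema["categorical_encoding"] is not None
    return f"kind={kind} bits={bits} nullable={nullable} categorical={categorical}"


print(describe(Float64Column()))
```

The point of the sketch is that a consumer only needs `column.schema` to decide how to interpret the buffers, instead of calling several separate accessors.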
I'm also not sure why the spec uses a mix of Tuples and TypedDicts. Is it an attempt at optimizing Python object footprint?
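For what it's worth, a quick way to check the footprint hypothesis is to compare the shallow sizes of the two representations. This is only a sketch: `NullEncodingDict` is a hypothetical TypedDict equivalent of the null description tuple, the numbers are CPython-specific, and `sys.getsizeof` does not count the referenced objects. Whether footprint was actually the motivation is something I don't know.

```python
import sys
from typing import Any, Tuple, TypedDict


class NullEncodingDict(TypedDict):
    # Hypothetical dict-shaped equivalent of the (null type, value) tuple.
    null_type: int
    sentinel: Any


as_tuple: Tuple[int, Any] = (2, -1)
as_dict: NullEncodingDict = {"null_type": 2, "sentinel": -1}

# At runtime a TypedDict instance is a plain dict, so it carries dict overhead.
tuple_size = sys.getsizeof(as_tuple)
dict_size = sys.getsizeof(as_dict)

print(f"tuple: {tuple_size} bytes, dict: {dict_size} bytes")
```

On CPython the tuple is smaller, but the difference is per object, so it would only matter if many such descriptions were kept alive at once.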
pitrou changed the title from "Add a ColumnSchema abstraction?" to "[protocol] Add a ColumnSchema abstraction?" on Oct 3, 2023