Open
Labels
enhancement: New feature or request
Description
Summary
The task is to refactor the `inputs` and `outputs` directories and merge them into a single directory called `io`.
Basic Example
- Everything goes inside `io`.
- One file per format: `json.py`, `parquet.py`, and so on.
- Each file has a Reader class and a Writer class, e.g. `JsonReader` / `JsonWriter`.
- All classes respect the parent class signature and have very simple usage, with decisions abstracted away. For example, all readers have `read()` and all writers have `write()`; everything else is a private or utility method. All extra parameters for tuning them are keyword arguments.
- Imports will look very simple, e.g. `from application_sdk.io.json import JsonWriter`, and all usages look like `json_reader.read()` or `json_writer.write()`.
- `close()` will perform cleanup, upload to the object store, and return statistics.
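
To make the proposed shape concrete, here is a rough sketch of what a `json.py` under `io` could look like. It is not taken from the draft PR. The base class names `Reader` and `Writer`, the `WriteStatistics` dataclass, the `encoding` keyword argument, the JSON Lines file layout, and the private `_upload()` placeholder are all assumptions made for illustration; only `JsonReader`, `JsonWriter`, `read()`, `write()`, and `close()` come from the description above.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class WriteStatistics:
    """Hypothetical statistics object returned by close()."""
    path: str
    total_records: int = 0


class Reader(ABC):
    """Assumed base class: every reader exposes only read()."""

    @abstractmethod
    def read(self) -> List[Dict[str, Any]]:
        ...


class Writer(ABC):
    """Assumed base class: every writer exposes write() and close()."""

    @abstractmethod
    def write(self, records: Iterable[Dict[str, Any]]) -> None:
        ...

    @abstractmethod
    def close(self) -> WriteStatistics:
        ...


class JsonWriter(Writer):
    # Tuning options are keyword-only, as the proposal suggests.
    def __init__(self, path: str, *, encoding: str = "utf-8") -> None:
        self._file = open(path, "w", encoding=encoding)
        self._stats = WriteStatistics(path=path)

    def write(self, records: Iterable[Dict[str, Any]]) -> None:
        # One JSON document per line (JSON Lines); an illustrative choice.
        for record in records:
            self._file.write(json.dumps(record) + "\n")
            self._stats.total_records += 1

    def close(self) -> WriteStatistics:
        self._file.close()   # cleanup
        self._upload()       # placeholder for the object store upload
        return self._stats   # statistics

    def _upload(self) -> None:
        # Hypothetical hook: a real implementation would push the file
        # to the object store here. This sketch does nothing.
        pass


class JsonReader(Reader):
    def __init__(self, path: str, *, encoding: str = "utf-8") -> None:
        self._path = path
        self._encoding = encoding

    def read(self) -> List[Dict[str, Any]]:
        with open(self._path, encoding=self._encoding) as fh:
            return [json.loads(line) for line in fh if line.strip()]
```

With this shape, a caller would only ever touch `JsonWriter(path).write(records)`, `close()`, and `JsonReader(path).read()`. Whether pandas or daft is used underneath would be an internal decision of each class.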
Motivation
- We have two directories, `inputs` and `outputs`, each with files like `json.py`, `parquet.py`, and `iceberg.py`.
- Inputs have methods like `get_dataframe` and `get_daft_dataframe`.
- Some have `get_batched_dataframe` and `get_batched_daft_dataframe`.
- None of these really respect the method signatures of the parent/base class.
- Imports look like `from application_sdk.outputs.json import JsonOutput` and usage looks like `json_output.write_batched_daft_dataframe()`.

It is really confusing what to use when, and why the caller should have to choose between daft and pandas at all.
Drawbacks
open
Unresolved Questions
open
Reference Issues / PRs
A draft PR to work off of can be found at #715.