refactor: input and output abstractions (WIP) #715
description="Number of errors while writing to files",
)
logger.error(f"Error writing pandas dataframe to files: {str(e)}")
raise
Bug: Missing Attributes in Output Class
The `write_dataframe` method in the `Output` base class, now a concrete implementation, uses attributes such as `chunk_part` and `metrics`, and calls methods such as `path_gen`, `_flush_buffer`, and `_upload_file`. None of these members are defined in the `Output` base class, so the method raises `AttributeError` when it executes.
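One possible way to address this, sketched under assumptions: the base class initializes the state that `write_dataframe` reads and declares the hooks it calls as abstract methods, so a subclass that omits one fails at instantiation instead of partway through a write. The signatures, the `InMemoryOutput` subclass, and the shape of the `metrics` entries below are hypothetical and are not taken from the PR.

```python
from abc import ABC, abstractmethod


class Output(ABC):
    """Hypothetical base class: owns the state and declares the hooks
    that the concrete write_dataframe relies on."""

    def __init__(self) -> None:
        # State read by write_dataframe is initialized here, not in subclasses.
        self.chunk_part = 0
        self.metrics = []

    @abstractmethod
    def path_gen(self, chunk_part: int) -> str:
        """Return the file path for the given chunk number."""

    @abstractmethod
    def _flush_buffer(self, rows: list, path: str) -> None:
        """Persist the buffered rows to path."""

    @abstractmethod
    def _upload_file(self, path: str) -> None:
        """Upload the file at path to its final destination."""

    def write_dataframe(self, rows: list) -> str:
        # Every member used below is guaranteed to exist on any instance.
        path = self.path_gen(self.chunk_part)
        self._flush_buffer(rows, path)
        self._upload_file(path)
        self.metrics.append({"path": path, "rows": len(rows)})
        self.chunk_part += 1
        return path


class InMemoryOutput(Output):
    """Illustrative subclass that keeps everything in memory."""

    def __init__(self) -> None:
        super().__init__()
        self.files = {}
        self.uploaded = []

    def path_gen(self, chunk_part: int) -> str:
        return f"chunk-{chunk_part}.parquet"

    def _flush_buffer(self, rows: list, path: str) -> None:
        self.files[path] = list(rows)

    def _upload_file(self, path: str) -> None:
        self.uploaded.append(path)
```

With this layout, instantiating `Output` directly, or a subclass that is missing one of the hooks, raises `TypeError` up front, which is usually easier to diagnose than an `AttributeError` during a write.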
# Get the generated file path and rename to final location
result_dict = result.to_pydict()
generated_file = result_dict["path"][0]
os.rename(generated_file, consolidated_file_path)
Bug: Consolidation Fails on Empty Daft Output
The consolidation logic in `_consolidate_current_folder` assumes that `daft_df.write_parquet` always returns a result whose dictionary form has a `"path"` key containing a non-empty list. If Daft's output structure changes, or the write produces no files, accessing `result_dict["path"][0]` raises a `KeyError` (missing key) or an `IndexError` (empty list).
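A minimal sketch of a defensive version of that lookup, assuming `result_dict` is the plain dictionary returned by `result.to_pydict()`. The helper names `first_written_path` and `move_generated_file` are hypothetical and do not appear in the PR; returning `False` on an empty result is one possible policy, and raising a descriptive error would be another.

```python
import os
from typing import Mapping, Optional, Sequence


def first_written_path(result_dict: Mapping[str, Sequence[str]]) -> Optional[str]:
    """Return the first written file path, or None if the key is missing or empty."""
    paths = result_dict.get("path") or []
    return paths[0] if paths else None


def move_generated_file(
    result_dict: Mapping[str, Sequence[str]], consolidated_file_path: str
) -> bool:
    """Rename the generated file to its final location.

    Returns False, without touching the filesystem, when no file was written.
    """
    generated_file = first_written_path(result_dict)
    if generated_file is None:
        return False
    os.rename(generated_file, consolidated_file_path)
    return True
```

The caller can then log and skip the consolidation step when `move_generated_file` returns `False`, instead of failing with an unhandled `KeyError` or `IndexError`.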