-
Notifications
You must be signed in to change notification settings - Fork 8
Add data_processing module to reporting #102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add data_processing module to reporting #102
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR extracts and consolidates data processing routines from the Reporting notebook into a dedicated data_processing module.
- Adds a new
data_processing.pywith timezone conversion, grid/PV/battery metrics, and analysis functions. - Updates
RELEASE_NOTES.mdto announce the new module.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/frequenz/lib/notebooks/reporting/data_processing.py | New module containing extracted data processing functions for reporting |
| RELEASE_NOTES.md | Added entry for data_processing module |
Comments suppressed due to low confidence (3)
src/frequenz/lib/notebooks/reporting/data_processing.py:45
- [nitpick] The parameter name
mcfgis ambiguous; consider renaming it to something more descriptive likeconfigor providing a proper type alias for clarity.
def rename_component_columns(df: pd.DataFrame, component_types: List[str], mcfg: Any) -> pd.DataFrame:
src/frequenz/lib/notebooks/reporting/data_processing.py:317
- The
bat_filterparameter is a string but handled like a list in code; consider unifying filter parameter types (e.g.,List[str]) for consistency with the PV analysis API.
def create_battery_analyse_df(master_df: pd.DataFrame, bat_filter: str) -> pd.DataFrame:
src/frequenz/lib/notebooks/reporting/data_processing.py:4
- [nitpick] Expand the module docstring to describe expected input DataFrame schemas (required column names and types) and provide a high-level overview of the functions included.
"""Data processing functions for the reporting module."""
Are these all copied as-is from the reporting notebook? |
If so, do you plan to clean them up in this PR, or in a follow-up? Just asking because I noticed a few spots that could probably use a bit of tidying. |
Happy to clean either in this PR or in the next one. What would you recommend? They are not exactly as in the reporting notebook at most blocks in the notebook are not even functions yet and have been add by many different programmers (Malte, Noah, me...). As a first step I turned all of them into functions and made sure the notebook runs smoothly (with testing of different use cases) using them as they are here now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the implementation. I have left a few comments mainly around consistency, maintainability, and minor architectural points. Nothing blocking, but worth addressing where possible I think. Some general comments:
- DataFrame mutation: Some functions modify the input DataFrame in place. Consider calling .copy() early or documenting the mutation clearly if this is intended.
- Hardcoded column names: Strings like "PV Produktion" and "Netzanschluss" are repeated throughout. Suggest extracting them as constants to reduce errors and improve maintainability. There are more hardcoded strings throughout the code.
- Timezone inconsistency: convert_timezone() uses "Europe/Berlin", while compute_peak_usage() uses "CET". Consider unifying this with a shared constant.
- Docstrings are missing Args and Returns. Adding them would improve clarity.
- Consider whether adding tests for some functions makes sense.
Signed-off-by: Flora <[email protected]>
efcfa64 to
9bbc8fd
Compare
Signed-off-by: Flora <[email protected]>
|
Addressed in PR #106 |
An effort to take out all data processing functions from the Reporting NB in Deepnote.
Example NB