Skip to content

Conversation

@flora-hofmann-frequenz
Copy link
Collaborator

An effort to take out all data processing functions from the Reporting NB in Deepnote.

Example NB

Copilot AI review requested due to automatic review settings June 12, 2025 14:50
@flora-hofmann-frequenz flora-hofmann-frequenz requested a review from a team as a code owner June 12, 2025 14:50
@github-actions github-actions bot added the part:docs Affects the documentation label Jun 12, 2025
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR extracts and consolidates data processing routines from the Reporting notebook into a dedicated data_processing module.

  • Adds a new data_processing.py with timezone conversion, grid/PV/battery metrics, and analysis functions.
  • Updates RELEASE_NOTES.md to announce the new module.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
src/frequenz/lib/notebooks/reporting/data_processing.py New module containing extracted data processing functions for reporting
RELEASE_NOTES.md Added entry for data_processing module
Comments suppressed due to low confidence (3)

src/frequenz/lib/notebooks/reporting/data_processing.py:45

  • [nitpick] The parameter name mcfg is ambiguous; consider renaming it to something more descriptive like config or providing a proper type alias for clarity.
def rename_component_columns(df: pd.DataFrame, component_types: List[str], mcfg: Any) -> pd.DataFrame:

src/frequenz/lib/notebooks/reporting/data_processing.py:317

  • The bat_filter parameter is a string but handled like a list in code; consider unifying filter parameter types (e.g., List[str]) for consistency with the PV analysis API.
def create_battery_analyse_df(master_df: pd.DataFrame, bat_filter: str) -> pd.DataFrame:

src/frequenz/lib/notebooks/reporting/data_processing.py:4

  • [nitpick] Expand the module docstring to describe expected input DataFrame schemas (required column names and types) and provide a high-level overview of the functions included.
"""Data processing functions for the reporting module."""

@cwasicki
Copy link
Contributor

An effort to take out all data processing functions from the Reporting NB in Deepnote.

Are these all copied as-is from the reporting notebook?

@cyiallou
Copy link
Contributor

Are these all copied as-is from the reporting notebook?

If so, do you plan to clean them up in this PR, or in a follow-up? Just asking because I noticed a few spots that could probably use a bit of tidying.

@flora-hofmann-frequenz
Copy link
Collaborator Author

If so, do you plan to clean them up in this PR, or in a follow-up? Just asking because I noticed a few spots that could probably use a bit of tidying.

Happy to clean either in this PR or in the next one. What would you recommend?

They are not exactly as in the reporting notebook at most blocks in the notebook are not even functions yet and have been add by many different programmers (Malte, Noah, me...). As a first step I turned all of them into functions and made sure the notebook runs smoothly (with testing of different use cases) using them as they are here now.

Copy link
Contributor

@cyiallou cyiallou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the implementation. I have left a few comments mainly around consistency, maintainability, and minor architectural points. Nothing blocking, but worth addressing where possible I think. Some general comments:

  • DataFrame mutation: Some functions modify the input DataFrame in place. Consider calling .copy() early or documenting the mutation clearly if this is intended.
  • Hardcoded column names: Strings like "PV Produktion" and "Netzanschluss" are repeated throughout. Suggest extracting them as constants to reduce errors and improve maintainability. There are more hardcoded strings throughout the code.
  • Timezone inconsistency: convert_timezone() uses "Europe/Berlin", while compute_peak_usage() uses "CET". Consider unifying this with a shared constant.
  • Docstrings are missing Args and Returns. Adding them would improve clarity.
  • Consider whether adding tests for some functions makes sense.

@flora-hofmann-frequenz
Copy link
Collaborator Author

Addressed in PR #106

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

part:docs Affects the documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants