-
Notifications
You must be signed in to change notification settings - Fork 8
feat: add a column mapper class #164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: v0.x.x
Are you sure you want to change the base?
feat: add a column mapper class #164
Conversation
4afd9b8
to
de48ef9
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just from this PR it's not clear what this is about because it is missing PR and commit descriptions and the doc is not sufficient. This is missing a motivation why this class is needed (in the future) and how it is intended to be used.
I am also not sure about a couple of name matchings. This stresses the point that we need well-defined metric definitions. In fact I think we should add the module with the definitions of all of these metrics used here either before this PR or add them here.
"""Column mapping utilities for energy reporting. | ||
|
||
Provides the `ColumnMapper` dataclass to manage renaming between raw, | ||
canonical, and localized display column names. Supports loading schema |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are raw, canonical and display names? Where are these used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update this in the documentation.
Raw - directly from reporting api
Canonical - used in our codebase
Display - To be displayed in the notebooks
When I added EN/DE support for the solar notebook, I faced a similar challenge. At the time I went for a simple TranslationManager class with an internal dictionary of text + translations (link). It’s admittedly not the most elegant solution, but it avoids introducing an extra dependency in this repo (which is itself consumed elsewhere). That trade-off was one of the reasons I kept it simple. I see the YAML approach here is more structured (which is definitely a plus for maintainability). Just wanted to flag the existing |
The translator.py script looks well-structured and offers a wide range of functionalities. My current solution is more narrowly focused, handling only the translation of column display names into the required language. It might be worthwhile to combine both approaches and define a comprehensive configuration within a .yaml file. Do you think this would be a feasible path forward? |
de48ef9
to
8e0cd67
Compare
Updates the requirements on [kaleido](https://github.com/plotly/kaleido) to permit the latest version. - [Release notes](https://github.com/plotly/kaleido/releases) - [Commits](plotly/Kaleido@v0.2.1...v1.1.0) --- updated-dependencies: - dependency-name: kaleido dependency-version: 1.1.0 dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]>
e659ea3
to
dae7d8c
Compare
df_with_pv_flows["pv_self"] = ( | ||
df_with_pv_flows["pv_prod"] - df_with_pv_flows["pv_excess"] | ||
).clip(lower=0) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if calculating pv_self as shown here the correct way of doing this. I have picked this directly from the previous reporting NBs.
dae7d8c
to
8cbfef0
Compare
Signed-off-by: Mohammad Tayyab <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds a column mapping utility for energy reporting notebooks to insulate them from upstream schema changes. The implementation provides locale-aware column name translations and timezone handling through a YAML configuration file.
Key changes:
- New
ColumnMapper
class that translates between raw API headers, canonical names, and localized display labels - YAML schema configuration file defining column mappings, descriptions, and locale-specific labels
- Helper functions for PV energy flow calculations
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
src/frequenz/lib/notebooks/reporting/utils/column_mapper.py |
Core ColumnMapper class with YAML loading and locale-aware column translation |
src/frequenz/lib/notebooks/reporting/schema_mapping.yaml |
Schema configuration defining column mappings and display labels |
src/frequenz/lib/notebooks/reporting/utils/helpers.py |
PV energy flow calculation utilities |
src/frequenz/lib/notebooks/reporting/utils/__init__.py |
Package initialization file |
pyproject.toml |
Added PyYAML dependencies and updated kaleido version |
RELEASE_NOTES.md |
Documentation of the new ColumnMapper feature |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
Signed-off-by: Mohammad Tayyab <[email protected]>
Signed-off-by: Mohammad Tayyab <[email protected]>
8cbfef0
to
189099b
Compare
Args: | ||
df: Input DataFrame. If present, uses columns ``pv_neg``, | ||
``consumption``, and ``battery_pos``. Missing columns are | ||
treated as zeros. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would raise if any of the expected columns is missing, otherwise it's easy to create typo bugs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I picked it from the notebooks. I think previously sometimes consumption column would be missing so this logic was implemented but now we should raise if any column name is missing.
But one question would be whether pv_columns and battery columns will be present in the data even if the component is missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, I think I need to add the logic to check whether the component type is present in the data. Then forward with the checks/calculations otherwise pass.
Newly created/updated columns: | ||
- ``pv_prod``: PV production as a positive series (negated/clipped from | ||
``pv_neg``). | ||
- ``pv_excess``: Excess PV after subtracting household consumption. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
positive or negative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, currently the logic is to multiply pv_neg * -1 to get the pv prod in positive float.
import pandas as pd | ||
|
||
|
||
def _add_pv_energy_flows(df: pd.DataFrame) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is problematic if there is more than one source for production, e.g. from CHP. Options to deal with this could be to
- Make this to work not with PV but "production" in general.
- Enforce that PV is the only source and otherwise raise. Then we cannot use it on certain (already existing) microgrids though, so this can only be a temporary solution.
- Take into account all sources. This requires some accounting logic though, which can be case-dependent.
No description provided.