Description
What's needed?
Users should get more control over resampling:
- aggregation: how to aggregate the data points in the interval, e.g. mean, sum, min/max, interpolation (upsampling)
- closed: whether the resampling interval is closed on the left or on the right, i.e. ts1 <= ts < ts2 or ts1 < ts <= ts2
- label: which timestamp is assigned to the resampled interval. Possible options:
  - Fixed-interval according to the resampling bins, e.g. the start/end of the resampling bin (corresponds to the oldest/newest possible timestamp) or the center of the resampling bin.
  - Derived from the data, e.g. the oldest/newest/average of the timestamps that were aggregated in each resampling bin.
- resolution (update): the resolution parameter currently does not support resampling periods smaller than 1s.
In the first version of the API, the label defaults to the fixed-interval option, using the start of the resampling bin as the timestamp.
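To make the interaction between aggregation, closed and label concrete, here is a minimal pure-Python sketch. It is not the API's implementation: the function name `resample`, its parameter names and the label values ("start", "end", "center", "oldest", "newest") are assumptions chosen for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Bins are aligned to the Unix epoch (an assumption of this sketch).
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def resample(samples, resolution, aggregation=sum, closed="left", label="start"):
    """Group (timestamp, value) pairs into bins of width `resolution`.

    Hypothetical helper: `closed` picks which bin edge is inclusive,
    `label` picks the timestamp reported for each bin.
    """
    bins = {}
    for ts, value in samples:
        offset = ts - EPOCH
        index = offset // resolution  # timedelta // timedelta gives an int
        # With right-closed bins, a sample exactly on an edge belongs
        # to the bin that ends there, not the one that starts there.
        if closed == "right" and offset % resolution == timedelta(0):
            index -= 1
        bins.setdefault(index, []).append((ts, value))

    result = []
    for index in sorted(bins):
        start = EPOCH + index * resolution
        timestamps = [ts for ts, _ in bins[index]]
        labels = {
            "start": start,                     # fixed: start of the bin
            "end": start + resolution,          # fixed: end of the bin
            "center": start + resolution / 2,   # fixed: center of the bin
            "oldest": min(timestamps),          # derived from the data
            "newest": max(timestamps),          # derived from the data
        }
        values = [value for _, value in bins[index]]
        result.append((labels[label], aggregation(values)))
    return result


# Five samples, 30 s apart, resampled to 1 min bins.
t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
samples = [(t0 + timedelta(seconds=30 * i), float(i + 1)) for i in range(5)]

left = resample(samples, timedelta(minutes=1))
right = resample(samples, timedelta(minutes=1), closed="right", label="end")
```

With the same input, `left` sums to 3.0, 7.0, 5.0 and `right` to 1.0, 5.0, 9.0, while both report the labels 00:00, 00:01 and 00:02. This shows why closed and label need to be chosen together: the same label can refer to different sets of samples.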
Proposed solution
Support corresponding parameters in ResamplingOptions.
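As a rough illustration of what such parameters might look like, the following Python sketch models the options as a dataclass. Apart from the name ResamplingOptions and the start-of-bin label default mentioned above, everything here is hypothetical: the enum names, the field names and the defaults for aggregation and closed are assumptions, not the actual API definition.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class Aggregation(Enum):
    """Hypothetical aggregation choices for the samples in a bin."""
    MEAN = "mean"
    SUM = "sum"
    MIN = "min"
    MAX = "max"


class Closed(Enum):
    """Which edge of the resampling interval is inclusive."""
    LEFT = "left"    # ts1 <= ts < ts2
    RIGHT = "right"  # ts1 < ts <= ts2


class Label(Enum):
    """Which timestamp is assigned to the resampled interval."""
    BIN_START = "bin_start"
    BIN_END = "bin_end"
    BIN_CENTER = "bin_center"
    OLDEST_SAMPLE = "oldest_sample"
    NEWEST_SAMPLE = "newest_sample"


@dataclass(frozen=True)
class ResamplingOptions:
    """Illustrative stand-in for the API message, not its real schema."""
    resolution: timedelta
    aggregation: Aggregation = Aggregation.MEAN  # assumed default
    closed: Closed = Closed.LEFT                 # assumed default
    label: Label = Label.BIN_START               # default named in this issue

    def __post_init__(self):
        if self.resolution <= timedelta(0):
            raise ValueError("resolution must be positive")


# A timedelta-based resolution can express sub-second periods.
options = ResamplingOptions(resolution=timedelta(milliseconds=500))
```

Using a duration type for the resolution, rather than a whole number of seconds, is one possible way to allow periods below 1s.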
Use cases
Different aggregations make sense if metrics like energy (sum) or peak values (min/max) are of interest.
Different closed options could be helpful if data is compared with external data that could use another interval definition (e.g. DSO, clients).
A label option is required if the closed option is changed, to avoid confusing timestamps.
Resolutions below 1s could be interesting if faster reaction or very short-term forecasts are needed. Since we plan to make resampling mandatory when aggregating components, the shortest resolution for component aggregations would be 1s.
Alternatives and workarounds
No response
Additional context
Related to:
- https://github.com/frequenz-floss/frequenz-api-reporting/pull/6/files#r1406818990
- https://github.com/frequenz-floss/frequenz-api-reporting/pull/6/files#r1406683395
- https://github.com/frequenz-floss/frequenz-api-reporting/pull/6/files#r1406864775
Example for resampling options in pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html