Improve resampling options #19

Open
cwasicki opened this issue Feb 23, 2024 · 1 comment
Labels
part:❓ We need to figure out which part is affected
priority:❓ We need to figure out how soon this should be addressed
type:enhancement New feature or enhancement visible to users

@cwasicki
Contributor

cwasicki commented Feb 23, 2024

What's needed?

Users should get more control over resampling:

  • aggregation: how to aggregate the data points in the interval, e.g. mean, sum, min/max, interpolation (upsampling)
  • closed: left or right closed interval, i.e. ts1 <= ts < ts2 or ts1 < ts <= ts2
  • label: which timestamp is assigned to the resampled interval. Possible options:
    • Fixed-interval according to resampling bins, e.g. start/end of resampling bin (corresponds to oldest/newest possible timestamp), or the center of the resampling bin,
    • Derived from the data, e.g. the oldest/newest/average of the timestamps that were aggregated in each resampling bin.
  • resolution (update): the resolution parameter currently does not support resampling periods smaller than 1s.

In the first version of the API, the label defaults to the fixed-interval option, using the start of the resampling bin as the timestamp.
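To make the closed and label options above concrete, here is a minimal sketch in plain Python. The function names bin_bounds and bin_label, the string values for the options, and the choice of the Unix epoch as bin origin are all assumptions made for illustration; none of them come from the API. pandas exposes comparable closed and label parameters on its resample method.

```python
from datetime import datetime, timedelta, timezone

# Assumption: bins are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bin_bounds(ts, period, closed="left"):
    """Return (start, end) of the resampling bin that contains ts.

    closed="left":  start <= ts < end
    closed="right": start < ts <= end
    """
    offset = ts - EPOCH
    index = offset // period  # timedelta // timedelta yields an int
    if closed == "right" and offset % period == timedelta(0):
        # A sample exactly on a bin edge belongs to the earlier bin.
        index -= 1
    start = EPOCH + index * period
    return start, start + period


def bin_label(start, end, label="start"):
    """Return the fixed-interval timestamp assigned to a bin."""
    if label == "start":
        return start
    if label == "end":
        return end
    if label == "center":
        return start + (end - start) / 2
    raise ValueError(f"unknown label: {label!r}")


# A sample that sits exactly on a bin edge changes bins with `closed`.
edge = datetime(2024, 2, 23, 0, 1, tzinfo=timezone.utc)
minute = timedelta(seconds=60)
left_bin = bin_bounds(edge, minute, closed="left")
right_bin = bin_bounds(edge, minute, closed="right")
```

Only samples that fall exactly on a bin edge are affected by closed; all other samples land in the same bin either way.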

Proposed solution

Support corresponding parameters in ResamplingOptions.
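As a rough illustration of what such parameters could look like, the following Python sketch groups them in one options object. This is not the project's actual ResamplingOptions definition: the class name ResamplingOptionsSketch, the enum names, and all defaults except the bin-start label are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class Aggregation(Enum):
    MEAN = "mean"
    SUM = "sum"
    MIN = "min"
    MAX = "max"


class Closed(Enum):
    LEFT = "left"    # ts1 <= ts < ts2
    RIGHT = "right"  # ts1 < ts <= ts2


class Label(Enum):
    # Fixed-interval labels, derived from the resampling bin.
    BIN_START = "bin_start"
    BIN_CENTER = "bin_center"
    BIN_END = "bin_end"
    # Data-derived labels, computed from the aggregated timestamps.
    OLDEST_SAMPLE = "oldest_sample"
    NEWEST_SAMPLE = "newest_sample"
    MEAN_SAMPLE = "mean_sample"


@dataclass(frozen=True)
class ResamplingOptionsSketch:
    """Hypothetical container for the options discussed in this issue."""

    # A timedelta allows sub-second resolutions.
    resolution: timedelta
    # Assumed defaults, except for the label: the issue states that the
    # first API version labels bins with their start timestamp.
    aggregation: Aggregation = Aggregation.MEAN
    closed: Closed = Closed.LEFT
    label: Label = Label.BIN_START

    def __post_init__(self):
        if self.resolution <= timedelta(0):
            raise ValueError("resolution must be positive")


options = ResamplingOptionsSketch(
    resolution=timedelta(milliseconds=500),
    aggregation=Aggregation.MAX,
    closed=Closed.RIGHT,
    label=Label.BIN_END,
)
```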

Use cases

Different aggregations make sense if metrics like energy (sum) or peak values (min/max) are of interest.
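The difference between aggregations can be sketched with a small helper that works on raw samples. The function name resample, the integer-second timestamps, and the sample values are made up for illustration; bins are left-closed and keyed by their start.

```python
from collections import defaultdict
from statistics import mean


def resample(samples, period_s, aggregate):
    """Group (timestamp_s, value) pairs into left-closed bins and aggregate.

    Returns a dict mapping each bin start to the aggregated value.
    """
    bins = defaultdict(list)
    for ts, value in samples:
        bins[ts - ts % period_s].append(value)
    return {start: aggregate(values) for start, values in sorted(bins.items())}


# Hypothetical samples: (seconds since some origin, value).
samples = [(0, 2.0), (20, 4.0), (40, 6.0), (60, 1.0), (80, 3.0)]

mean_per_bin = resample(samples, 60, mean)  # typical level in each bin
sum_per_bin = resample(samples, 60, sum)    # totals, e.g. energy increments
peak_per_bin = resample(samples, 60, max)   # peak value in each bin
```

Passing the aggregation as a callable keeps the binning logic independent of the metric of interest.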

Different closed options could be helpful if data is compared with external data that could use another interval definition (e.g. DSO, clients).

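The label option is required if the closed option can be changed; otherwise the timestamps can be misleading. For example, a right-closed interval ts1 < ts <= ts2 labeled with ts1 carries a timestamp that lies outside the interval.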

Resolutions below 1s could be useful when faster reaction is needed or for very short-term forecasts. Since we plan to make resampling mandatory when aggregating components, the shortest resolution would be 1s for component aggregations.

Alternatives and workarounds

No response

Additional context

Related to:

Example for resampling options in pandas: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html

cwasicki added the part:❓, priority:❓ and type:enhancement labels on Feb 23, 2024
cwasicki added this to the post-v1.0.0 milestone on Oct 11, 2024
@cwasicki
Contributor Author

Moving post v1.0:

aggregation: how to aggregate the data points in the interval, e.g. mean, sum, min/max, interpolation (upsampling)

With the current ETL the aggregation is pre-determined. Changing the aggregation method would only work on raw data, so it can be deferred to the client or the user.

closed: left or right closed interval, i.e. ts1 <= ts < ts2 or ts1 < ts <= ts2

I don't think that closed has significant practical implications.

label: which timestamp is assigned to the resampled interval. Possible options:

That's easy to fix by the user or on client level.
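A client-side fix could look like the sketch below, assuming the service returns (timestamp, value) pairs labeled with the start of each resampling bin and that the resolution is known. The function name relabel and the label strings are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def relabel(series, resolution, label="end"):
    """Shift bin-start timestamps to another fixed-interval label.

    series: list of (timestamp, value) pairs labeled with the bin start.
    """
    shifts = {
        "start": timedelta(0),
        "center": resolution / 2,
        "end": resolution,
    }
    shift = shifts[label]
    return [(ts + shift, value) for ts, value in series]


# Hypothetical resampled data with 60 s resolution.
start = datetime(2024, 2, 23, tzinfo=timezone.utc)
series = [(start, 1.5), (start + timedelta(seconds=60), 3.5)]
end_labeled = relabel(series, timedelta(seconds=60), label="end")
center_labeled = relabel(series, timedelta(seconds=60), label="center")
```

This only covers fixed-interval labels. Labels derived from the data, such as the newest timestamp in each bin, cannot be recovered this way because the raw timestamps are no longer available after resampling.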

resolution (update): the resolution parameter currently does not support resampling periods smaller than 1s.

At least for our current data streams with a handful of samples per second this is of minor importance.
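A quick way to see this is to count how many samples land in each bin. The sketch below assumes a hypothetical stream of 5 evenly spaced samples per second and uses integer milliseconds; the function name samples_per_bin is made up for illustration.

```python
from collections import Counter


def samples_per_bin(timestamps_ms, period_ms):
    """Count samples per left-closed bin, keyed by bin start in ms."""
    return Counter(ts - ts % period_ms for ts in timestamps_ms)


# Hypothetical stream: one sample every 200 ms for two seconds.
timestamps_ms = list(range(0, 2000, 200))

per_500ms = samples_per_bin(timestamps_ms, 500)
per_1s = samples_per_bin(timestamps_ms, 1000)
```

With these assumed numbers, 500 ms bins hold only two or three samples each, so aggregating them adds little over reading the raw samples.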
