Skip to content

Conversation

ExplorerRay
Copy link
Contributor

@ExplorerRay ExplorerRay commented Aug 28, 2025

This proposal aims to broaden the support of platform power and energy metrics with ACPI for Kepler.
#2079

@github-actions github-actions bot added the docs Documentation changes label Aug 28, 2025
Add ACPI option which is an alternative of Redfish to Kepler.
ACPI can collect platform power metrics from reading sysfs. The platform collector is able to reuse the one that Redfish uses if merged.

```text
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: mermaid (like in MSR proposal) is a better way to draw this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fix this, but I think I need to open another PR after all things are discussed and checked. Otherwise, the merge commit for resolving conflict will also be included.

**CLI flags**

```
--platform.acpi.enabled=true # Enable ACPI monitoring
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please note that for the actual feature implementation, lets introduce this as an experimental feature, get feedback and then promote it as a supported feature.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, what should I do for an experimental feature? Change --platform.acpi.enabled=true into --experimental.platform.acpi.enabled=true?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats right .. May be for proposals, we should add along with status: Implemented, a field for maturity: Experimental|Stable .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There doesn’t seem to be any document explaining how to handle experimental features. For example, how to configure them or when to promote them to stable. I think a guide is needed. Otherwise, it may be hard for people to decide which feature should be an experimental one.

```prometheus
# Platform power metrics with new source
kepler_platform_watts{source="acpi",node_name="worker-1"} 450.5
kepler_platform_joules_total{source="acpi",node_name="worker-1"} 123456.789
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
kepler_platform_joules_total{source="acpi",node_name="worker-1"} 123456.789

As discussed, we can't support energy joules total just like redfish from instantaneous power. In this case the 'average_power' is only with in the average_interval (which seems like a rolling window). I don't think we should read the power over the average_interval and integrate it.

Hence, lets not support energy computation from power since it is incorrect (based on my reading of the documentation: https://www.kernel.org/doc/html/latest/hwmon/acpi_power_meter.html

Incidentally I found https://github.com/torvalds/linux/blob/master/Documentation/hwmon/sysfs-interface.rst#energy but ACPI doesn't seem to implement it. Lets support this feature if and when kernel support it as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I am going to implement is reading the power[1-*]_average every power[1-*]_average_interval so that every reading will occur in the rolling window. I don't think the result will be incorrect. But the overhead might be higher depends on different intervals.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are we going to validate that the result is valid? IMHO, we should avoid providing a feature that isn't supported by the driver and can't be validated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I got it.

### Technical risks

- **Several measurements under one power meter**: Aggregate `power*_average` to the power meter reading value.
- **Several ACPI power meters detected**: Implement aggregation logic for to combine readings from multiple ACPI power meters (`ACPI000D:xx`).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to take another look at this. We need to ensure that there is no hierarchy/topology in these directories otherwise we run the risk of double counting .. E.g.

E.g. what if ..

  • Multiple ACPI000D:XX devices detected
  • If meter A measures total platform power
  • And meter B measures a subset (CPU power)
  • Then A + B would be the wrong information to provide - right?

We need ...

  1. Real world data /evidence of Multiple Meters .. and the structure

  2. Do you know if we can make use of https://github.com/torvalds/linux/blob/master/drivers/hwmon/acpi_power_meter.c#L254 ?

Copy link
Contributor Author

@ExplorerRay ExplorerRay Aug 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out.

If there are multiple ACPI000D devices, we can use measures/ directory to check which device is measured by this power meter. ref

About multiple measurements power[1-*]_xxx part, I haven't found a reference that telling how to determine how power1, power2 are generated and what they measure.

The host I can access currently only has ACPI000D:00 and power1_xxx in it, so I can only refer to docs or code to know how multiple power meters or measurements work.


- Testdata Structure: Mimic real sysfs structure under `internal/device/platform/acpi/testdata/`
- Fake multiple power meters (`ACPI000D:xx`) or multiple measurements (`power*_average`) for testing
- e.g., `testdata/sys/bus/acpi/drivers/power_meter/ACPI000D:00/power1_average`, with sample values for power readings (e.g., 450500000 microwatts for 450.5 watts).
Copy link
Collaborator

@sthaha sthaha Aug 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can there be a different unit as well? Eg. milliwatts instead of micro? We have had cases in the past (with old kepler) where the power reported was abnormally high ..
See: #1774 -> https://github.com/sustainable-computing-io/kepler-metal-ci/blob/main/docs/train-validate-e2e/2024-09-09_22-41-38/AbsPower/BPFOnly/ExponentialRegressionTrainer/report-v0.7.11-212-gac5ee8b8.md#platform---idle

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sthaha sthaha requested a review from vimalk78 August 28, 2025 04:26

### Functional Requirements

- Use [golang sysfs package](https://github.com/prometheus/procfs) which is used by RAPL to read ACPI power_meter so that no additional dependencies.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what path(s) will be read for ACPI power by procfs? is reading /sys/class/hwmon supported in procfs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally thought reading /sys/bus/acpi/drivers/power_meter/ACPI000D:XX/power[1-*]_average.
But I am not sure if hwmon path and this path which is much common in different OS, this path should be the one that most OS support so that less issues will occur when OS changes. I am not sure which one is better.

I found that golang sysfs module doesn't support ACPI reading both the path I mentioned and hwmon path. I will change using sysfs module into reading file directly.

Thanks.


### Functional Requirements

- Use [golang sysfs package](https://github.com/prometheus/procfs) which is used by RAPL to read ACPI power_meter so that no additional dependencies.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what permissions are needed to read acpi power?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Ubuntu 24.04 host I accessed before, the power[1-*]_average are readable for all users. So we only need to mount into container, and this has already done because of RAPL (mount host sysfs). There are no additional capabilities needed for reading.

### Aggregate power metrics

According to [the ACPI spec for power meter](https://docs.kernel.org/hwmon/acpi_power_meter.html#special-features), there may be more than one power meter available per platform (`/sys/bus/acpi/drivers/power_meter/ACPI000D:XX/power[1-*]_average`).
To acquire power metrics for whole platform, I will aggregate metrics from all available power meters.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aggregate in kepler? with sum or average ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original idea is using sum to add them together, but I found that there are no docs mentioning that multiple power meters here. I think this requires more discussion before implementing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
docs Documentation changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants