-
Notifications
You must be signed in to change notification settings - Fork 215
docs(proposal): add EP_004 ACPI support for Kepler #2304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
docs(proposal): add EP_004 ACPI support for Kepler #2304
Conversation
Signed-off-by: Ray Huang <[email protected]>
f6e8ca6
to
d1e84de
Compare
Add ACPI option which is an alternative of Redfish to Kepler. | ||
ACPI can collect platform power metrics from reading sysfs. The platform collector is able to reuse the one that Redfish uses if merged. | ||
|
||
```text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: mermaid (like in MSR proposal) is a better way to draw this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fix this, but I think I need to open another PR after all things are discussed and checked. Otherwise, the merge commit for resolving conflict will also be included.
**CLI flags** | ||
|
||
``` | ||
--platform.acpi.enabled=true # Enable ACPI monitoring |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that for the actual feature implementation, lets introduce this as an experimental
feature, get feedback and then promote it as a supported feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, what should I do for an experimental feature? Change --platform.acpi.enabled=true
into --experimental.platform.acpi.enabled=true
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats right .. May be for proposals, we should add along with status: Implemented
, a field for maturity: Experimental|Stable
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There doesn’t seem to be any document explaining how to handle experimental features. For example, how to configure them or when to promote them to stable. I think a guide is needed. Otherwise, it may be hard for people to decide which feature should be an experimental one.
```prometheus | ||
# Platform power metrics with new source | ||
kepler_platform_watts{source="acpi",node_name="worker-1"} 450.5 | ||
kepler_platform_joules_total{source="acpi",node_name="worker-1"} 123456.789 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kepler_platform_joules_total{source="acpi",node_name="worker-1"} 123456.789 |
As discussed, we can't support energy joules total just like redfish from instantaneous power. In this case the 'average_power' is only with in the average_interval (which seems like a rolling window). I don't think we should read the power over the average_interval and integrate it.
Hence, lets not support energy computation from power since it is incorrect (based on my reading of the documentation: https://www.kernel.org/doc/html/latest/hwmon/acpi_power_meter.html
Incidentally I found https://github.com/torvalds/linux/blob/master/Documentation/hwmon/sysfs-interface.rst#energy but ACPI doesn't seem to implement it. Lets support this feature if and when kernel support it as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I am going to implement is reading the power[1-*]_average
every power[1-*]_average_interval
so that every reading will occur in the rolling window. I don't think the result will be incorrect. But the overhead might be higher depends on different intervals.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How are we going to validate that the result is valid? IMHO, we should avoid providing a feature that isn't supported by the driver and can't be validated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I got it.
### Technical risks | ||
|
||
- **Several measurements under one power meter**: Aggregate `power*_average` to the power meter reading value. | ||
- **Several ACPI power meters detected**: Implement aggregation logic for to combine readings from multiple ACPI power meters (`ACPI000D:xx`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to take another look at this. We need to ensure that there is no hierarchy/topology in these directories otherwise we run the risk of double counting .. E.g.
E.g. what if ..
- Multiple ACPI000D:XX devices detected
- If meter A measures total platform power
- And meter B measures a subset (CPU power)
- Then A + B would be the wrong information to provide - right?
We need ...
-
Real world data /evidence of Multiple Meters .. and the structure
-
Do you know if we can make use of https://github.com/torvalds/linux/blob/master/drivers/hwmon/acpi_power_meter.c#L254 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out.
If there are multiple ACPI000D devices, we can use measures/
directory to check which device is measured by this power meter. ref
About multiple measurements power[1-*]_xxx
part, I haven't found a reference that telling how to determine how power1
, power2
are generated and what they measure.
The host I can access currently only has ACPI000D:00
and power1_xxx
in it, so I can only refer to docs or code to know how multiple power meters or measurements work.
|
||
- Testdata Structure: Mimic real sysfs structure under `internal/device/platform/acpi/testdata/` | ||
- Fake multiple power meters (`ACPI000D:xx`) or multiple measurements (`power*_average`) for testing | ||
- e.g., `testdata/sys/bus/acpi/drivers/power_meter/ACPI000D:00/power1_average`, with sample values for power readings (e.g., 450500000 microwatts for 450.5 watts). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can there be a different unit as well? Eg. milliwatts instead of micro? We have had cases in the past (with old kepler) where the power reported was abnormally high ..
See: #1774 -> https://github.com/sustainable-computing-io/kepler-metal-ci/blob/main/docs/train-validate-e2e/2024-09-09_22-41-38/AbsPower/BPFOnly/ExponentialRegressionTrainer/report-v0.7.11-212-gac5ee8b8.md#platform---idle
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Ray Huang <[email protected]>
Signed-off-by: Ray Huang <[email protected]>
|
||
### Functional Requirements | ||
|
||
- Use [golang sysfs package](https://github.com/prometheus/procfs) which is used by RAPL to read ACPI power_meter so that no additional dependencies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what path(s) will be read for ACPI power by procfs? is reading /sys/class/hwmon
supported in procfs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally thought reading /sys/bus/acpi/drivers/power_meter/ACPI000D:XX/power[1-*]_average
.
But I am not sure if hwmon path and this path which is much common in different OS, this path should be the one that most OS support so that less issues will occur when OS changes. I am not sure which one is better.
I found that golang sysfs module doesn't support ACPI reading both the path I mentioned and hwmon path. I will change using sysfs module into reading file directly.
Thanks.
|
||
### Functional Requirements | ||
|
||
- Use [golang sysfs package](https://github.com/prometheus/procfs) which is used by RAPL to read ACPI power_meter so that no additional dependencies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what permissions are needed to read acpi power?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the Ubuntu 24.04 host I accessed before, the power[1-*]_average
are readable for all users. So we only need to mount into container, and this has already done because of RAPL (mount host sysfs). There are no additional capabilities needed for reading.
### Aggregate power metrics | ||
|
||
According to [the ACPI spec for power meter](https://docs.kernel.org/hwmon/acpi_power_meter.html#special-features), there may be more than one power meter available per platform (`/sys/bus/acpi/drivers/power_meter/ACPI000D:XX/power[1-*]_average`). | ||
To acquire power metrics for whole platform, I will aggregate metrics from all available power meters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aggregate in kepler? with sum
or average
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original idea is using sum
to add them together, but I found that there are no docs mentioning that multiple power meters here. I think this requires more discussion before implementing.
This proposal aims to broaden the support of platform power and energy metrics with ACPI for Kepler.
#2079