- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.7k
pmon docker - Enable config of thermalctd polling interval #23139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
pmon docker - Enable config of thermalctd polling interval #23139
Conversation
| /azp run Azure.sonic-buildimage | 
| Azure Pipelines successfully started running 1 pipeline(s). | 
| cc @judyjoseph | 
| {% endif -%} | ||
|  | ||
| {% if thermalctld.thermal_monitor_update_elapsed_threshold is defined and thermalctld.thermal_monitor_update_elapsed_threshold is not none %} | ||
| {%- set options = options + " --thermal-monitor-update-elapsed-threshold " + thermalctld.thermal_monitor_update_elapsed_threshold|string %} | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but please add details in PR description on the new intervals and which daemon/thread in thermalctld it will affect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @judyjoseph , I have updated the PR description to document this, please take a look.
| /azpw ms_conflict | 
| @rlhui could you help merge this PR | 
Platforms can now configure thermal monitor intervals in their pmon_daemon_control.json:
Note this only affects the
ThermalMonitorthread in thethermalctlddaemon.ThermalMonitor's role is to poll fan and temperature sensors from hardware and publish information to redis.This redis values are used in
show platform temperatureandshow platform fanfor example.Parameter Details
thermal_monitor_initial_intervalThermalMonitoronthermalctldstartup.thermal_monitor_update_intervalthermal_monitor_update_intervalseconds, the hardware is polledthermal_monitor_update_elapsed_thresholdthermal_monitor_update_elapsed_thresholdseconds to poll hardware (collected information from all fans and temperature sensors), a warning is logged.Why I did it
The default polling interval of 60s is quite high and feels unresponsive (i.e. an operator can remove a fan and wait nearly a minute for
show plat fanto update).How I did it
In sonic-net/sonic-platform-daemons#635 we made these intervals configurable.
This PR updates the jinja template to handle these new configuration options.
It decreases the update interval from 60s -> 10s for NH-4010. I'm aiming for a balance of responsiveness without polling excessively.
Example usage of these feature:
https://github.com/nexthop-ai/private-sonic-buildimage/blob/master/device/nexthop/common/pmon_daemon_control.json
How to verify it
Verified on NH-4010 that
thermalctldis being run with the expected options.Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Link to config_db schema for YANG module changes
A picture of a cute animal (not mandatory but encouraged)