

@lotus-nexthop (Contributor) commented Jun 29, 2025

Platforms can now configure thermal monitor intervals in their pmon_daemon_control.json:

For example:
{
    "thermalctld": {
        "thermal_monitor_initial_interval": 5,
        "thermal_monitor_update_interval": 30,
        "thermal_monitor_update_elapsed_threshold": 25
    }
}

Note that this only affects the ThermalMonitor thread in the thermalctld daemon.
ThermalMonitor polls the fan and temperature sensors from hardware and publishes the readings to Redis.
These Redis values are used, for example, by show platform temperature and show platform fan.

Parameter Details

thermal_monitor_initial_interval

  • Purpose: The initial time to wait before the first poll by ThermalMonitor on thermalctld startup.
  • Default: 5 seconds

thermal_monitor_update_interval

  • Purpose: The interval between hardware polls; the hardware is polled every thermal_monitor_update_interval seconds.
  • Default: 60 seconds

thermal_monitor_update_elapsed_threshold

  • Purpose: If polling the hardware (collecting information from all fans and temperature sensors) takes longer than thermal_monitor_update_elapsed_threshold seconds, a warning is logged.
  • Default: 30 seconds
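To make the interplay of the three parameters concrete, here is a minimal, hypothetical Python simulation. It is not the thermalctld implementation. It assumes that ThermalMonitor waits thermal_monitor_initial_interval seconds before its first poll, starts each later poll thermal_monitor_update_interval seconds after the previous one started (or immediately, if a poll overran the interval), and logs a warning when a poll takes longer than thermal_monitor_update_elapsed_threshold seconds. The names ThermalMonitorSettings and simulate_polls are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ThermalMonitorSettings:
    # Hypothetical container; defaults mirror the documented values.
    initial_interval: float = 5
    update_interval: float = 60
    update_elapsed_threshold: float = 30


def simulate_polls(settings: ThermalMonitorSettings,
                   poll_durations: List[float]) -> List[Tuple[float, bool]]:
    """Return (start_time, warning_logged) for each simulated poll.

    Assumption: the next poll starts update_interval seconds after the
    previous poll started, or right away if that poll took longer.
    """
    results = []
    start = settings.initial_interval  # first poll happens after the initial wait
    for elapsed in poll_durations:
        # A warning is logged when collecting fan/temperature data is too slow.
        warned = elapsed > settings.update_elapsed_threshold
        results.append((start, warned))
        start += max(settings.update_interval, elapsed)
    return results


if __name__ == "__main__":
    durations = [2, 2, 35]  # made-up poll durations in seconds
    print(simulate_polls(ThermalMonitorSettings(), durations))
    print(simulate_polls(ThermalMonitorSettings(update_interval=10), durations))
```

Under these assumptions, with the defaults a fan removed just after a poll would only be reflected roughly 60 seconds later; with update_interval=10 the worst-case delay drops to roughly 10 seconds.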

Why I did it

The default polling interval of 60s is quite long and feels unresponsive (e.g. an operator can remove a fan and wait nearly a minute for show platform fan to update).

How I did it

In sonic-net/sonic-platform-daemons#635 we made these intervals configurable.

This PR updates the jinja template to handle these new configuration options.

It also decreases the update interval for NH-4010 from 60s to 10s. I'm aiming for a balance between responsiveness and avoiding excessive polling.

Example usage of this feature:
https://github.com/nexthop-ai/private-sonic-buildimage/blob/master/device/nexthop/common/pmon_daemon_control.json
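The jinja template change follows the pattern visible in the review snippet further down this page: each key from pmon_daemon_control.json is appended as a thermalctld command-line option only when it is defined and not null. The Python sketch below mimics that logic. Only the --thermal-monitor-update-elapsed-threshold flag is confirmed by that snippet; the other two flag names are assumptions following the same pattern, and build_thermalctld_options is a hypothetical helper, not part of SONiC.

```python
import json
from typing import Any, Dict

# Mapping from pmon_daemon_control.json keys to thermalctld flags.
# Assumption: only the elapsed-threshold flag name is confirmed by the PR snippet.
OPTION_FLAGS = {
    "thermal_monitor_initial_interval": "--thermal-monitor-initial-interval",
    "thermal_monitor_update_interval": "--thermal-monitor-update-interval",
    "thermal_monitor_update_elapsed_threshold": "--thermal-monitor-update-elapsed-threshold",
}


def build_thermalctld_options(daemon_control: Dict[str, Any]) -> str:
    """Append a flag only if its key is defined and not None, like the template."""
    thermalctld = daemon_control.get("thermalctld") or {}
    options = ""
    for key, flag in OPTION_FLAGS.items():
        value = thermalctld.get(key)
        if value is not None:  # covers both "is defined" and "is not none"
            options += " " + flag + " " + str(value)  # str() plays the role of |string
    return options


if __name__ == "__main__":
    config = json.loads("""
    {
        "thermalctld": {
            "thermal_monitor_initial_interval": 5,
            "thermal_monitor_update_interval": 30,
            "thermal_monitor_update_elapsed_threshold": 25
        }
    }
    """)
    print(build_thermalctld_options(config))
```

Keys that are missing or set to null simply produce no flag, so thermalctld falls back to its built-in defaults.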

How to verify it

Verified on NH-4010 that thermalctld is being run with the expected options.

Which release branch to backport (provide reason below if selected)

  • 202205
  • 202211
  • 202305
  • 202311
  • 202405
  • 202411
  • 202505

Tested branch (Please provide the tested image version)

Description for the changelog

Link to config_db schema for YANG module changes

A picture of a cute animal (not mandatory but encouraged)

@mssonicbld
Collaborator

/azp run Azure.sonic-buildimage

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@lotus-nexthop
Contributor Author

cc @judyjoseph

{% endif -%}

{% if thermalctld.thermal_monitor_update_elapsed_threshold is defined and thermalctld.thermal_monitor_update_elapsed_threshold is not none %}
{%- set options = options + " --thermal-monitor-update-elapsed-threshold " + thermalctld.thermal_monitor_update_elapsed_threshold|string %}
{% endif -%}
Contributor


LGTM, but please add details to the PR description on the new intervals and which daemon/thread in thermalctld they will affect.

Contributor Author


Thanks @judyjoseph, I have updated the PR description to document this. Please take a look.

@judyjoseph
Contributor

/azpw ms_conflict

@judyjoseph
Contributor

@rlhui, could you help merge this PR?

@rlhui rlhui merged commit 81a3435 into sonic-net:master Oct 29, 2025
20 checks passed


6 participants