Skip to content

Conversation

@michaelc-nexthop
Copy link

Default polling interval of 60s is quite high and can be unresponsive. Vendors should be able to override this polling interval

Description

Motivation and Context

How Has This Been Tested?

Additional Information (Optional)

…urable.

Default polling interval of 60s is quite high and can be unresponsive. Vendors should be able to override this polling interval
@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@gregoryboudreau
Copy link
Contributor

would it make more sense to do this via a config file like was done for sensormond? https://github.com/sonic-net/sonic-platform-daemons/blob/master/sonic-sensormond/scripts/sensormond#L485

@lotus-nexthop
Copy link

would it make more sense to do this via a config file like was done for sensormond? https://github.com/sonic-net/sonic-platform-daemons/blob/master/sonic-sensormond/scripts/sensormond#L485

Hi Gregory,

We currently have the config set by the existing pmon_daemon_control.json config file:
https://github.com/search?q=repo%3Asonic-net%2Fsonic-buildimage+path%3Apmon_daemon_control.json&type=code

Example usage of this new config:
https://github.com/sonic-net/sonic-buildimage/blob/30d2b7881971ad38fdc0b242f57c9efdb6f93693/device/nexthop/common/pmon_daemon_control.json#L5-L8

PR which adds new config flags to thermalctld:
#635

PR which sets new config flags using pmon_daemon_control.json:
https://github.com/sonic-net/sonic-buildimage/pull/23139/files

When we were working on this initially we thought pmon_daemon_control.json would be the most fitting place to put the configuration.

We hadn't seen your change to use platform_env.conf yet.

What are your thoughts on platform_env.conf vs pmon_daemon_control.json?

@gregoryboudreau
Copy link
Contributor

would it make more sense to do this via a config file like was done for sensormond? https://github.com/sonic-net/sonic-platform-daemons/blob/master/sonic-sensormond/scripts/sensormond#L485

Hi Gregory,

We currently have the config set by the existing pmon_daemon_control.json config file: https://github.com/search?q=repo%3Asonic-net%2Fsonic-buildimage+path%3Apmon_daemon_control.json&type=code

Example usage of this new config: https://github.com/sonic-net/sonic-buildimage/blob/30d2b7881971ad38fdc0b242f57c9efdb6f93693/device/nexthop/common/pmon_daemon_control.json#L5-L8

PR which adds new config flags to thermalctld: #635

PR which sets new config flags using pmon_daemon_control.json: https://github.com/sonic-net/sonic-buildimage/pull/23139/files

When we were working on this initially we thought pmon_daemon_control.json would be the most fitting place to put the configuration.

We hadn't seen your change to use platform_env.conf yet.

What are your thoughts on platform_env.conf vs pmon_daemon_control.json?

I had actually originally tried the same thing using the pmon daemon control to act as a further input for this to the supervisord configuration: 9e91789#diff-195551a342dbd08efef0aa4f3e21a4f4e24e2ee87cadcc27070299e9fc4a0390R560. General community sentiment though was to use the platform_env.conf for platform specific configurability (at least when this was discussed in the chassis call when we reviewed it originally).

@gregoryboudreau
Copy link
Contributor

Chassisd also uses it for a timing related configuration: https://github.com/sonic-net/sonic-platform-daemons/blob/master/sonic-chassisd/scripts/chassisd#L312. Probably best to keep it consistent across the different daemons. Imo you should join next weeks chassis call and bring it up for discussion if you feel strongly towards the pmon_daemon_control method.

@lotus-nexthop
Copy link

Thanks Gregory, I will join next week and get more ideas before refactoring this.

thermal_control = ThermalControlDaemon()
parser = argparse.ArgumentParser()
parser.add_argument('--thermal-monitor-initial-interval', type=int, default=5)
parser.add_argument('--thermal-monitor-update-interval', type=int, default=60)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see @gregoryboudreau comments earlier in the issue. I have similar comment - It is not possible to add platform speficif timer intervals when we start thermalctld in supervisord.conf : https://github.com/sonic-net/sonic-buildimage/blob/master/dockers/docker-platform-monitor/docker-pmon.supervisord.conf.j2.

We can define it in platform.json file for that particular platform.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am of the opinion that we keep this unified across daemons within PMON and put it in the platform_env.conf given that sensormond and chassisd already use this for timing related configurations that differ from the default.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Judy and Gregory, will update these PRs soon to go with that feedback.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have updated the PR to get timing-related configurations from platform_env.conf, so that it is consistent with sensormond and chassisd. Unit tests were added as well.
Please take another look.
Thank you both Judy and Gregory!

Copy link
Contributor

@judyjoseph judyjoseph Sep 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ! sorry if I got this comment late @lotus-nexthop

platform_env.conf was originally thought to set configurations platform wide eg: https://github.com/sonic-net/sonic-buildimage/blob/master/device/arista/x86_64-arista_7060x6_16pe_384c_b/platform_env.conf, https://github.com/sonic-net/sonic-buildimage/blob/master/device/arista/x86_64-arista_7280cr3_32p4/platform_env.conf

Prefer to update the timer configuraion details in the "pmon_daemon_control.json" where we can keep configs pertaining to any daemon

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gregoryboudreau I feel the timer configurations are a pmon daemon specific configurations, which is clear if it is in this format : https://github.com/sonic-net/sonic-buildimage/blob/master/device/nexthop/common/pmon_daemon_control.json#L5-L8

Good to keep all pmon daemon configs in this file.

Copy link
Contributor

@gregoryboudreau gregoryboudreau Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, should we change at least sensormond to follow this model then?

Copy link

@louis-nexthop louis-nexthop Oct 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also +1 to making this change in pmon_daemon_control.json. We can do incremental improvement starting with sensormond.

I can revert this PR back to the original. Please note that it will need to be paired with sonic-net/sonic-buildimage#23139 as well for the full functionality.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is now reverted back to the original.

Please help review both this PR and sonic-net/sonic-buildimage#23139. Thank you both!

Copy link
Contributor

@spilkey-cisco spilkey-cisco Oct 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possibility is to have thermal related daemons/threads use the same interval specified by thermal_manager for the fan/thermal algorithm (or some scaled smaller interval to ensure operational fluctuations do not delay temperature updates to the algorithm): https://github.com/sonic-net/sonic-platform-daemons/blob/master/sonic-thermalctld/scripts/thermalctld#L837
This would ideally also be able to apply to xcvrd for DOM polling, syncd for ASIC temperature polling, and perhaps others I'm unaware of.

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

k, v = line.split('=', 1)
k = k.strip()
v = v.strip()
if k == 'THERMALCTLD_THERMAL_MONITOR_INITIAL_INTERVAL':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can these be wrapped in a try/except? Not sure if you want these to fallback to defaults or take the daemon down on an incorrect config file.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this up! I think it's better to fallback to the default values and let the daemon continue.

I added try/except and updated the unit tests as well. Thanks, Gregory!

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@mssonicbld
Copy link
Collaborator

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants