Skip to content

Conversation

huyc-arista
Copy link
Contributor

Background

Add printing fboss service logs and other debug logs to showtech

Fix bug where platformUitls.execCommand() throws an exception if the child process was killed (interupt, core dump, etc) causing showtech to crash

Change

  • Add Utils::printServiceLogs() to print various debug logs truncated to 5000 lines each
  • Wrap platformUtils_.execCommand(cmd) into safeExecCommand(cmd) to handle any exceptions thrown

Testing

Internally we saw fw_util failure causing showtech to crash, after the changes verified showtech --details fwutil no longer crash

Sample showtech output for showtech --details logs with maxLine set to 10 lines (actual implementation is 5000)

[root@wlr202 admin]# showtech --details logs
I1009 01:00:34.830133  7275 PlatformNameLib.cpp:79] Platform name read from cache: MERU800BFA
I1009 01:00:34.830204  7275 Main.cpp:67] Executing: logs
##### Platform Manager Log #####
=== Output exceeds 10 lines (total: 2272). Showing first 5 and last 5 lines ===

Oct 09 00:57:27 wlr202.sjc.aristanetworks.com systemd[1]: Starting FBOSS Platform Manager...
Oct 09 00:57:27 wlr202.sjc.aristanetworks.com platform_manager[5842]: I1009 00:57:27.334749  5842 PlatformNameLib.cpp:56] Getting platform name from bios using dmidecode ...
Oct 09 00:57:27 wlr202.sjc.aristanetworks.com platform_manager[5842]: I1009 00:57:27.338967  5842 PlatformNameLib.cpp:65] Platform name inferred from bios: MERU800BFA
Oct 09 00:57:27 wlr202.sjc.aristanetworks.com platform_manager[5842]: I1009 00:57:27.338982  5842 PlatformNameLib.cpp:67] Platform name mapped: MERU800BFA
Oct 09 00:57:27 wlr202.sjc.aristanetworks.com platform_manager[5842]: I1009 00:57:27.339955  5842 ConfigLib.cpp:40] Using config file: /opt/fboss/share/platform_configs/platform_manager.json

=== 2262 lines truncated ===

Oct 09 00:57:58 wlr202.sjc.aristanetworks.com systemd[1]: Started FBOSS Platform Manager.
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com platform_manager[5842]: I1009 00:57:58.503704  5842 ThriftServer.cpp:2061] Using randomly generated TLS ticket keys.
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com platform_manager[5842]: I1009 00:57:58.503815  5842 ThriftServer.cpp:983] Using resource pools on address/port 5975: thrift flag: true, enable gflag: false, disable gflag: false, runtime actions:
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com platform_manager[5842]: I1009 00:57:58.504469  5842 ThriftServer.cpp:1540] Resource pools configured: 6


##### Sensor Service Log #####
=== Output exceeds 10 lines (total: 272). Showing first 5 and last 5 lines ===

Oct 09 00:57:58 wlr202.sjc.aristanetworks.com systemd[1]: Started Start sensor_service.
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_sensor_service.sh[6552]: I1009 00:57:58.556929  6552 PlatformNameLib.cpp:79] Platform name read from cache: MERU800BFA
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_sensor_service.sh[6552]: I1009 00:57:58.556969  6552 ConfigLib.cpp:40] Using config file: /opt/fboss/share/platform_configs/sensor_service.json
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_sensor_service.sh[6552]: I1009 00:57:58.557276  6552 ConfigValidator.cpp:16] Validating sensor_service config
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_sensor_service.sh[6552]: I1009 00:57:58.557744  6552 SensorServiceImpl.cpp:91] Reading SensorData for 6 PMUnits

=== 262 lines truncated ===

Oct 09 01:00:34 wlr202.sjc.aristanetworks.com run_sensor_service.sh[6552]: I1009 01:00:34.151652  6914 SensorServiceImpl.cpp:96] Processing PSU PMUnit: 11 sensors
Oct 09 01:00:34 wlr202.sjc.aristanetworks.com run_sensor_service.sh[6552]: I1009 01:00:34.162611  6914 SensorServiceImpl.cpp:96] Processing PSU PMUnit: 11 sensors
Oct 09 01:00:34 wlr202.sjc.aristanetworks.com run_sensor_service.sh[6552]: I1009 01:00:34.173798  6914 SensorServiceImpl.cpp:96] Processing PSU PMUnit: 11 sensors
Oct 09 01:00:34 wlr202.sjc.aristanetworks.com run_sensor_service.sh[6552]: I1009 01:00:34.185624  6914 SensorServiceImpl.cpp:130] Summary: Processed 132 Sensors. 0 Failures.


##### Fan Service Log #####
=== Output exceeds 10 lines (total: 871). Showing first 5 and last 5 lines ===

Oct 09 00:57:58 wlr202.sjc.aristanetworks.com systemd[1]: Started Start fan_service.
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_fan_service.sh[6592]: I1009 00:57:58.585421  6592 PlatformNameLib.cpp:79] Platform name read from cache: MERU800BFA
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_fan_service.sh[6592]: I1009 00:57:58.585494  6592 ConfigLib.cpp:40] Using config file: /opt/fboss/share/platform_configs/fan_service.json
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_fan_service.sh[6592]: I1009 00:57:58.585663  6592 Main.cpp:53] ...
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_fan_service.sh[6592]: I1009 00:57:58.585683  6592 ConfigValidator.cpp:47] Validating fan_service config

=== 861 lines truncated ===

Oct 09 01:00:09 wlr202.sjc.aristanetworks.com run_fan_service.sh[6592]: I1009 01:00:09.770951  6613 ControlLogic.cpp:465] fan_9: Setting LED to Good (value: 1)
Oct 09 01:00:09 wlr202.sjc.aristanetworks.com run_fan_service.sh[6592]: I1009 01:00:09.772339  6613 ControlLogic.cpp:465] fan_10: Setting LED to Good (value: 1)
Oct 09 01:00:09 wlr202.sjc.aristanetworks.com run_fan_service.sh[6592]: I1009 01:00:09.774146  6613 ControlLogic.cpp:465] fan_11: Setting LED to Good (value: 1)
Oct 09 01:00:09 wlr202.sjc.aristanetworks.com run_fan_service.sh[6592]: I1009 01:00:09.775961  6613 ControlLogic.cpp:465] fan_12: Setting LED to Good (value: 1)


##### Data Corral Log #####
=== Output exceeds 10 lines (total: 747). Showing first 5 and last 5 lines ===

Oct 09 00:57:58 wlr202.sjc.aristanetworks.com systemd[1]: Started Start data_corral_service.
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_data_corral_service.sh[6551]: I1009 00:57:58.556123  6551 PlatformNameLib.cpp:79] Platform name read from cache: MERU800BFA
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_data_corral_service.sh[6551]: I1009 00:57:58.556159  6551 ConfigLib.cpp:40] Using config file: /opt/fboss/share/platform_configs/led_manager.json
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_data_corral_service.sh[6551]: I1009 00:57:58.556220  6551 ConfigValidator.cpp:10] Validating data_corral_service config
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_data_corral_service.sh[6551]: I1009 00:57:58.556224  6551 ConfigValidator.cpp:12] Validating the system LED config

=== 737 lines truncated ===

Oct 09 01:00:28 wlr202.sjc.aristanetworks.com run_data_corral_service.sh[6551]: I1009 01:00:28.569593  6555 LedManager.cpp:25] Programming system LED with true
Oct 09 01:00:28 wlr202.sjc.aristanetworks.com run_data_corral_service.sh[6551]: I1009 01:00:28.569608  6555 LedManager.cpp:80] Wrote 1 to file /sys/class/leds/sys_led:green:status/brightness
Oct 09 01:00:28 wlr202.sjc.aristanetworks.com run_data_corral_service.sh[6551]: I1009 01:00:28.569621  6555 LedManager.cpp:80] Wrote 0 to file /sys/class/leds/sys_led:red:status/brightness
Oct 09 01:00:28 wlr202.sjc.aristanetworks.com run_data_corral_service.sh[6551]: I1009 01:00:28.569624  6555 LedManager.cpp:34] Programmed system LED with true


##### QSFP Service Log #####
=== Output exceeds 10 lines (total: 5988). Showing first 5 and last 5 lines ===

Oct 09 00:57:58 wlr202.sjc.aristanetworks.com systemd[1]: Started Start qsfp_service.
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_qsfp_service.sh[6635]: Overriding default flag from config: firmware_upgrade_on_coldboot=true
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_qsfp_service.sh[6635]: Overriding default flag from config: firmware_upgrade_on_link_down=false
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_qsfp_service.sh[6635]: Overriding default flag from config: firmware_upgrade_on_tcvr_insert=true
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_qsfp_service.sh[6635]: Overriding default flag from config: firmware_upgrade_supported=true

=== 5978 lines truncated ===

Oct 09 01:00:26 wlr202.sjc.aristanetworks.com run_qsfp_service.sh[6635]: W1009 01:00:26.075540  6837 TransceiverStateMachine.h:376] [Transceiver:119] programTransceiver failed:facebook::fboss::FbossError: Port: 119 Unsupported Application by the module:
Oct 09 01:00:26 wlr202.sjc.aristanetworks.com run_qsfp_service.sh[6635]: E1009 01:00:26.075550  6837 TransceiverStateMachineController.cpp:121] Failed to apply TransceiverStateMachineUpdate [Transceiver: TransceiverID(119), Event:TCVR_EV_PROGRAM_TRANSCEIVER]
Oct 09 01:00:26 wlr202.sjc.aristanetworks.com run_qsfp_service.sh[6635]: I1009 01:00:26.075596  7083 TransceiverManager.cpp:2222] refreshStateMachines ended
Oct 09 01:00:26 wlr202.sjc.aristanetworks.com run_qsfp_service.sh[6635]: I1009 01:00:26.075607  7083 TransceiverManager.cpp:2184] refreshStateMachines started


##### fboss_sw_agent Log #####
=== Output exceeds 10 lines (total: 19489). Showing first 5 and last 5 lines ===

Oct 09 00:57:58 wlr202.sjc.aristanetworks.com systemd[1]: Starting FBOSS sw agent...
Oct 09 00:57:58 wlr202.sjc.aristanetworks.com run_sw_agent.sh[6778]: modprobe: FATAL: Module linux_kernel_bde is in use.
Oct 09 00:57:59 wlr202.sjc.aristanetworks.com run_sw_agent.sh[6698]: Setting fboss environment variables
Oct 09 00:57:59 wlr202.sjc.aristanetworks.com run_sw_agent.sh[6698]: Running setup.py with FBOSS=/opt/fboss
Oct 09 00:57:59 wlr202.sjc.aristanetworks.com systemd[1]: Started FBOSS sw agent.

=== 19479 lines truncated ===

Oct 09 01:00:20 wlr202.sjc.aristanetworks.com run_sw_agent.sh[6951]: V1009 01:00:20.261548  7127 ThriftHandler.cpp:2431] [0x7f81f001c520] getConfigAppliedInfo thrift request received from ::1 (unknown)
Oct 09 01:00:20 wlr202.sjc.aristanetworks.com run_sw_agent.sh[6951]: V1009 01:00:20.261567  7127 ThriftHandler.cpp:2431] [0x7f81f001c520] getConfigAppliedInfo thrift request succeeded in 0ms
Oct 09 01:00:26 wlr202.sjc.aristanetworks.com run_sw_agent.sh[6951]: V1009 01:00:26.075886  7127 ThriftHandler.cpp:1414] [0x7f81e4009f00] getPortStatus thrift request received from ::1 (unknown)
Oct 09 01:00:26 wlr202.sjc.aristanetworks.com run_sw_agent.sh[6951]: V1009 01:00:26.081942  7127 ThriftHandler.cpp:1414] [0x7f81e4009f00] getPortStatus thrift request succeeded in 6ms


##### fboss_hw_agent@0 Log #####
=== Output exceeds 10 lines (total: 3156). Showing first 5 and last 5 lines ===

Oct 09 00:57:59 wlr202.sjc.aristanetworks.com systemd[1]: Started FBOSS hw agent for ASIC 0.
Oct 09 00:57:59 wlr202.sjc.aristanetworks.com run_hw_agent.sh[6968]: WARNING: Logging before InitGoogleLogging() is written to STDERR
Oct 09 00:57:59 wlr202.sjc.aristanetworks.com run_hw_agent.sh[6968]: W20251009 00:57:59.069456  6968 ServiceData.cpp:603] setUseOptionsAsFlags is a dangerous API and can expose your service to a Remote Code Execution vulnerability. Please consider using alternative methods like configerator to set properties dynamically
Oct 09 00:57:59 wlr202.sjc.aristanetworks.com run_hw_agent.sh[6968]: Overriding default flag from config: can_warm_boot=false
Oct 09 00:57:59 wlr202.sjc.aristanetworks.com run_hw_agent.sh[6968]: Overriding default flag from config: check_wb_handles=true

=== 3146 lines truncated ===

Oct 09 01:00:33 wlr202.sjc.aristanetworks.com run_hw_agent.sh[6968]: E1009 01:00:33.718367  7135 SaiSwitchManager.cpp:1210] Failed to get temperature attribute 19: [switch] Failed to get sai attribute: SwitchSaiId(9880898823696416768): NumTemperatureSensors: 0: INVALID PARAMETER
Oct 09 01:00:33 wlr202.sjc.aristanetworks.com run_hw_agent.sh[6968]: E1009 01:00:33.718385  7135 SaiSwitchManager.cpp:1210] Failed to get temperature attribute 20: [switch] Failed to get sai attribute: SwitchSaiId(9880898823696416768): NumTemperatureSensors: 0: INVALID PARAMETER
Oct 09 01:00:34 wlr202.sjc.aristanetworks.com run_hw_agent.sh[6968]: E1009 01:00:34.765338  7135 SaiSwitchManager.cpp:1210] Failed to get temperature attribute 19: [switch] Failed to get sai attribute: SwitchSaiId(9880898823696416768): NumTemperatureSensors: 0: INVALID PARAMETER
Oct 09 01:00:34 wlr202.sjc.aristanetworks.com run_hw_agent.sh[6968]: E1009 01:00:34.765357  7135 SaiSwitchManager.cpp:1210] Failed to get temperature attribute 20: [switch] Failed to get sai attribute: SwitchSaiId(9880898823696416768): NumTemperatureSensors: 0: INVALID PARAMETER


##### demsg Log #####
=== Output exceeds 10 lines (total: 1818). Showing first 5 and last 5 lines ===

[    0.000000] Linux version 6.4.3-0_fbk747_rc2_1199_ga95cd85c72c4 ...
[    0.000000] Command line:   rw root=live:LABEL=eos_flash rd.live.dir=/boot rd.live.overlay. ...
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD

=== 1808 lines truncated ===

[  120.518264] i2c i2c-170: new_device: Instantiated device pmbus at 0x58
[  121.461397] i2c i2c-130: new_device: Instantiated device pmbus at 0x58
[  122.404399] i2c i2c-171: new_device: Instantiated device pmbus at 0x58
[  123.348394] i2c i2c-131: new_device: Instantiated device pmbus at 0x58


##### Boot Console Log #####
cat: /var/log/boot.log: No such file or directory

##### Linux Messages Log #####
=== Output exceeds 10 lines (total: 19806). Showing first 5 and last 5 lines ===

Oct  9 00:55:57 wlr202 kernel: Linux version 6.4.3-0_fbk747_rc2_1199_ga95cd85c72c4 ...
Oct  9 00:55:57 wlr202 kernel: Command line:   ...
Oct  9 00:55:57 wlr202 kernel: KERNEL supported cpus:
Oct  9 00:55:57 wlr202 kernel:  Intel GenuineIntel
Oct  9 00:55:57 wlr202 kernel:  AMD AuthenticAMD

=== 19796 lines truncated ===

Oct  9 00:58:48 wlr202 run_qsfp_service.sh[6635]: I1009 00:58:48.499009  6866 TransceiverStateMachineController.cpp:61] Applying TransceiverStateMachine Update for [Transceiver: TransceiverID(57), Event:TCVR_EV_PROGRAM_TRANSCEIVER]
Oct  9 00:58:48 wlr202 run_qsfp_service.sh[6635]: I1009 00:58:48.499010  6865 TransceiverStateMachineController.cpp:114] Successfully applied state update for [Transceiver: TransceiverID(56), Event:TCVR_EV_PROGRAM_TRANSCEIVER] from TRANSCEIVER_READY to TRANSCEIVER_PROGRAMMED
Oct  9 00:58:48 wlr202 run_qsfp_service.sh[6635]: I1009 00:58:48.499011  6867 TransceiverStateMachineController.cpp:61] Applying TransceiverStateMachine Update for [Transceiver: TransceiverID(58), Event:TCVR_EV_PROGRAM_TRANSCEIVER]
Oct  9 00:58:48 wlr202 run_qsfp_service.sh[6635]: I1009 00:58:48.499015  6866 TransceiverStateMachineController.cpp:114] Successfully applied state update for [Transceiver: TransceiverID(57), Event:TCVR_EV_PROGRAM_TRANSCEIVER] from TRANSCEIVER_READY to TRANSCEIVER_PROGRAMMED


@meta-cla meta-cla bot added the CLA Signed label Oct 9, 2025
Copy link

meta-codesync bot commented Oct 13, 2025

@somasun has imported this pull request. If you are a Meta employee, you can view this in D84529564.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant