Edgerouter crashing - watchdog reboots with service enabled for some time #516

geekifier · 2024-10-31T18:20:37Z

First of all, thank you for creating this; it's very useful to have these metrics available, and I appreciate you sharing your work with us.

I am running an EdgeRouter 4 on firmware v2.0.9-hotfix.7 with edgerouter-exporter 2.9.4 (prebuilt binary from GitHub).

```
$ cat /config/user-data/edgerouter-exporter.env
LOG_LEVEL=info
PORT=9090
```

All the metrics coming from the service look great. However, with the service enabled, I am experiencing crashes: the router becomes unresponsive and the watchdog initiates a reset.

I realize this is very generic information, so I would like to know what I can do to collect useful logs or other information about what might be happening. Because of the nature of the crash, syslog export gets interrupted, and there is not much useful information before the crash.

Syslog after the reboot just shows the kernel boot logs.

If you know of any crash dump facility I could enable, I will try to capture one.

Thanks!


chitoku-k · 2024-11-01T13:36:15Z

Hi, I am glad to hear that edgerouter-exporter is useful to you!

It seems that one (or possibly more) of the commands this exporter invokes internally to collect metrics might be causing this issue on your router. I suspect the crash stems from a memory leak, so could you call each of those commands repeatedly and check whether memory leaks, to narrow down the suspicious one? If none of them leaks, the "available" column reported by the free command should stay roughly the same.

Setting LOG_LEVEL to debug shows which commands the exporter invokes internally:

```
$ sudo PORT=9090 LOG_LEVEL=debug /usr/bin/edgerouter-exporter
[2024-11-01T12:57:43Z DEBUG] executing /opt/vyatta/sbin/ubnt_vtysh with ["-c", "show ip bgp summary"]
[2024-11-01T12:57:43Z DEBUG] executing /opt/vyatta/sbin/ubnt_vtysh with ["-c", "show bgp ipv6 summary"]
[2024-11-01T12:57:43Z DEBUG] executing /opt/vyatta/bin/sudo-users/vyatta-op-dynamic-dns.pl with ["--show-status"]
[2024-11-01T12:57:43Z DEBUG] executing /opt/vyatta/bin/vyatta-op-cmd-wrapper with ["show", "load-balance", "status"]
[2024-11-01T12:57:43Z DEBUG] executing /bin/ip with ["--brief", "addr", "show"]
[2024-11-01T12:57:43Z DEBUG] executing /opt/vyatta/bin/vyatta-op-cmd-wrapper with ["show", "version"]
[2024-11-01T12:57:44Z DEBUG] executing /opt/vyatta/bin/vyatta-op-cmd-wrapper with ["show", "pppoe-client"]
[2024-11-01T12:57:44Z DEBUG] executing /opt/vyatta/bin/vyatta-op-cmd-wrapper with ["show", "load-balance", "watchdog"]
```
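A minimal sketch of the repeated-invocation check, assuming a POSIX shell on the router and root privileges (run it with sudo, the same way the exporter is started above). The command list is copied from the debug output; the script name `leakcheck.sh`, the `--run` switch, the `ITERATIONS` default of 200 and the function names are hypothetical. It reads `MemAvailable` from /proc/meminfo, which is the figure `free` shows in its "available" column.

```shell
#!/bin/sh
# Hypothetical helper: run each command the exporter invokes ITERATIONS times
# and print how much MemAvailable changed (in kB) across those runs.
set -u

ITERATIONS="${ITERATIONS:-200}"

# Print MemAvailable in kB, the value `free` reports as "available".
mem_available_kb() {
  awk '/^MemAvailable:/ { print $2 }' /proc/meminfo
}

# Run "$@" ITERATIONS times (output discarded), then print "<delta> kB  <command>".
# One negative delta is normal noise (page cache); a delta that keeps
# shrinking on every re-run is what would hint at a leak.
check_command() {
  before=$(mem_available_kb)
  i=0
  while [ "$i" -lt "$ITERATIONS" ]; do
    "$@" >/dev/null 2>&1
    i=$((i + 1))
  done
  after=$(mem_available_kb)
  echo "$((after - before)) kB  $*"
}

# Only touch the router's commands when explicitly asked to.
if [ "${1:-}" = "--run" ]; then
  check_command /opt/vyatta/sbin/ubnt_vtysh -c "show ip bgp summary"
  check_command /opt/vyatta/sbin/ubnt_vtysh -c "show bgp ipv6 summary"
  check_command /opt/vyatta/bin/sudo-users/vyatta-op-dynamic-dns.pl --show-status
  check_command /opt/vyatta/bin/vyatta-op-cmd-wrapper show load-balance status
  check_command /bin/ip --brief addr show
  check_command /opt/vyatta/bin/vyatta-op-cmd-wrapper show version
  check_command /opt/vyatta/bin/vyatta-op-cmd-wrapper show pppoe-client
  check_command /opt/vyatta/bin/vyatta-op-cmd-wrapper show load-balance watchdog
fi
```

For example, `sudo ITERATIONS=500 sh leakcheck.sh --run`, repeated a few times. Running the commands one at a time keeps any growth attributable to a single command, which is the point of the check.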

Best,

geekifier · 2024-11-01T17:17:18Z

I will try to collect the information you requested.
I have not observed a memory leak, unless it happens very rapidly: according to my monitoring, the exporter service itself consumes very little RAM.

I have not observed a crash since re-enabling the exporter. It could be some combination of factors triggering it; for example, I may have been polling more often while initially setting it up.

I will update this ticket with any new info.

geekifier · 2024-12-02T15:54:13Z

I have run many iterations of the commands in a for loop and did not observe any increase in memory consumption.
The router has actually run fine since my last post and only crashed this morning (2024-12-02).
According to my monitoring, there was no spike in memory usage.

If there is no way to retrieve a core dump from EdgeOS, I am not sure what else I could check.
For now, I will disable the service and monitor the uptime.

chitoku-k · 2024-12-06T10:02:15Z

Thank you for taking the time to gather that information! It was quite helpful in ruling out memory consumption as the cause of this issue.

If this occasional crash originates in the kernel, it may be necessary to keep the journal persistent across boots. I believe this is possible by configuring /etc/systemd/journald.conf and restarting systemd-journald (as a temporary measure), though I haven't tested it.
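A minimal sketch of that change, as untested on EdgeOS as the suggestion itself. It assumes EdgeOS v2.x runs systemd-journald and reads /etc/systemd/journald.conf; the script name `journald-persistent.sh`, the `--apply` switch and the function name are hypothetical. `Storage=persistent` is the journald.conf option that makes journald keep its data under /var/log/journal.

```shell
#!/bin/sh
# Hypothetical helper: switch systemd-journald to persistent storage so that
# messages logged before a watchdog reset can be read after the reboot.
set -eu

# Set Storage=persistent in a journald.conf-style file: replace an existing
# (possibly commented-out) Storage= line, or append one if there is none.
enable_persistent_storage() {
  conf="$1"
  if grep -q '^#*Storage=' "$conf"; then
    sed -i 's/^#*Storage=.*/Storage=persistent/' "$conf"
  else
    printf 'Storage=persistent\n' >> "$conf"
  fi
}

# Only modify the system when explicitly asked to (run as root).
if [ "${1:-}" = "--apply" ]; then
  # Keep a copy so the change is easy to revert later.
  cp /etc/systemd/journald.conf /etc/systemd/journald.conf.bak
  enable_persistent_storage /etc/systemd/journald.conf
  # journald creates this directory if needed; mkdir -p is harmless either way.
  mkdir -p /var/log/journal
  systemctl restart systemd-journald
fi
```

For example, `sudo sh journald-persistent.sh --apply`. After the next watchdog reset, `journalctl --list-boots` should list more than one boot, and `journalctl -b -1 -e` jumps to the end of the previous boot's log, where any last kernel messages would be. To undo it, restore journald.conf.bak and restart systemd-journald again. Whether /var/log itself survives a reboot on this device is an assumption worth checking first.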
