Edgerouter crashing - watchdog reboots with service enabled for some time #516
Comments
Hi, I am glad to hear that edgerouter-exporter is useful to you! It seems that one (or possibly more) of the commands that this exporter invokes internally to collect metrics may have caused this issue on your router. I suspect that the crash stems from a memory leak, so to find the suspicious command I would like to know whether a memory leak appears when any of those commands is called repeatedly. The Setting
Best,
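The repeated-call check suggested above could be sketched roughly as follows. This is only an illustration, not part of the exporter: it assumes Python 3 is available on the machine running the test (it may not be on the router itself), and the command to probe is supplied on the command line, since the exact commands the exporter invokes are not listed here. The helper names `parse_meminfo`, `available_kb`, and `probe` are hypothetical.

```python
import subprocess
import sys


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: numeric value}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip lines without a "name: number" shape.
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_kb():
    """Return available memory in kB as reported by the kernel."""
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    # Older kernels lack MemAvailable, so fall back to MemFree.
    return info.get("MemAvailable", info.get("MemFree", 0))


def probe(command, runs=100, report_every=10):
    """Run `command` repeatedly and return periodic memory samples (kB)."""
    samples = [available_kb()]
    for i in range(1, runs + 1):
        subprocess.call(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if i % report_every == 0:
            samples.append(available_kb())
    return samples


if __name__ == "__main__":
    command = sys.argv[1:]  # the command under suspicion, chosen by the user
    if not command:
        sys.exit("usage: probe.py COMMAND [ARGS...]")
    samples = probe(command)
    print("available kB over time:", samples)
    print("net change (kB):", samples[-1] - samples[0])
```

A steadily falling series that does not recover after the run ends would point toward the probed command; a flat or noisy series would suggest looking elsewhere. Other activity on the router also moves these numbers, so a single run is weak evidence either way.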
I will try to collect the info you requested. I have not observed a crash yet since re-enabling the exporter. It could be some combination of factors triggering it; maybe I was doing more polling while setting it up initially. I will update this ticket with any new info.
Thank you for taking the time to gather that information! It was quite helpful in showing that memory consumption is not related to this issue. If such an occasional crash stems from the kernel, it might be necessary to keep the journal log persistent across boots. It seems to me that this is possible by configuring /etc/systemd/journald.conf and restarting systemd-journald (just temporarily), though I haven't tested this before.
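For reference, a minimal sketch of the journald change described above, using the standard `Storage=` option from systemd's journald.conf. Like the suggestion itself, it is untested on EdgeOS and assumes the firmware runs a stock systemd-journald:

```ini
# /etc/systemd/journald.conf
[Journal]
# Keep the journal on disk (under /var/log/journal) instead of in memory only,
# so entries written before a crash survive the reboot.
Storage=persistent
```

After editing the file, `systemctl restart systemd-journald` applies the setting, and `journalctl -b -1` should show the log from the boot before the current one once a reboot has happened. Persistent logging writes to the router's flash storage, so it is probably best reverted once the crash has been captured.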
First of all, thank you for creating this, it's very useful to have those metrics available, and I appreciate you sharing your work with us.
I am running Edgerouter 4 with firmware v2.0.9-hotfix.7 with edgerouter-exporter 2.9.4 (prebuilt binary from github).
All the metrics look great coming from the service, however, with the service enabled, I am experiencing crashes where the router will become unresponsive, and the watchdog service initiates a reset.
I realize this is very generic information, so I would like to know what I can do to collect some useful logs or information on what might be happening. Due to the nature of the crash, syslog export gets interrupted, and there is not much useful info from before the crash.
Syslog after the reboot just shows the kernel boot logs.
If you know of any sort of crash dump function I could enable to help with this, I would try to get those captured.
Thanks!