Improve RPC reliability with health checks, watchdog restarts, and optional escalation #133

Open
Vheissu wants to merge 1 commit into hive-engine:main from Vheissu:feat/improvements

@Vheissu Vheissu commented Jan 26, 2026

Summary

  • Add a lightweight /health endpoint to the RPC server.
  • Introduce a master watchdog that probes the RPC health and restarts the JsonRPCServer when it becomes unresponsive.
  • Add optional escalation to restart the whole node after repeated RPC restarts.
  • Add optional RPC monitoring logs for slow requests and large batch sizes.
  • Document new settings in config example and README.
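
The restart and escalation flow described above can be sketched as a small state machine. This is a hypothetical illustration, not the code in this PR: `createWatchdog`, `failureThreshold`, and the action strings are assumed names, and it assumes escalation triggers on the first failure after `escalateAfter` RPC restarts have already happened.

```javascript
// Hypothetical sketch of the watchdog decision logic (names are assumptions).
function createWatchdog({ failureThreshold = 3, escalateAfter = 0 } = {}) {
  let consecutiveFailures = 0;
  let rpcRestarts = 0;

  // Feed one probe result; returns the action the master process should take.
  return function onProbe(healthy) {
    if (healthy) {
      consecutiveFailures = 0;
      return 'none';
    }
    consecutiveFailures += 1;
    if (consecutiveFailures < failureThreshold) return 'none';

    consecutiveFailures = 0;
    // escalateAfter = 0 keeps escalation disabled (opt-in).
    if (escalateAfter > 0 && rpcRestarts >= escalateAfter) {
      rpcRestarts = 0;
      return 'restart-node';
    }
    // A real implementation would likely expire this count over a time window.
    rpcRestarts += 1;
    return 'restart-rpc';
  };
}
```

Keeping the decision separate from the probe and restart side effects makes the policy easy to unit test, which may help with the currently untested watchdog path.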

Why

RPC hangs (especially around batch requests) can leave the node running but unresponsive. This change adds automated recovery while also giving operators tools to surface the root cause via targeted logging.

Changes

  • plugins/JsonRPCServer.js: add /health endpoint and optional slow/batch logging middleware.
  • app.js: add health probe loop, restart logic, and escalation policy.
  • config.example.json: add rpcConfig.healthCheck and rpcConfig.monitoring defaults.
  • README.md: document health check, escalation, and monitoring knobs.
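
For reviewers who want a feel for the server-side pieces, here is a minimal sketch of a `/health` route plus slow-request and batch-size logging. It uses Node's built-in `http` module rather than the stack in `plugins/JsonRPCServer.js`, and `describeRequest`, `createRpcServer`, `slowMs`, and `maxBatch` are hypothetical names, not the ones in this PR.

```javascript
const http = require('node:http');

// Returns warning strings for one finished request (option names are assumptions).
function describeRequest(body, durationMs, { slowMs = 1000, maxBatch = 50 } = {}) {
  const warnings = [];
  if (durationMs >= slowMs) warnings.push(`slow request: ${durationMs}ms`);
  // A JSON-RPC 2.0 batch arrives as an array of call objects.
  if (Array.isArray(body) && body.length > maxBatch) {
    warnings.push(`large batch: ${body.length} calls`);
  }
  return warnings;
}

// handleRpc is a hypothetical async function that executes the parsed payload.
function createRpcServer(handleRpc, monitoring) {
  return http.createServer((req, res) => {
    // Lightweight liveness route: does no RPC work, so it answers quickly.
    if (req.method === 'GET' && req.url === '/health') {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ status: 'ok' }));
      return;
    }
    const started = Date.now();
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', async () => {
      try {
        const body = JSON.parse(Buffer.concat(chunks).toString('utf8'));
        const result = await handleRpc(body);
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify(result));
        // Only log when monitoring options were supplied (opt-in).
        if (monitoring) {
          describeRequest(body, Date.now() - started, monitoring)
            .forEach((warning) => console.warn(`[rpc] ${warning}`));
        }
      } catch (err) {
        res.writeHead(500);
        res.end();
      }
    });
  });
}
```

A probe would then issue `GET /health` with a short timeout and treat a non-200 response or a timeout as a failed check.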

Testing

  • Not run (no automated tests added for the watchdog path yet).

Notes for ops

  • Health checks are enabled by default; escalation is opt-in via rpcConfig.healthCheck.escalateAfter.
  • Monitoring logs are opt-in via rpcConfig.monitoring.*.
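
To illustrate the opt-in behaviour, the sketch below merges operator settings over defaults. Only `rpcConfig.healthCheck`, `rpcConfig.healthCheck.escalateAfter`, and the `rpcConfig.monitoring` namespace come from this PR; every other key name and every default value here is an assumption, so check `config.example.json` for the real ones.

```javascript
// Hypothetical defaults: key names other than escalateAfter are assumptions.
const DEFAULTS = {
  healthCheck: {
    enabled: true,        // health checks on by default
    intervalMs: 10000,    // assumed probe interval
    failureThreshold: 3,  // assumed failed probes before an RPC restart
    escalateAfter: 0,     // 0 = escalation disabled (opt-in)
  },
  monitoring: {
    slowMs: 0,            // 0 = slow-request logging disabled (opt-in)
    maxBatch: 0,          // 0 = batch-size logging disabled (opt-in)
  },
};

// Merge operator-supplied rpcConfig over the defaults, one section at a time.
function resolveRpcConfig(rpcConfig = {}) {
  return {
    ...rpcConfig,
    healthCheck: { ...DEFAULTS.healthCheck, ...rpcConfig.healthCheck },
    monitoring: { ...DEFAULTS.monitoring, ...rpcConfig.monitoring },
  };
}
```

With this shape, an operator who sets only `rpcConfig.healthCheck.escalateAfter` keeps health checks enabled and leaves monitoring logs off.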

Closes issue #8

Adds health checks, watchdog restarts, and optional escalation.