|
| 1 | +# Prometheus Monitoring |
| 2 | + |
| 3 | +When the binary is compiled with the `--features "monitoring"` flag, [Prometheus](https://prometheus.io/) support is enabled. |
| 4 | + |
| 5 | +From the website: |
| 6 | + |
| 7 | +> Prometheus is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true. |
| 8 | +
|
| 9 | +We combine Prometheus with [Metrics](https://github.com/metrics-rs/metrics) to collect detailed metrics about our node's performance and expose them via an API endpoint, which other services can then consume. |
| 10 | + |
| 11 | +To enable Prometheus, provide an extra section with Prometheus-related configuration options in your config file (or pass them as CLI arguments). |
| 12 | + |
| 13 | +```yaml |
| 14 | +PrometheusConfig: |
| 15 | + bind_address: 127.0.0.1:9999 |
| 16 | + histogram_window: 5000 |
| 17 | + histogram_granularity: 1000 |
| 18 | +``` |
| 19 | +
|
| 20 | +This opens an endpoint at `http://127.0.0.1:9999/`, which you can query to get the current data gathered by our instrumentation system, rendered in the Prometheus exposition format. |
| 21 | + |
| 22 | +For each request, we do the following: |
| 23 | +1. Increment the number of prepare packets for the type of request |
| 24 | +1. Monitor the time (in nanoseconds) required to handle the request |
| 25 | +1. Increment the number of fulfill (or reject, depending on the result of the previous step) packets for the type of request |
| 26 | + |
| 27 | +Each of the above metrics is labelled with the sending account's asset code and routing relation if it comes from an incoming request. If it is an outgoing request, then we also label it with the receiving account's asset code and routing relation. |
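
The three steps and the labelling rule above can be sketched roughly as follows. This is an illustrative Python model, not the node's actual implementation: the `counters` and `durations` stores, the `labels_for` and `instrument` helpers, the account dictionaries, and the convention that a handler returns `"fulfill"` or `"reject"` are all assumptions made for the example. Only the metric and label names are taken from the output format this document shows.

```python
import time
from collections import Counter, defaultdict

# Hypothetical in-memory stores standing in for a real metrics backend.
counters = Counter()
durations = defaultdict(list)


def labels_for(direction, from_account, to_account=None):
    """Sender labels are always present; receiver labels only for outgoing requests."""
    labels = {
        "from_asset_code": from_account["asset_code"],
        "from_routing_relation": from_account["routing_relation"],
    }
    if direction == "outgoing":
        labels["to_asset_code"] = to_account["asset_code"]
        labels["to_routing_relation"] = to_account["routing_relation"]
    # A sorted tuple is hashable, so it can be part of a dictionary key.
    return tuple(sorted(labels.items()))


def instrument(direction, handler, request, from_account, to_account=None):
    """Wrap a handler (assumed to return "fulfill" or "reject") with the three steps."""
    labels = labels_for(direction, from_account, to_account)

    # Step 1: count the prepare packet.
    counters[(f"requests_{direction}_prepare", labels)] += 1

    # Step 2: time the handler in nanoseconds.
    start = time.perf_counter_ns()
    outcome = handler(request)
    elapsed_ns = time.perf_counter_ns() - start
    durations[(f"requests_{direction}_duration", labels)].append(elapsed_ns)

    # Step 3: count the fulfill or reject packet, depending on the outcome.
    counters[(f"requests_{direction}_{outcome}", labels)] += 1
    return outcome
```

Keying every counter by both metric name and label set is what produces one output line per distinct label combination.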
| 28 | + |
| 29 | +Example output below: |
| 30 | + |
| 31 | +``` |
| 32 | +$ curl localhost:9999/ |
| 33 | +
|
| 34 | +# metrics snapshot (ts=1580809069) (prometheus exposition format) |
| 35 | +# TYPE requests_outgoing_fulfill counter |
| 36 | +requests_outgoing_fulfill{from_asset_code="ABC",to_asset_code="ABC",from_routing_relation="NonRoutingAccount",to_routing_relation="NonRoutingAccount"} 1 |
| 37 | +
|
| 38 | +# TYPE requests_incoming_prepare counter |
| 39 | +requests_incoming_prepare{from_asset_code="ABC",from_routing_relation="NonRoutingAccount"} 3 |
| 40 | +
|
| 41 | +# TYPE requests_incoming_reject counter |
| 42 | +requests_incoming_reject{from_asset_code="ABC",from_routing_relation="NonRoutingAccount"} 1 |
| 43 | +
|
| 44 | +# TYPE requests_outgoing_prepare counter |
| 45 | +requests_outgoing_prepare{from_asset_code="ABC",to_asset_code="ABC",from_routing_relation="NonRoutingAccount",to_routing_relation="NonRoutingAccount"} 2 |
| 46 | +
|
| 47 | +# TYPE requests_incoming_fulfill counter |
| 48 | +requests_incoming_fulfill{from_asset_code="ABC",from_routing_relation="NonRoutingAccount"} 2 |
| 49 | +
|
| 50 | +# TYPE requests_outgoing_reject counter |
| 51 | +requests_outgoing_reject{from_asset_code="ABC",to_asset_code="ABC",from_routing_relation="NonRoutingAccount",to_routing_relation="NonRoutingAccount"} 1 |
| 52 | +
|
| 53 | +# TYPE requests_incoming_duration summary |
| 54 | +requests_incoming_duration{from_asset_code="ABC",from_routing_relation="NonRoutingAccount",quantile="0"} 365824 |
| 55 | +requests_incoming_duration{from_asset_code="ABC",from_routing_relation="NonRoutingAccount",quantile="0.5"} 20922367 |
| 56 | +requests_incoming_duration{from_asset_code="ABC",from_routing_relation="NonRoutingAccount",quantile="0.9"} 22249471 |
| 57 | +requests_incoming_duration{from_asset_code="ABC",from_routing_relation="NonRoutingAccount",quantile="0.95"} 22249471 |
| 58 | +requests_incoming_duration{from_asset_code="ABC",from_routing_relation="NonRoutingAccount",quantile="0.99"} 22249471 |
| 59 | +requests_incoming_duration{from_asset_code="ABC",from_routing_relation="NonRoutingAccount",quantile="0.999"} 22249471 |
| 60 | +requests_incoming_duration{from_asset_code="ABC",from_routing_relation="NonRoutingAccount",quantile="1"} 22249471 |
| 61 | +requests_incoming_duration_sum{from_asset_code="ABC",from_routing_relation="NonRoutingAccount"} 43528460 |
| 62 | +requests_incoming_duration_count{from_asset_code="ABC",from_routing_relation="NonRoutingAccount"} 3 |
| 63 | +
|
| 64 | +# TYPE requests_outgoing_duration summary |
| 65 | +requests_outgoing_duration{from_asset_code="ABC",to_asset_code="ABC",from_routing_relation="NonRoutingAccount",to_routing_relation="NonRoutingAccount",quantile="0"} 14123008 |
| 66 | +requests_outgoing_duration{from_asset_code="ABC",to_asset_code="ABC",from_routing_relation="NonRoutingAccount",to_routing_relation="NonRoutingAccount",quantile="0.5"} 14131199 |
| 67 | +requests_outgoing_duration{from_asset_code="ABC",to_asset_code="ABC",from_routing_relation="NonRoutingAccount",to_routing_relation="NonRoutingAccount",quantile="0.9"} 16744447 |
| 68 | +requests_outgoing_duration{from_asset_code="ABC",to_asset_code="ABC",from_routing_relation="NonRoutingAccount",to_routing_relation="NonRoutingAccount",quantile="0.95"} 16744447 |
| 69 | +requests_outgoing_duration{from_asset_code="ABC",to_asset_code="ABC",from_routing_relation="NonRoutingAccount",to_routing_relation="NonRoutingAccount",quantile="0.99"} 16744447 |
| 70 | +requests_outgoing_duration{from_asset_code="ABC",to_asset_code="ABC",from_routing_relation="NonRoutingAccount",to_routing_relation="NonRoutingAccount",quantile="0.999"} 16744447 |
| 71 | +requests_outgoing_duration{from_asset_code="ABC",to_asset_code="ABC",from_routing_relation="NonRoutingAccount",to_routing_relation="NonRoutingAccount",quantile="1"} 16744447 |
| 72 | +requests_outgoing_duration_sum{from_asset_code="ABC",to_asset_code="ABC",from_routing_relation="NonRoutingAccount",to_routing_relation="NonRoutingAccount"} 30871847 |
| 73 | +requests_outgoing_duration_count{from_asset_code="ABC",to_asset_code="ABC",from_routing_relation="NonRoutingAccount",to_routing_relation="NonRoutingAccount"} 2 |
| 74 | +``` |
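
A consumer of this endpoint might derive the mean handling time from the `_sum` and `_count` samples of a summary. The sketch below is a minimal, simplified parser written for illustration: `parse_samples` and `mean_duration_ns` are hypothetical helper names, the parser ignores optional timestamps and escaped quotes in label values, and `SAMPLE` reuses a few lines from the output above. For production use, an established Prometheus client library is likely a better choice.

```python
import re

# Matches `name{labels} value` or `name value`; comment lines are skipped earlier.
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

SAMPLE = """\
# TYPE requests_incoming_duration summary
requests_incoming_duration_sum{from_asset_code="ABC",from_routing_relation="NonRoutingAccount"} 43528460
requests_incoming_duration_count{from_asset_code="ABC",from_routing_relation="NonRoutingAccount"} 3
"""


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and `# TYPE ...` comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a sample line
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def mean_duration_ns(samples, metric):
    """Mean of a summary, computed as sum / count across all label sets."""
    total = sum(v for name, _, v in samples if name == metric + "_sum")
    count = sum(v for name, _, v in samples if name == metric + "_count")
    return total / count if count else None


if __name__ == "__main__":
    print(mean_duration_ns(parse_samples(SAMPLE), "requests_incoming_duration"))
```

With the sample above, the mean works out to roughly 14.5 ms per incoming request (43528460 ns over 3 requests).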