gorelka

This project provides a relay that accepts metrics in various formats (initially the Graphite Line Protocol) and forwards them over various transports.

Status

Early proof of concept. It compiles, but has not been tested extensively.

Features

General:

  • Don't store metrics in queues forever when a destination is unavailable
  • Offload queue overflows to disk
  • Internal stats
  • Extended stats

Benchmark:

  • Provide simple but configurable load generator
  • Load generator should be fast (at least 10M lines/sec on an E5-2620 over loopback)
  • Delay measurements
  • Extrapolate speed based on when data arrives

Config:

  • Override key for Transport distribution

Calculator:

  • Calculate real metric frequency
  • Detect semi-frequent metrics

Input:

  • TCP
  • UDP
  • Unix Socket
  • TLS
  • Configurable encoding

Input Encoders:

  • Graphite Line Protocol
  • Graphite Line Protocol with tags
  • Metrics 2.0
  • InfluxDB Line Protocol
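
For orientation, a plain Graphite Line Protocol record is a single text line of the form `<name> <value> <timestamp>`. The following is a minimal sketch of decoding one such line. It is not taken from gorelka's code: the `Metric` type and `ParseGraphiteLine` function are hypothetical names, and the sketch assumes an integer Unix timestamp and no tags.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Metric is a hypothetical in-memory form of one Graphite line.
type Metric struct {
	Name      string
	Value     float64
	Timestamp int64 // Unix seconds (assumed to be an integer here)
}

// ParseGraphiteLine parses "<name> <value> <timestamp>".
// Fields may be separated by any run of whitespace.
func ParseGraphiteLine(line string) (Metric, error) {
	fields := strings.Fields(line)
	if len(fields) != 3 {
		return Metric{}, fmt.Errorf("expected 3 fields, got %d", len(fields))
	}
	value, err := strconv.ParseFloat(fields[1], 64)
	if err != nil {
		return Metric{}, fmt.Errorf("bad value %q: %w", fields[1], err)
	}
	ts, err := strconv.ParseInt(fields[2], 10, 64)
	if err != nil {
		return Metric{}, fmt.Errorf("bad timestamp %q: %w", fields[2], err)
	}
	return Metric{Name: fields[0], Value: value, Timestamp: ts}, nil
}

func main() {
	m, err := ParseGraphiteLine("servers.web01.cpu.user 42.5 1500000000")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%s %g %d\n", m.Name, m.Value, m.Timestamp)
}
```

Rejecting malformed lines early, as this sketch does, keeps bad input from reaching the queues further down the pipeline.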

Output Encoders:

  • Graphite Line Protocol
  • Graphite Line Protocol with tags
  • JSON
  • Protobuf
  • kafkamdm

Output:

  • Kafka
  • TCP
  • UDP
  • Unix Socket

Routing:

  • Regexp matching (Re2-based)
  • Rewrites
  • Prefix Matching
  • Blackhole sender
  • Log on receive
  • PCRE Regexp Matching
  • Separate tool to show where a metric will land

LoadBalancing:

Documentation:

  • At least some docs
  • Design documentation
  • Extended docs

Performance

Internal benchmarks show that the current version of the relay can do simple routing (StatsWith: "" + send to 4 destinations) of 2M lines/sec on 2x E5-2620v3 with 128 GB of RAM. CPU consumption is 6 (out of 24) cores on average, with spikes up to 18 cores. Memory consumption is far from optimal: 60 GB of RAM (6x overhead). These performance levels can't be considered acceptable for sustained load.

With more complex rules, relay performance decreases dramatically (a 10-20x slowdown and 10x more memory consumption). This still needs to be investigated and fixed.

Performance with tags is mostly untested.

Known issues

  • Some internal queues (if you can call them queues) have no size limit, so malformed or unthrottled input might lead to OOM issues
  • If a backend goes down, the first point in the queue will be lost
  • Config format is far from perfect (readability, ease of modification, ease of generation)
  • Unstable config format
  • Delays are untested
  • Might contain memory leaks
  • No statistics
  • No documentation, except for comments in the config file
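
One common way to address the unbounded-queue issue above is a fixed-capacity queue that drops (and counts) input once it is full. The following is a minimal sketch of that idea using a buffered channel. It does not describe gorelka's current queues: `BoundedQueue` and its methods are hypothetical names, and dropping the newest item is only one of several possible overflow policies.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// BoundedQueue is a hypothetical fixed-capacity queue of metric lines.
// When it is full, new lines are dropped instead of growing memory.
type BoundedQueue struct {
	dropped uint64 // accessed atomically; kept first for 64-bit alignment
	items   chan string
}

// NewBoundedQueue creates a queue holding at most capacity lines.
func NewBoundedQueue(capacity int) *BoundedQueue {
	return &BoundedQueue{items: make(chan string, capacity)}
}

// Push enqueues line without blocking. It returns false and
// increments the drop counter when the queue is already full.
func (q *BoundedQueue) Push(line string) bool {
	select {
	case q.items <- line:
		return true
	default:
		atomic.AddUint64(&q.dropped, 1)
		return false
	}
}

// Pop dequeues the oldest line without blocking.
// The boolean is false when the queue is empty.
func (q *BoundedQueue) Pop() (string, bool) {
	select {
	case line := <-q.items:
		return line, true
	default:
		return "", false
	}
}

// Dropped returns how many lines were rejected so far.
func (q *BoundedQueue) Dropped() uint64 {
	return atomic.LoadUint64(&q.dropped)
}

func main() {
	q := NewBoundedQueue(2)
	for _, line := range []string{"a 1 1", "b 2 2", "c 3 3"} {
		if !q.Push(line) {
			fmt.Println("queue full, dropped:", line)
		}
	}
	fmt.Println("total dropped:", q.Dropped())
}
```

Exposing the drop counter gives the relay something to report as an internal statistic, so overflow becomes visible instead of silently turning into memory growth.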

Acknowledgement

This program was originally developed for Booking.com. With approval from Booking.com, the code was generalised and published as Open Source on GitHub, for which the author would like to express his gratitude.

License

This code is licensed under the Apache License 2.0.