From 45726b79a3caee978a727a987723fbe1913b58df Mon Sep 17 00:00:00 2001
From: htuch
Date: Tue, 12 May 2020 15:50:47 -0400
Subject: [PATCH] docs: FAQ on benchmarking best practices. (#11140)

Includes a bunch of tips from @jmarantz, @oschaaf, @mattklein123.

Signed-off-by: Harvey Tuch
---
 .../disable_circuit_breaking.rst              |  2 +
 docs/root/faq/overview.rst                    |  1 +
 .../faq/performance/how_fast_is_envoy.rst     |  2 +
 .../performance/how_to_benchmark_envoy.rst    | 83 +++++++++++++++++++
 4 files changed, 88 insertions(+)
 create mode 100644 docs/root/faq/performance/how_to_benchmark_envoy.rst

diff --git a/docs/root/faq/load_balancing/disable_circuit_breaking.rst b/docs/root/faq/load_balancing/disable_circuit_breaking.rst
index 00182f8d8407..dfc6180628c9 100644
--- a/docs/root/faq/load_balancing/disable_circuit_breaking.rst
+++ b/docs/root/faq/load_balancing/disable_circuit_breaking.rst
@@ -1,3 +1,5 @@
+.. _faq_disable_circuit_breaking:
+
 Is there a way to disable circuit breaking?
 ===========================================
 
diff --git a/docs/root/faq/overview.rst b/docs/root/faq/overview.rst
index d2f21b70a333..3769b53f4766 100644
--- a/docs/root/faq/overview.rst
+++ b/docs/root/faq/overview.rst
@@ -34,6 +34,7 @@ Performance
   :maxdepth: 2
 
   performance/how_fast_is_envoy
+  performance/how_to_benchmark_envoy
 
 Configuration
 -------------
diff --git a/docs/root/faq/performance/how_fast_is_envoy.rst b/docs/root/faq/performance/how_fast_is_envoy.rst
index 78b1dd4d20bc..f2d7ceadaa91 100644
--- a/docs/root/faq/performance/how_fast_is_envoy.rst
+++ b/docs/root/faq/performance/how_fast_is_envoy.rst
@@ -1,3 +1,5 @@
+.. _faq_how_fast_is_envoy:
+
 How fast is Envoy?
 ==================
 
diff --git a/docs/root/faq/performance/how_to_benchmark_envoy.rst b/docs/root/faq/performance/how_to_benchmark_envoy.rst
new file mode 100644
index 000000000000..4152cf6d2fa3
--- /dev/null
+++ b/docs/root/faq/performance/how_to_benchmark_envoy.rst
@@ -0,0 +1,83 @@
What are best practices for benchmarking Envoy?
===============================================

There is :ref:`no single QPS, latency or throughput overhead <faq_how_fast_is_envoy>` that can
characterize a network proxy such as Envoy. Instead, any measurements need to be contextually
aware, ensuring an apples-to-apples comparison with other systems by configuring and load testing
Envoy appropriately. As a result, we can't provide a canonical benchmark configuration, but instead
offer the following guidance:

* A release Envoy binary should be used. If building, please ensure that ``-c opt`` is used on
  the Bazel command line. When consuming Envoy point releases, make sure you are using the latest
  point release; given the pace of Envoy development, it's not reasonable to pick older versions
  when making a statement about Envoy performance. Similarly, if working on a master build, please
  perform due diligence and ensure that no regressions or performance improvements have landed
  proximal to your benchmark work and that you are close to HEAD.

* The :option:`--concurrency` Envoy CLI flag should be unset (providing one worker thread per
  logical core on your machine) or set to match the number of cores/threads made available to
  other network proxies in your comparison.

* Disable :ref:`circuit breaking <faq_disable_circuit_breaking>`. A common issue during
  benchmarking is that Envoy's default circuit breaker limits are low, leading to connection and
  request queuing.

* Disable :ref:`generate_request_id
  <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.generate_request_id>`.

* Disable :ref:`dynamic_stats
  <envoy_v3_api_field_extensions.filters.http.router.v3.Router.dynamic_stats>`. If you are
  measuring the overhead vs. a direct connection, you might want to consider disabling all stats
  via :ref:`reject_all <envoy_v3_api_field_config.metrics.v3.StatsMatcher.reject_all>`.
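
  As a minimal sketch combining these three items (circuit breaking, request ID generation and
  dynamic stats), assuming the v3 API; the listener, route and endpoint details here are
  illustrative placeholders (e.g. ``benchmark_cluster``, port 10000), not a recommended topology:

  .. code-block:: yaml

    static_resources:
      listeners:
      - name: benchmark_listener
        address:
          socket_address: { address: 0.0.0.0, port_value: 10000 }
        filter_chains:
        - filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              stat_prefix: ingress_http
              # Disable per-request UUID generation.
              generate_request_id: false
              route_config:
                name: local_route
                virtual_hosts:
                - name: backend
                  domains: ["*"]
                  routes:
                  - match: { prefix: "/" }
                    route: { cluster: benchmark_cluster }
              http_filters:
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                  # Disable dynamically generated per-route/upstream stats.
                  dynamic_stats: false
      clusters:
      - name: benchmark_cluster
        connect_timeout: 5s
        type: STATIC
        # Effectively disable circuit breaking by setting very high thresholds,
        # per the circuit breaking FAQ linked above.
        circuit_breakers:
          thresholds:
          - max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
        load_assignment:
          cluster_name: benchmark_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address: { address: 127.0.0.1, port_value: 8080 }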

* Ensure that the networking and HTTP filter chains are reflective of comparable features in the
  systems that Envoy is being compared with.

* Ensure that TLS settings (if any) are realistic and that consistent ciphers are used in any
  comparison. Session reuse may have a significant impact on results and should be tracked via
  :ref:`listener SSL stats <config_listener_stats>`.

* Ensure that :ref:`HTTP/2 settings <envoy_v3_api_msg_config.core.v3.Http2ProtocolOptions>`, in
  particular those that affect flow control and stream concurrency, are consistent in any
  comparison. Ideally, take BDP and network link latencies into account when optimizing any
  HTTP/2 settings; a sketch of pinning these settings explicitly appears at the end of this page.

* Verify in the listener and cluster stats that the number of streams, connections and errors
  matches what is expected in any given experiment.

* Make sure you are aware of how connections created by your load generator are distributed
  across Envoy worker threads. This is especially important for benchmarks that use low
  connection counts and perfect keep-alive. Envoy allocates all streams for a given connection to
  a single worker thread. This means, for example, that if you have 72 logical cores and worker
  threads, but only a single HTTP/2 connection from your load generator, then only one worker
  thread will be active.

* Make sure request-release timing expectations line up with what is intended. Some load
  generators produce naturally jittery and/or batchy timings. This might end up being an
  unintended dominant factor in certain tests.

* The specifics of how your load generator reuses connections (e.g. MRU, random, LRU, etc.) are
  an important factor, as this impacts work distribution.

* If you're trying to measure small (say < 1ms) latencies, make sure the measurement tool and
  environment have the required sensitivity and that the noise floor is sufficiently low.

* Be critical of your bootstrap or xDS configuration. Ideally, every line has a motivation and is
  necessary for the benchmark under consideration.

* Consider using `Nighthawk <https://github.com/envoyproxy/nighthawk>`_ as your load generator
  and measurement tool. We are committed to building out benchmarking and latency measurement
  best practices in this tool.

* Examine ``perf`` profiles of Envoy during the benchmark run, e.g. with `flame graphs
  <http://www.brendangregg.com/flamegraphs.html>`_. Verify that Envoy is spending its time doing
  the expected essential work under test, rather than some unrelated or tangential work.

* Familiarize yourself with `latency measurement best practices
  <https://www.youtube.com/watch?v=lJ8ydIuPFeU>`_. In particular, never measure latency at max
  load, as this is not generally meaningful or reflective of real system performance; aim to
  measure below the knee of the QPS-latency curve. Prefer open loop over closed loop load
  generators.

* Avoid `benchmarking crimes <https://www.cse.unsw.edu.au/~gernot/benchmarking-crimes.html>`_.
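
As a concrete illustration of the HTTP/2 settings item above, here is a hypothetical cluster
fragment (again using the illustrative ``benchmark_cluster`` name, and the v3
``http2_protocol_options`` cluster field) that pins flow control and stream concurrency to
explicit values; the specific numbers are examples only, and should be mirrored both in the
system Envoy is being compared with and in the connection manager's downstream HTTP/2 options
where applicable:

.. code-block:: yaml

  clusters:
  - name: benchmark_cluster
    connect_timeout: 5s
    type: STATIC
    # Explicit HTTP/2 settings so both sides of a comparison use identical values.
    http2_protocol_options:
      max_concurrent_streams: 100
      initial_stream_window_size: 65536        # 64 KiB
      initial_connection_window_size: 1048576  # 1 MiB
    load_assignment:
      cluster_name: benchmark_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 127.0.0.1, port_value: 8080 }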