From 45726b79a3caee978a727a987723fbe1913b58df Mon Sep 17 00:00:00 2001
From: htuch
Date: Tue, 12 May 2020 15:50:47 -0400
Subject: [PATCH] docs: FAQ on benchmarking best practices. (#11140)

Includes a bunch of tips from @jmarantz, @oschaaf, @mattklein123.

Signed-off-by: Harvey Tuch
---
 .../disable_circuit_breaking.rst              |  2 +
 docs/root/faq/overview.rst                    |  1 +
 .../faq/performance/how_fast_is_envoy.rst     |  2 +
 .../performance/how_to_benchmark_envoy.rst    | 83 +++++++++++++++++++
 4 files changed, 88 insertions(+)
 create mode 100644 docs/root/faq/performance/how_to_benchmark_envoy.rst

diff --git a/docs/root/faq/load_balancing/disable_circuit_breaking.rst b/docs/root/faq/load_balancing/disable_circuit_breaking.rst
index 00182f8d8407..dfc6180628c9 100644
--- a/docs/root/faq/load_balancing/disable_circuit_breaking.rst
+++ b/docs/root/faq/load_balancing/disable_circuit_breaking.rst
@@ -1,3 +1,5 @@
+.. _faq_disable_circuit_breaking:
+
 Is there a way to disable circuit breaking?
 ===========================================
 
diff --git a/docs/root/faq/overview.rst b/docs/root/faq/overview.rst
index d2f21b70a333..3769b53f4766 100644
--- a/docs/root/faq/overview.rst
+++ b/docs/root/faq/overview.rst
@@ -34,6 +34,7 @@ Performance
   :maxdepth: 2
 
   performance/how_fast_is_envoy
+  performance/how_to_benchmark_envoy
 
 Configuration
 -------------
diff --git a/docs/root/faq/performance/how_fast_is_envoy.rst b/docs/root/faq/performance/how_fast_is_envoy.rst
index 78b1dd4d20bc..f2d7ceadaa91 100644
--- a/docs/root/faq/performance/how_fast_is_envoy.rst
+++ b/docs/root/faq/performance/how_fast_is_envoy.rst
@@ -1,3 +1,5 @@
+.. _faq_how_fast_is_envoy:
+
 How fast is Envoy?
 ==================
 
diff --git a/docs/root/faq/performance/how_to_benchmark_envoy.rst b/docs/root/faq/performance/how_to_benchmark_envoy.rst
new file mode 100644
index 000000000000..4152cf6d2fa3
--- /dev/null
+++ b/docs/root/faq/performance/how_to_benchmark_envoy.rst
@@ -0,0 +1,83 @@
What are best practices for benchmarking Envoy?
===============================================

There is :ref:`no single QPS, latency or throughput overhead <faq_how_fast_is_envoy>` that can
characterize a network proxy such as Envoy. Instead, any measurements need to be contextually
aware, ensuring an apples-to-apples comparison with other systems by configuring and load testing
Envoy appropriately. As a result, we can't provide a canonical benchmark configuration, but instead
offer the following guidance:

* A release Envoy binary should be used. If building, please ensure that ``-c opt`` is used on
  the Bazel command line. When consuming Envoy point releases, make sure you are using the latest
  point release; given the pace of Envoy development, it's not reasonable to pick older versions
  when making a statement about Envoy performance. Similarly, if working on a master build, please
  perform due diligence and ensure that no regressions or performance improvements have landed
  proximal to your benchmark work and that you are close to HEAD.

* The :option:`--concurrency` Envoy CLI flag should be unset (providing one worker thread per
  logical core on your machine) or set to match the number of cores/threads made available to
  other network proxies in your comparison.

* Disable :ref:`circuit breaking <faq_disable_circuit_breaking>`. A common issue during
  benchmarking is that Envoy's default circuit breaker limits are low, leading to connection and
  request queuing.

* Disable :ref:`generate_request_id
  <envoy_v3_api_field_extensions.filters.network.http_connection_manager.v3.HttpConnectionManager.generate_request_id>`.

* Disable :ref:`dynamic_stats
  <envoy_v3_api_field_extensions.filters.http.router.v3.Router.dynamic_stats>`. If you are
  measuring the overhead vs. a direct connection, you might want to consider disabling all stats
  via :ref:`reject_all <envoy_v3_api_field_config.metrics.v3.StatsMatcher.reject_all>`.
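
  As a minimal sketch combining these three items (circuit breaking, request ID generation and
  dynamic stats), assuming the v3 API; the listener, route and endpoint details here are
  illustrative placeholders (e.g. ``benchmark_cluster``, port 10000), not a recommended topology:

  .. code-block:: yaml

    static_resources:
      listeners:
      - name: benchmark_listener
        address:
          socket_address: { address: 0.0.0.0, port_value: 10000 }
        filter_chains:
        - filters:
          - name: envoy.filters.network.http_connection_manager
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
              stat_prefix: ingress_http
              # Disable per-request UUID generation.
              generate_request_id: false
              route_config:
                name: local_route
                virtual_hosts:
                - name: backend
                  domains: ["*"]
                  routes:
                  - match: { prefix: "/" }
                    route: { cluster: benchmark_cluster }
              http_filters:
              - name: envoy.filters.http.router
                typed_config:
                  "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                  # Disable dynamically generated per-route/upstream stats.
                  dynamic_stats: false
      clusters:
      - name: benchmark_cluster
        connect_timeout: 5s
        type: STATIC
        # Effectively disable circuit breaking by setting very high thresholds,
        # per the circuit breaking FAQ linked above.
        circuit_breakers:
          thresholds:
          - max_connections: 1000000000
            max_pending_requests: 1000000000
            max_requests: 1000000000
            max_retries: 1000000000
        load_assignment:
          cluster_name: benchmark_cluster
          endpoints:
          - lb_endpoints:
            - endpoint:
                address:
                  socket_address: { address: 127.0.0.1, port_value: 8080 }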

* Ensure that the networking and HTTP filter chains are reflective of comparable features in the
  systems that Envoy is being compared with.

* Ensure that TLS settings (if any) are realistic and that consistent ciphers are used in any
  comparison. Session reuse may have a significant impact on results and should be tracked via
  :ref:`listener SSL stats <config_listener_stats>`.

* Ensure that :ref:`HTTP/2 settings <envoy_v3_api_msg_config.core.v3.Http2ProtocolOptions>`, in
  particular those that affect flow control and stream concurrency, are consistent in any
  comparison. Ideally, take BDP and network link latencies into account when optimizing any
  HTTP/2 settings; a sketch of pinning these settings explicitly appears at the end of this page.

* Verify in the listener and cluster stats that the number of streams, connections and errors
  matches what is expected in any given experiment.

* Make sure you are aware of how connections created by your load generator are distributed
  across Envoy worker threads. This is especially important for benchmarks that use low
  connection counts and perfect keep-alive. Envoy allocates all streams for a given connection to
  a single worker thread. This means, for example, that if you have 72 logical cores and worker
  threads, but only a single HTTP/2 connection from your load generator, then only one worker
  thread will be active.

* Make sure request-release timing expectations line up with what is intended. Some load
  generators produce naturally jittery and/or batchy timings. This might end up being an
  unintended dominant factor in certain tests.

* The specifics of how your load generator reuses connections (e.g. MRU, random, LRU, etc.) are
  an important factor, as this impacts work distribution.

* If you're trying to measure small (say < 1ms) latencies, make sure the measurement tool and
  environment have the required sensitivity and that the noise floor is sufficiently low.

* Be critical of your bootstrap or xDS configuration. Ideally, every line has a motivation and is
  necessary for the benchmark under consideration.

* Consider using `Nighthawk <https://github.com/envoyproxy/nighthawk>`_ as your load generator
  and measurement tool. We are committed to building out benchmarking and latency measurement
  best practices in this tool.

* Examine ``perf`` profiles of Envoy during the benchmark run, e.g. with `flame graphs
  <http://www.brendangregg.com/flamegraphs.html>`_. Verify that Envoy is spending its time doing
  the expected essential work under test, rather than some unrelated or tangential work.

* Familiarize yourself with `latency measurement best practices
  <https://www.youtube.com/watch?v=lJ8ydIuPFeU>`_. In particular, never measure latency at max
  load, as this is not generally meaningful or reflective of real system performance; aim to
  measure below the knee of the QPS-latency curve. Prefer open loop over closed loop load
  generators.

* Avoid `benchmarking crimes <https://www.cse.unsw.edu.au/~gernot/benchmarking-crimes.html>`_.
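
As a concrete illustration of the HTTP/2 settings item above, here is a hypothetical cluster
fragment (again using the illustrative ``benchmark_cluster`` name, and the v3
``http2_protocol_options`` cluster field) that pins flow control and stream concurrency to
explicit values; the specific numbers are examples only, and should be mirrored both in the
system Envoy is being compared with and in the connection manager's downstream HTTP/2 options
where applicable:

.. code-block:: yaml

  clusters:
  - name: benchmark_cluster
    connect_timeout: 5s
    type: STATIC
    # Explicit HTTP/2 settings so both sides of a comparison use identical values.
    http2_protocol_options:
      max_concurrent_streams: 100
      initial_stream_window_size: 65536        # 64 KiB
      initial_connection_window_size: 1048576  # 1 MiB
    load_assignment:
      cluster_name: benchmark_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address: { address: 127.0.0.1, port_value: 8080 }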