# Kubernetes monitoring architecture

## Executive Summary

Monitoring is split into two pipelines:

* A **core metrics pipeline** consisting of Kubelet, a resource estimator, a slimmed-down
Heapster called metrics-server, and the API server serving the master metrics API. These
metrics are used by core system components, such as scheduling logic (e.g. the scheduler and
horizontal pod autoscaling based on system metrics) and simple out-of-the-box UI components
(e.g. `kubectl top`). This pipeline is not intended for integration with third-party
monitoring systems.
* A **monitoring pipeline** used for collecting various metrics from the system and exposing
them to end users, as well as to the Horizontal Pod Autoscaler (for custom metrics) and to
Infrastore via adapters. Users can choose from many monitoring system vendors, or run none
at all. Open-source Kubernetes will not ship with a monitoring pipeline, but third-party
options will be easy to install. We expect such pipelines to typically consist of a per-node
agent and a cluster-level aggregator.

The architecture is illustrated in the diagram in the Appendix of this doc.

## Introduction and Objectives

This document proposes a high-level monitoring architecture for Kubernetes. It covers
a subset of the issues mentioned in the “Kubernetes Monitoring Architecture” doc,
specifically focusing on an architecture (components and their interactions) that
hopefully meets the numerous requirements. We do not specify any particular timeframe
for implementing this architecture, nor any particular roadmap for getting there.

### Terminology

There are two types of metrics: system metrics and service metrics. System metrics are
generic metrics that are generally available from every entity that is monitored (e.g.
usage of CPU and memory by container and node). Service metrics are explicitly defined
in application code and exported (e.g. number of 500s served by the API server). Both
system metrics and service metrics can originate from users’ containers or from system
infrastructure components (master components like the API server, addon pods running on
the master, and addon pods running on user nodes).

We divide system metrics into

* *core metrics*, which are metrics that Kubernetes understands and uses for operation
of its internal components and core utilities -- for example, metrics used for scheduling
(including the inputs to the algorithms for resource estimation, initial resources/vertical
autoscaling, cluster autoscaling, and horizontal pod autoscaling excluding custom metrics),
the kube dashboard, and `kubectl top`. As of now this would consist of cumulative CPU usage,
instantaneous memory usage, disk usage of pods, and disk usage of containers;
* *non-core metrics*, which are not interpreted by Kubernetes; we generally assume they
include the core metrics (though not necessarily in a format Kubernetes understands) plus
additional metrics.

Service metrics can be divided into those produced by Kubernetes infrastructure components
(and thus useful for operation of the Kubernetes cluster) and those produced by user
applications. Service metrics used as input to horizontal pod autoscaling are sometimes
called custom metrics. Of course horizontal pod autoscaling also uses core metrics.

We consider logging to be separate from monitoring, so logging is outside the scope of
this doc.

### Requirements

The monitoring architecture should

* include a solution that is part of core Kubernetes and
  * makes core system metrics about nodes, pods, and containers available via a standard
  master API (today the master metrics API), such that core Kubernetes features do not
  depend on non-core components
  * requires Kubelet to export only a limited set of metrics, namely those required for
  core Kubernetes components to correctly operate (this is related to #18770)
  * can scale up to at least 5000 nodes
  * is small enough that we can require that all of its components be running in all
  deployment configurations
* include an out-of-the-box solution that depends only on core Kubernetes and can serve
historical data, e.g. to support Initial Resources and vertical pod autoscaling as well as
cluster analytics queries
* allow for third-party monitoring solutions that are not part of core Kubernetes and can
be integrated with components like the Horizontal Pod Autoscaler that require service
metrics

## Architecture

We divide our description of the long-term architecture plan into the core metrics pipeline
and the monitoring pipeline. For each, it is necessary to think about how to deal with each
type of metric (core metrics, non-core metrics, and service metrics) from both the master
and minions.

### Core metrics pipeline

The core metrics pipeline collects a set of core system metrics. There are two sources for
these metrics:

* Kubelet, providing per-node/pod/container usage information (the current cAdvisor that
is part of Kubelet will be slimmed down to provide only core system metrics)
* a resource estimator that runs as a DaemonSet and turns raw usage values scraped from
Kubelet into resource estimates (values used by the scheduler for more advanced usage-based
scheduling); a sketch of the estimator's shape follows this list
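
As an illustration only, the resource estimator could be modeled as a small interface that
maps a window of raw Kubelet samples to an estimate the scheduler can consume. Every type
and field name below is hypothetical; this is a sketch of the component's shape, not a
committed API:

```go
package estimator

import "time"

// UsageSample is one raw usage value scraped from Kubelet.
// All names in this sketch are illustrative, not a committed API.
type UsageSample struct {
	ContainerName string
	Timestamp     time.Time
	CPUMillicores int64 // cumulative CPU converted to a rate, in millicores
	MemoryBytes   int64 // instantaneous working-set memory
}

// ResourceEstimate is the derived value a usage-based scheduler would consume.
type ResourceEstimate struct {
	CPUMillicores int64
	MemoryBytes   int64
}

// ResourceEstimator turns a window of raw samples into an estimate,
// for example by taking a high percentile over the window.
type ResourceEstimator interface {
	Estimate(samples []UsageSample) ResourceEstimate
}
```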

These sources are scraped by a component we call *metrics-server*, which is like a
slimmed-down version of today's Heapster. metrics-server stores only the latest values
locally and has no sinks. metrics-server exposes the master metrics API. (The configuration
described here is similar to the current Heapster in “standalone” mode.) The
[discovery summarizer](../../docs/proposals/federated-api-servers.md)
makes the master metrics API available to external clients such that from the client’s
perspective it looks the same as talking to the API server.
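
From a client's point of view, reading core metrics is then just an ordinary HTTP request
against the master. A minimal sketch in Go, assuming an illustrative group/version of
`metrics/v1alpha1` for the master metrics API (the real name may differ):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// KUBE_MASTER is a placeholder for the address the discovery
	// summarizer exposes, e.g. https://10.0.0.1.
	master := os.Getenv("KUBE_MASTER")

	// The path below assumes metrics/v1alpha1 as the API group/version;
	// that name is an assumption of this sketch.
	resp, err := http.Get(master + "/apis/metrics/v1alpha1/nodes")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	fmt.Println(string(body)) // latest node usage values only; no history
}
```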

Core (system) metrics are handled as described above in all deployment environments. The
only easily replaceable part is the resource estimator, which could be replaced by power
users. In theory, metrics-server itself can also be substituted, but that would be similar
to substituting the apiserver itself or the controller-manager: possible, but neither
recommended nor supported.

Eventually the core metrics pipeline might also collect metrics from Kubelet and the Docker
daemon themselves (e.g. CPU usage of Kubelet), even though they do not run in containers.

The core metrics pipeline is intentionally small and not designed for third-party
integrations. “Full-fledged” monitoring is left to third-party systems, which provide the
monitoring pipeline (see the next section) and can run on Kubernetes without requiring
changes to upstream components. In this way we can remove the burden we have today of
maintaining Heapster as the integration point for every possible metrics source, sink,
and feature.

#### Infrastore

We will build an open-source Infrastore component (most likely reusing existing
technologies) for serving historical queries over core system metrics and events, which it
will fetch from the master APIs. Infrastore will expose one or more APIs (possibly just
SQL-like queries -- this is TBD) to handle the following use cases:

* initial resources
* vertical autoscaling
* oldtimer API
* decision-support queries for debugging, capacity planning, etc.
* usage graphs in the [Kubernetes Dashboard](https://github.com/kubernetes/dashboard)

In addition, it may collect monitoring metrics and service metrics (at least from Kubernetes
infrastructure containers), described in the upcoming sections.

### Monitoring pipeline

One of the goals of building a dedicated metrics pipeline for core metrics, as described in
the previous section, is to allow for a separate monitoring pipeline that can be very
flexible, because core Kubernetes components do not need to rely on it. By default we will
not provide one, but we will provide an easy way to install one (with a single command,
most likely using Helm). We describe the monitoring pipeline in this section.

Data collected by the monitoring pipeline may contain any sub- or superset of the following
groups of metrics:

* core system metrics
* non-core system metrics
* service metrics from user application containers
* service metrics from Kubernetes infrastructure containers; these metrics are exposed
using Prometheus instrumentation (see the example below)

It is up to the monitoring solution to decide which of these are collected.
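
For the last group, instrumentation means the container itself serves its metrics over HTTP
in the Prometheus exposition format. A minimal sketch in Go using the `client_golang`
library (the metric name is hypothetical):

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// A service metric in the sense used above: defined explicitly in
// application code and exported in Prometheus format.
var requestErrors = prometheus.NewCounter(prometheus.CounterOpts{
	Name: "myapp_http_errors_total", // hypothetical metric name
	Help: "Total number of 5xx responses served.",
})

func main() {
	prometheus.MustRegister(requestErrors)
	// The request handler would call requestErrors.Inc() on each 5xx.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```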

In order to enable horizontal pod autoscaling based on custom metrics, the provider of the
monitoring pipeline would also have to create a stateless API adapter that pulls the custom
metrics from the monitoring pipeline and exposes them to the Horizontal Pod Autoscaler.
This will be a well-defined, versioned API similar to regular Kubernetes APIs. Details of
how it will be exposed or discovered will be covered in a detailed design doc for this
component.
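
As a rough sketch of the adapter's shape (every name here is hypothetical; the actual API
is deferred to that design doc), the adapter only needs to answer "what is the latest value
of metric M for these pods":

```go
package adapter

// CustomMetricValue is one sample for one pod; illustrative only.
type CustomMetricValue struct {
	PodNamespace string
	PodName      string
	MetricName   string
	Value        float64
}

// Adapter is stateless: each call translates directly into a query
// against the monitoring pipeline (e.g. a Prometheus query endpoint).
type Adapter interface {
	// PodMetrics returns the latest value of metricName for each of
	// the named pods, for consumption by the Horizontal Pod Autoscaler.
	PodMetrics(namespace, metricName string, podNames []string) ([]CustomMetricValue, error)
}
```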

The same approach applies if it is desired to make monitoring pipeline metrics available in
Infrastore. These adapters could be standalone components, libraries, or part of the
monitoring solution itself.

There are many possible combinations of node and cluster-level agents that could comprise a
monitoring pipeline, including

* cAdvisor + Heapster + InfluxDB (or any other sink)
* cAdvisor + collectd + Heapster
* cAdvisor + Prometheus
* snapd + Heapster
* snapd + SNAP cluster-level agent
* Sysdig

As an example we’ll describe a potential integration with cAdvisor + Prometheus.

Prometheus has the following metric sources on a node:

* core and non-core system metrics from cAdvisor
* service metrics exposed by containers via an HTTP handler in Prometheus format
* [optional] metrics about the node itself from Node Exporter (a Prometheus component)

All of these are polled by the Prometheus cluster-level agent. We can use the Prometheus
cluster-level agent as a source for horizontal pod autoscaling custom metrics by using a
standalone API adapter that proxies/translates between the Prometheus Query Language
endpoint on the Prometheus cluster-level agent and an HPA-specific API (see the sketch
below). Likewise an adapter can be used to make the metrics from the monitoring pipeline
available in Infrastore. Neither adapter is necessary if the user does not need the
corresponding feature.
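
A minimal sketch of that translation step in Go: the adapter turns an HPA request for a
custom metric into one query against Prometheus's `/api/v1/query` endpoint. The host,
metric name, and PromQL expression below are placeholders:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Hypothetical PromQL: per-pod request rate for pods in "default".
	q := url.Values{}
	q.Set("query", `sum(rate(http_requests_total{namespace="default"}[1m])) by (pod)`)

	// The service address of the cluster-level Prometheus is a placeholder.
	resp, err := http.Get("http://prometheus.kube-system:9090/api/v1/query?" + q.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// JSON vector with one sample per pod, ready to be reshaped into the
	// HPA-specific API response.
	fmt.Println(string(body))
}
```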

The command that installs cAdvisor + Prometheus should also automatically set up collection
of the metrics from infrastructure containers. This is possible because the names of the
infrastructure containers and metrics of interest are part of the Kubernetes control plane
configuration itself, and because the infrastructure containers export their metrics in
Prometheus format.

## Appendix: Architecture diagram

### Open-source monitoring pipeline

*(Monitoring pipeline architecture diagram)*