# Kubernetes monitoring architecture

## Executive Summary

Monitoring is split into two pipelines:

* A **core metrics pipeline** consisting of Kubelet, a resource estimator, a slimmed-down
  Heapster called metrics-server, and the API server serving the master metrics API. These
  metrics are used by core system components, such as scheduling logic (e.g. scheduler and
  horizontal pod autoscaling based on system metrics) and simple out-of-the-box UI components
  (e.g. `kubectl top`). This pipeline is not intended for integration with third-party
  monitoring systems.
* A **monitoring pipeline** used for collecting various metrics from the system and exposing
  them to end-users, as well as to the Horizontal Pod Autoscaler (for custom metrics) and
  Infrastore via adapters. Users can choose from many monitoring system vendors, or run none
  at all. In open-source, Kubernetes will not ship with a monitoring pipeline, but third-party
  options will be easy to install. We expect that such pipelines will typically consist of a
  per-node agent and a cluster-level aggregator.

The architecture is illustrated in the diagram in the Appendix of this doc.

## Introduction and Objectives

This document proposes a high-level monitoring architecture for Kubernetes. It covers
a subset of the issues mentioned in the “Kubernetes Monitoring Architecture” doc,
specifically focusing on an architecture (components and their interactions) that
hopefully meets the numerous requirements. We do not specify any particular timeframe
for implementing this architecture, nor any particular roadmap for getting there.

### Terminology

There are two types of metrics: system metrics and service metrics. System metrics are
generic metrics that are generally available from every entity that is monitored (e.g.
usage of CPU and memory by container and node). Service metrics are explicitly defined
in application code and exported (e.g. number of 500s served by the API server). Both
system metrics and service metrics can originate from users’ containers or from system
infrastructure components (master components like the API server, addon pods running on
the master, and addon pods running on user nodes).

We divide system metrics into:

* *core metrics*, which are metrics that Kubernetes understands and uses for operation
  of its internal components and core utilities -- for example, metrics used for scheduling
  (including the inputs to the algorithms for resource estimation, initial resources/vertical
  autoscaling, cluster autoscaling, and horizontal pod autoscaling excluding custom metrics),
  the kube dashboard, and “kubectl top.” As of now this would consist of cumulative CPU usage,
  instantaneous memory usage, disk usage of pods, and disk usage of containers.
* *non-core metrics*, which are not interpreted by Kubernetes; we generally assume they
  include the core metrics (though not necessarily in a format Kubernetes understands) plus
  additional metrics.

Service metrics can be divided into those produced by Kubernetes infrastructure components
(and thus useful for operation of the Kubernetes cluster) and those produced by user applications.
Service metrics used as input to horizontal pod autoscaling are sometimes called custom metrics.
Of course horizontal pod autoscaling also uses core metrics.

We consider logging to be separate from monitoring, so logging is outside the scope of
this doc.

### Requirements

The monitoring architecture should

* include a solution that is part of core Kubernetes and
  * makes core system metrics about nodes, pods, and containers available via a standard
    master API (today the master metrics API), such that core Kubernetes features do not
    depend on non-core components
  * requires Kubelet to only export a limited set of metrics, namely those required for
    core Kubernetes components to correctly operate (this is related to #18770)
  * can scale up to at least 5000 nodes
  * is small enough that we can require that all of its components be running in all
    deployment configurations
* include an out-of-the-box solution that can serve historical data, e.g. to support Initial
  Resources and vertical pod autoscaling as well as cluster analytics queries, that depends
  only on core Kubernetes
* allow for third-party monitoring solutions that are not part of core Kubernetes and can
  be integrated with components like Horizontal Pod Autoscaler that require service metrics

## Architecture

We divide our description of the long-term architecture plan into the core metrics pipeline
and the monitoring pipeline. For each, it is necessary to think about how to deal with each
type of metric (core metrics, non-core metrics, and service metrics) from both the master
and minions.

### Core metrics pipeline

The core metrics pipeline collects a set of core system metrics. There are two sources for
these metrics:

* Kubelet, providing per-node/pod/container usage information (the current cAdvisor that
  is part of Kubelet will be slimmed down to provide only core system metrics)
* a resource estimator that runs as a DaemonSet and turns raw usage values scraped from
  Kubelet into resource estimates (values used by the scheduler for more advanced usage-based
  scheduling)

These sources are scraped by a component we call *metrics-server*, which is like a slimmed-down
version of today's Heapster. metrics-server stores only the latest values locally and has no sinks.
metrics-server exposes the master metrics API. (The configuration described here is similar
to the current Heapster in “standalone” mode.)
The [Discovery summarizer](../../docs/proposals/federated-api-servers.md)
makes the master metrics API available to external clients such that, from the client’s perspective,
it looks the same as talking to the API server.
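
For illustration, here is a minimal sketch of what consuming the master metrics API could look
like for an external client. It assumes the API server is reachable locally through `kubectl proxy`
and that core metrics end up registered under a metrics-style API group; the exact group/version
path below is an assumption of this sketch, not something fixed by this proposal:

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

func main() {
	// Assumes `kubectl proxy` is running locally, so requests go through the
	// API server without extra authentication handling in this sketch.
	// The group/version path is illustrative; the real path depends on how
	// the master metrics API is ultimately registered.
	const nodeMetricsURL = "http://127.0.0.1:8001/apis/metrics.k8s.io/v1alpha1/nodes"

	resp, err := http.Get(nodeMetricsURL)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}

	// From the client's perspective this is just another versioned API served
	// by the master: a regular API object listing per-node resource usage.
	fmt.Println(string(body))
}
```

The same data backs the simple out-of-the-box consumers mentioned earlier, such as `kubectl top`.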

Core (system) metrics are handled as described above in all deployment environments. The only
easily replaceable part is the resource estimator, which could be replaced by power users. In
theory, metrics-server itself can also be substituted, but that would be similar to substituting
the apiserver itself or the controller-manager - possible, but not recommended and not supported.

Eventually the core metrics pipeline might also collect metrics from Kubelet and the Docker daemon
themselves (e.g. CPU usage of Kubelet), even though they do not run in containers.

The core metrics pipeline is intentionally small and not designed for third-party integrations.
“Full-fledged” monitoring is left to third-party systems, which provide the monitoring pipeline
(see next section) and can run on Kubernetes without having to make changes to upstream components.
In this way we can remove the burden we have today that comes with maintaining Heapster as the
integration point for every possible metrics source, sink, and feature.

#### Infrastore

We will build an open-source Infrastore component (most likely reusing existing technologies)
for serving historical queries over core system metrics and events, which it will fetch from
the master APIs. Infrastore will expose one or more APIs (possibly just SQL-like queries --
this is TBD) to handle the following use cases:

* initial resources
* vertical autoscaling
* oldtimer API
* decision-support queries for debugging, capacity planning, etc.
* usage graphs in the [Kubernetes Dashboard](https://github.com/kubernetes/dashboard)

In addition, it may collect monitoring metrics and service metrics (at least from Kubernetes
infrastructure containers), as described in the upcoming sections.

### Monitoring pipeline

One of the goals of building a dedicated metrics pipeline for core metrics, as described in the
previous section, is to allow for a separate monitoring pipeline that can be very flexible
because core Kubernetes components do not need to rely on it. By default we will not provide
one, but we will provide an easy way to install one (using a single command, most likely using
Helm). We describe the monitoring pipeline in this section.

Data collected by the monitoring pipeline may contain any sub- or superset of the following groups
of metrics:

* core system metrics
* non-core system metrics
* service metrics from user application containers
* service metrics from Kubernetes infrastructure containers; these metrics are exposed using
  Prometheus instrumentation

It is up to the monitoring solution to decide which of these are collected.
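
To make the last group above concrete, here is a minimal sketch of the Prometheus-instrumentation
convention using the standard Prometheus Go client: a component counts one of its own service
metrics and exposes it on a `/metrics` HTTP handler for the per-node agent to scrape. The metric
name and port are illustrative, not prescribed by this design:

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// requestCount is an example service metric: requests served, labeled by HTTP
// status code (the "number of 500s served" style of metric).
var requestCount = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "example_http_requests_total",
		Help: "HTTP requests served, partitioned by status code.",
	},
	[]string{"code"},
)

func main() {
	prometheus.MustRegister(requestCount)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		requestCount.WithLabelValues("200").Inc()
		w.Write([]byte("ok"))
	})

	// The monitoring pipeline's per-node agent scrapes this endpoint.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```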

In order to enable horizontal pod autoscaling based on custom metrics, the provider of the
monitoring pipeline would also have to create a stateless API adapter that pulls the custom
metrics from the monitoring pipeline and exposes them to the Horizontal Pod Autoscaler. Such
an API will be a well-defined, versioned API similar to regular APIs. Details of how it will be
exposed or discovered will be covered in a detailed design doc for this component.

The same approach applies if it is desired to make monitoring pipeline metrics available in
Infrastore. These adapters could be standalone components, libraries, or part of the monitoring
solution itself.

There are many possible combinations of node and cluster-level agents that could comprise a
monitoring pipeline, including:

* cAdvisor + Heapster + InfluxDB (or any other sink)
* cAdvisor + collectd + Heapster
* cAdvisor + Prometheus
* snapd + Heapster
* snapd + SNAP cluster-level agent
* Sysdig

As an example, we’ll describe a potential integration with cAdvisor + Prometheus.

Prometheus has the following metric sources on a node:

* core and non-core system metrics from cAdvisor
* service metrics exposed by containers via an HTTP handler in Prometheus format
* [optional] metrics about the node itself from Node Exporter (a Prometheus component)

All of them are polled by the Prometheus cluster-level agent. We can use the Prometheus
cluster-level agent as a source for horizontal pod autoscaling custom metrics by using a
standalone API adapter that proxies/translates between the Prometheus Query Language endpoint
on the Prometheus cluster-level agent and an HPA-specific API. Likewise an adapter can be
used to make the metrics from the monitoring pipeline available in Infrastore. Neither
adapter is necessary if the user does not need the corresponding feature.
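
A minimal sketch of such a stateless adapter is shown below: it evaluates a PromQL expression
against the Prometheus cluster-level agent's HTTP query API and re-exposes the result on its own
endpoint. The Prometheus address, the query, and the adapter's outward-facing path are all
assumptions for illustration; the real HPA-facing API shape is deferred to the adapter's own
design doc:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// promResult mirrors the relevant part of the Prometheus query API response
// (GET /api/v1/query), which returns a vector of [timestamp, value] samples.
type promResult struct {
	Data struct {
		Result []struct {
			Value [2]interface{} `json:"value"`
		} `json:"result"`
	} `json:"data"`
}

// queryPrometheus evaluates a PromQL expression against the cluster-level
// Prometheus agent and returns the value of the first sample as a string.
func queryPrometheus(promURL, expr string) (string, error) {
	resp, err := http.Get(promURL + "/api/v1/query?query=" + url.QueryEscape(expr))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()

	var r promResult
	if err := json.NewDecoder(resp.Body).Decode(&r); err != nil {
		return "", err
	}
	if len(r.Data.Result) == 0 {
		return "", fmt.Errorf("no samples for query %q", expr)
	}
	return fmt.Sprint(r.Data.Result[0].Value[1]), nil
}

func main() {
	// Both the Prometheus address and the PromQL expression are illustrative.
	const promURL = "http://prometheus.kube-system.svc:9090"
	const expr = `sum(rate(example_http_requests_total[1m]))`

	// The adapter re-exposes the translated value over its own HTTP endpoint;
	// the actual HPA-facing API is left to the detailed design doc.
	http.HandleFunc("/custom-metric", func(w http.ResponseWriter, req *http.Request) {
		val, err := queryPrometheus(promURL, expr)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadGateway)
			return
		}
		fmt.Fprintln(w, val)
	})
	http.ListenAndServe(":8081", nil)
}
```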

The command that installs cAdvisor+Prometheus should also automatically set up collection
of the metrics from infrastructure containers. This is possible because the names of the
infrastructure containers and metrics of interest are part of the Kubernetes control plane
configuration itself, and because the infrastructure containers export their metrics in
Prometheus format.

## Appendix: Architecture diagram

### Open-source monitoring pipeline

![Architecture Diagram](monitoring_architecture.png?raw=true "Architecture overview")