docs/admin/cluster-large.md (+3)
@@ -30,9 +30,11 @@ Documentation for other releases can be found at
 <!-- END STRIP_FOR_RELEASE -->

 <!-- END MUNGE: UNVERSIONED_WARNING -->
+
 # Kubernetes Large Cluster

 ## Support
+
 At v1.0, Kubernetes supports clusters up to 100 nodes with 30 pods per node and 1-2 container per pod (as defined in the [1.0 roadmap](../../docs/roadmap.md#reliability-and-performance)).

 ## Setup
@@ -59,6 +61,7 @@ To avoid running into cloud provider quota issues, when creating a cluster with
 * Gating the setup script so that it brings up new node VMs in smaller batches with waits in between, because some cloud providers rate limit the creation of VMs.

 ### Addon Resources
+
 To prevent memory leaks or other resource issues in [cluster addons](../../cluster/addons/) from consuming all the resources available on a node, Kubernetes sets resource limits on addon containers to limit the CPU and Memory resources they can consume (See PR [#10653](https://github.com/GoogleCloudPlatform/kubernetes/pull/10653/files) and [#10778](https://github.com/GoogleCloudPlatform/kubernetes/pull/10778/files)).
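
The batching suggestion quoted in this hunk is easy to picture as a small shell loop. The sketch below is purely illustrative: ```create-node-vm```, the batch size, and the delay are hypothetical placeholders standing in for your cloud provider's CLI and its actual rate limits.

```
#!/bin/bash
# Sketch: bring up node VMs in small batches so cloud provider rate limits are not hit.
# "create-node-vm" is a hypothetical helper; substitute your provider's create-instance call.

NUM_NODES=100     # total worker nodes to create
BATCH_SIZE=10     # VMs launched per batch
BATCH_DELAY=60    # seconds to wait between batches

for ((i = 1; i <= NUM_NODES; i++)); do
  create-node-vm "node-${i}" &            # start this VM's creation in the background
  if (( i % BATCH_SIZE == 0 )); then
    wait                                  # let the current batch finish
    echo "Created ${i}/${NUM_NODES} nodes; sleeping ${BATCH_DELAY}s"
    sleep "${BATCH_DELAY}"
  fi
done
wait                                      # pick up any remainder from the last batch
```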

docs/admin/high-availability.md (+17 -1)
@@ -30,6 +30,7 @@ Documentation for other releases can be found at
 <!-- END STRIP_FOR_RELEASE -->

 <!-- END MUNGE: UNVERSIONED_WARNING -->
+
 # High Availability Kubernetes Clusters

 **Table of Contents**
@@ -43,6 +44,7 @@ Documentation for other releases can be found at
 <!-- END MUNGE: GENERATED_TOC -->

 ## Introduction
+
 This document describes how to build a high-availability (HA) Kubernetes cluster. This is a fairly advanced topic.
 Users who merely want to experiment with Kubernetes are encouraged to use configurations that are simpler to set up such as
 the simple [Docker based single node cluster instructions](../../docs/getting-started-guides/docker.md),
@@ -52,6 +54,7 @@ Also, at this time high availability support for Kubernetes is not continuously
 be working to add this continuous testing, but for now the single-node master installations are more heavily tested.

 ## Overview
+
 Setting up a truly reliable, highly available distributed system requires a number of steps, it is akin to
 wearing underwear, pants, a belt, suspenders, another pair of underwear, and another pair of pants. We go into each
 of these steps in detail, but a summary is given here to help guide and orient the user.
@@ -68,6 +71,7 @@ Here's what the system should look like when it's finished:
 Ready? Let's get started.

 ## Initial set-up
+
 The remainder of this guide assumes that you are setting up a 3-node clustered master, where each machine is running some flavor of Linux.
 Examples in the guide are given for Debian distributions, but they should be easily adaptable to other distributions.
 Likewise, this set up should work whether you are running in a public or private cloud provider, or if you are running
@@ -78,6 +82,7 @@ instructions at [https://get.k8s.io](https://get.k8s.io)
 describe easy installation for single-master clusters on a variety of platforms.

 ## Reliable nodes
+
 On each master node, we are going to run a number of processes that implement the Kubernetes API. The first step in making these reliable is
 to make sure that each automatically restarts when it fails. To achieve this, we need to install a process watcher. We choose to use
 the ```kubelet``` that we run on each of the worker nodes. This is convenient, since we can use containers to distribute our binaries, we can
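
The process-watcher idea in this hunk (a kubelet on every master that restarts failed components) can be sketched as follows. The manifest directory and the ```--config``` flag reflect 1.0-era kubelet conventions and are assumptions rather than text from this patch; the two ```systemctl enable``` commands are quoted from the context line of the next hunk.

```
# Sketch, assuming a 1.0-era kubelet: use it as the process watcher on each master node.
# It watches a manifest directory and restarts any container from those pods that exits.

sudo mkdir -p /etc/kubernetes/manifests

# Flag spelling may differ by release; later kubelets call this --pod-manifest-path.
sudo kubelet --config=/etc/kubernetes/manifests &

# On systemd systems, make sure docker and the kubelet themselves survive a reboot:
sudo systemctl enable docker
sudo systemctl enable kubelet
```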
@@ -98,6 +103,7 @@ On systemd systems you ```systemctl enable kubelet``` and ```systemctl enable do


 ## Establishing a redundant, reliable data storage layer
+
 The central foundation of a highly available solution is a redundant, reliable storage layer. The number one rule of high-availability is
 to protect the data. Whatever else happens, whatever catches on fire, if you have the data, you can rebuild. If you lose the data, you're
 done.
@@ -109,6 +115,7 @@ size of the cluster from three to five nodes. If that is still insufficient, yo
 [even more redundancy to your storage layer](#even-more-reliable-storage).

 ### Clustering etcd
+
 The full details of clustering etcd are beyond the scope of this document, lots of details are given on the
 [etcd clustering page](https://github.com/coreos/etcd/blob/master/Documentation/clustering.md). This example walks through
 a simple cluster set up, using etcd's built in discovery to build our cluster.
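
For orientation, here is a minimal sketch of the built-in discovery flow this hunk refers to. It is not the guide's etcd pod manifest: ```${NODE_NAME}``` and ```${NODE_IP}``` are placeholders filled in per machine, and the flags shown are the standard etcd v2 clustering flags.

```
# Sketch: a three-node etcd cluster using etcd's built-in discovery (etcd v2 flags assumed).
# ${NODE_NAME} and ${NODE_IP} must be set differently on each of the three machines.

# 1. Generate one discovery URL for a cluster of size 3 and share it with all nodes:
DISCOVERY_URL=$(curl -s "https://discovery.etcd.io/new?size=3")

# 2. Start etcd on every node with that same URL:
etcd --name "${NODE_NAME}" \
  --initial-advertise-peer-urls "http://${NODE_IP}:2380" \
  --listen-peer-urls "http://${NODE_IP}:2380" \
  --listen-client-urls "http://${NODE_IP}:2379,http://127.0.0.1:2379" \
  --advertise-client-urls "http://${NODE_IP}:2379" \
  --discovery "${DISCOVERY_URL}"
```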
@@ -130,6 +137,7 @@ for ```${NODE_IP}``` on each machine.


 #### Validating your cluster
+
 Once you copy this into all three nodes, you should have a clustered etcd set up. You can validate with

 ```
@@ -146,6 +154,7 @@ You can also validate that this is working with ```etcdctl set foo bar``` on one
 on a different node.

 ### Even more reliable storage
+
 Of course, if you are interested in increased data reliability, there are further options which makes the place where etcd
 installs it's data even more reliable than regular disks (belts *and* suspenders, ftw!).

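The validation the document describes comes down to a few ```etcdctl``` calls. A minimal sketch, assuming the etcd v2 client syntax that matches the ```etcdctl set foo bar``` command quoted in the hunk header above:

```
# Sketch: sanity-check the clustered etcd from any master node (etcd v2 client assumed).

etcdctl cluster-health    # all three members should report healthy
etcdctl member list       # confirms the cluster sees three members

# Write a key on one node...
etcdctl set foo bar

# ...then read it back on a *different* node to confirm replication:
etcdctl get foo
```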
@@ -162,9 +171,11 @@ for each node. Throughout these instructions, we assume that this storage is mo


 ## Replicated API Servers
+
 Once you have replicated etcd set up correctly, we will also install the apiserver using the kubelet.

 ### Installing configuration files
+
 First you need to create the initial log file, so that Docker mounts a file instead of a directory:

 ```
@@ -183,12 +194,14 @@ Next, you need to create a ```/srv/kubernetes/``` directory on each node. This
 The easiest way to create this directory, may be to copy it from the master node of a working cluster, or you can manually generate these files yourself.

 ### Starting the API Server
+
 Once these files exist, copy the [kube-apiserver.yaml](high-availability/kube-apiserver.yaml) into ```/etc/kubernetes/manifests/``` on each master node.

 The kubelet monitors this directory, and will automatically create an instance of the ```kube-apiserver``` container using the pod definition specified
 in the file.

 ### Load balancing
+
 At this point, you should have 3 apiservers all working correctly. If you set up a network load balancer, you should
 be able to access your cluster via that load balancer, and see traffic balancing between the apiserver instances. Setting
 up a load balancer will depend on the specifics of your platform, for example instructions for the Google Cloud
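
Condensing the API server steps from the last few hunks, the per-master commands look roughly like the sketch below. Copying certificates into ```/srv/kubernetes/``` and dropping ```kube-apiserver.yaml``` into ```/etc/kubernetes/manifests/``` follow the surrounding text; the log file path and the source locations of the copied files are assumptions.

```
# Sketch of the per-master API server setup described above; paths marked as assumptions.

# Create the log file first, so Docker bind-mounts a file rather than creating a directory
# (exact path is an assumption; the guide's command sits in the code block this hunk truncates):
sudo touch /var/log/kube-apiserver.log

# Certificates, tokens, and basic auth files live in /srv/kubernetes/ on each master;
# the easiest source is a working master (placeholder copy shown commented out):
sudo mkdir -p /srv/kubernetes
# sudo scp "existing-master:/srv/kubernetes/*" /srv/kubernetes/

# Dropping the pod manifest into the kubelet's manifest directory starts kube-apiserver:
sudo cp kube-apiserver.yaml /etc/kubernetes/manifests/
```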
@@ -203,6 +216,7 @@ For external users of the API (e.g. the ```kubectl``` command line interface, co
 them to talk to the external load balancer's IP address.

 ## Master elected components
+
 So far we have set up state storage, and we have set up the API server, but we haven't run anything that actually modifies
 cluster state, such as the controller manager and scheduler. To achieve this reliably, we only want to have one actor modifying state at a time, but we want replicated
 instances of these actors, in case a machine dies. To achieve this, we are going to use a lease-lock in etcd to perform
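
For the external clients mentioned above, pointing ```kubectl``` at the load balancer can be sketched like this. The cluster and context names and ```${LOAD_BALANCER_IP}``` are placeholders, and skipping TLS verification is only a shortcut for the sketch, not a recommendation.

```
# Sketch: make kubectl talk to the external load balancer instead of a single apiserver.

kubectl config set-cluster ha-cluster \
  --server="https://${LOAD_BALANCER_IP}:443" \
  --insecure-skip-tls-verify=true    # shortcut for the sketch; use proper certs in practice
kubectl config set-context ha-context --cluster=ha-cluster
kubectl config use-context ha-context

kubectl get nodes   # requests now go through the load balancer
```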
@@ -226,6 +240,7 @@ by copying [kube-scheduler.yaml](high-availability/kube-scheduler.yaml) and [kub
 directory.

 ### Running the podmaster
+
 Now that the configuration files are in place, copy the [podmaster.yaml](high-availability/podmaster.yaml) config file into ```/etc/kubernetes/manifests/```

 As before, the kubelet on the node monitors this directory, and will start an instance of the podmaster using the pod specification provided in ```podmaster.yaml```.
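
To recap the lease-lock arrangement, the file placement on each master ends up roughly as in the sketch below. The ```/srv/kubernetes/``` staging directory for the scheduler and controller-manager manifests is an assumption inferred from the truncated context of the previous hunk; only ```/etc/kubernetes/manifests/``` is named explicitly here.

```
# Sketch of the master-election file layout on each master node.
# The staging directory for the scheduler/controller-manager manifests is an assumption.

# Manifests the podmaster promotes on whichever node currently holds the etcd lease:
sudo cp kube-scheduler.yaml kube-controller-manager.yaml /srv/kubernetes/

# The podmaster itself runs on every master; the kubelet starts it from the manifest directory:
sudo cp podmaster.yaml /etc/kubernetes/manifests/
```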
@@ -236,6 +251,7 @@ the kubelet will restart them. If any of these nodes fail, the process will mov
 node.

 ## Conclusion
+
 At this point, you are done (yeah!) with the master components, but you still need to add worker nodes (boo!).

 If you have an existing cluster, this is as simple as reconfiguring your kubelets to talk to the load-balanced endpoint, and
@@ -244,7 +260,7 @@ restarting the kubelets on each node.
 If you are turning up a fresh cluster, you will need to install the kubelet and kube-proxy on each worker node, and
 set the ```--apiserver``` flag to your replicated endpoint.

-##Vagrant up!
+##Vagrant up!

 We indeed have an initial proof of concept tester for this, which is available [here](../../examples/high-availability/).
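
Finally, a very rough sketch of pointing a worker at the replicated endpoint. The guide only names a generic ```--apiserver``` flag; the spellings below (```--api-servers``` for the kubelet, ```--master``` for kube-proxy) are assumptions about that era's component flags, so verify them against your binaries, and ```${LOAD_BALANCER_IP}``` is a placeholder.

```
# Sketch: reconfigure a worker node to use the load-balanced master endpoint.
# Flag names are assumptions; check `kubelet --help` and `kube-proxy --help` for your release.

kubelet --api-servers="https://${LOAD_BALANCER_IP}:443" &
kube-proxy --master="https://${LOAD_BALANCER_IP}:443" &
```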