Terraform module for a GKE Kubernetes Cluster in GCP
If you want to utilize this feature, make sure to declare a `helm` provider in your Terraform configuration as follows.
provider "helm" {
version = "2.1.2" # see https://github.com/terraform-providers/terraform-provider-helm/releases
kubernetes {
host = module.gke_cluster.cluster_endpoint
token = data.google_client_config.google_client.access_token
cluster_ca_certificate = module.gke_cluster.cluster_ca_certificate
}
}
Pay attention to the `gke_cluster` module output variables used here.
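Note that the `token` above is read from a `google_client_config` data source; if it is not already declared elsewhere in your configuration (it also appears in the `kubernetes` provider example below), declare it alongside the provider:

```hcl
# Supplies an OAuth access token for the current Google credentials,
# used by the helm provider to authenticate against the GKE cluster.
data "google_client_config" "google_client" {}
```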
If you are using the `namespace` variable, you may get an error like the following:

```
Error: Get "http://localhost/api/v1/namespaces/<namespace_name>": dial tcp 127.0.0.1:80: connect: connection refused
```
To fix this, declare a `kubernetes` provider in your Terraform configuration like the following.
provider "kubernetes" {
version = "1.13.3" # see https://github.com/terraform-providers/terraform-provider-kubernetes/releases
load_config_file = false
host = module.gke_cluster.cluster_endpoint
token = data.google_client_config.google_client.access_token
cluster_ca_certificate = module.gke_cluster.cluster_ca_certificate
}
data "google_client_config" "google_client" {}
Pay attention to the `gke_cluster` module output variables used here.
Drop the use of attributes such as `node_count_initial_per_zone` and/or `node_count_current_per_zone` (if any) from the list of objects in `var.node_pools`.
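For illustration, a minimal sketch of a `var.node_pools` entry with those attributes dropped is shown below. Apart from the dropped attributes and `node_pool_name`, the attribute names and values are placeholders - check the module's `variables.tf` for the authoritative object shape.

```hcl
# In your module block or terraform.tfvars - placeholder values throughout.
node_pools = [
  {
    node_pool_name          = "my-node-pool"
    machine_type            = "e2-standard-4"
    node_count_min_per_zone = 1
    node_count_max_per_zone = 3
    # node_count_initial_per_zone = 1  # <- drop attributes like this one
    # node_count_current_per_zone = 1  # <- and this one
  },
]
```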
This upgrade performs 2 changes:

- Moves the declaration of kubernetes secrets into the declaration of kubernetes namespaces
  - see the Pull Request description at airasia#7
- Adds the ability to create multiple ingress IPs for istio
  - read below

Detailed steps are provided below:
- Upgrade `gke_cluster` module version to `2.7.1`
- Run `terraform plan` - DO NOT APPLY this plan
  - the plan may show that some `istio` resource(s) (if any are used) will be destroyed - we want to avoid any kind of destruction and/or recreation
  - P.S. to resolve any changes proposed for `kubernetes_secret` resource(s), please refer to this Pull Request description instead
- Set the `istio_ip_names` variable with at least one item as `["ip"]`
  - this is so that the istio IP resource name is backward-compatible
- Run `terraform plan` - DO NOT APPLY this plan
  - now, the plan may show that a `static_istio_ip` resource (if any is used) will be destroyed and recreated under a new named index - we want to avoid any kind of destruction and/or recreation
  - P.S. to resolve any changes proposed for `kubernetes_secret` resource(s), please refer to this Pull Request description instead
- Move the terraform states
  - notice that the plan says your existing `static_istio_ip` resource (let's say `istioIpX`) will be destroyed and a new `static_istio_ip` resource (let's say `istioIpY`) will be created
  - pay attention to the array indexes:
    - the `*X` resources (the ones to be destroyed) start with array index `[0]` - although it may not show `[0]` in the displayed plan
    - the `*Y` resources (the ones to be created) will show the new named index
  - use `terraform state mv` to manually move the state of `istioIpX` to `istioIpY` - see the sketch after this list
    - refer to https://www.terraform.io/docs/commands/state/mv.html to learn more about how to move Terraform state positions
    - once a resource is moved, it will say `Successfully moved 1 object(s).`
- Run `terraform plan` again
  - the plan should now show that no changes are required
  - this confirms that you have successfully moved all your resources' states to their new positions as required by `v2.7.1`
- DONE
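For illustration only, such a state move could look like the sketch below. The resource addresses are hypothetical - copy the exact source and destination addresses from your own plan output.

```sh
# Hypothetical addresses - use the exact addresses shown in your plan output.
# Moves the istio IP state from the old numeric index [0] to the new named index ["ip"].
terraform state mv \
  'module.gke_cluster.google_compute_address.static_istio_ip[0]' \
  'module.gke_cluster.google_compute_address.static_istio_ip["ip"]'
```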
- Upgrade `gke_cluster` module version to `2.5.1`
- Run `terraform plan` - DO NOT APPLY this plan
  - the plan will show that several resources will be destroyed and recreated under new named indexes - we want to avoid any kind of destruction and/or recreation
- Move the terraform states
  - notice that the plan says your existing `static_ingress_ip` resource(s) (let's say `ingressIpX`) will be destroyed and new `static_ingress_ip` resource(s) (let's say `ingressIpY`) will be created
  - also notice that the plan says your existing `kubernetes_namespace` resource(s) (let's say `namespaceX`) will be destroyed and new `kubernetes_namespace` resource(s) (let's say `namespaceY`) will be created
  - P.S. if you happen to have multiple `static_ingress_ip` resource(s) and `kubernetes_namespace` resource(s), then the plan will show these destructions and recreations multiple times. You will need to move the states for EACH of the respective resources one-by-one.
  - pay attention to the array indexes:
    - the `*X` resources (the ones to be destroyed) start with array index `[0]` - although it may not show `[0]` in the displayed plan
    - the `*Y` resources (the ones to be created) will show array indexes with new named indexes
  - use `terraform state mv` to manually move the states of each `ingressIpX` to `ingressIpY`, and to move the states of each `namespaceX` to `namespaceY` - see the sketch after this list
    - refer to https://www.terraform.io/docs/commands/state/mv.html to learn more about how to move Terraform state positions
    - once a resource is moved, it will say `Successfully moved 1 object(s).`
    - repeat until all relevant states are moved to their desired positions
- Run `terraform plan` again
  - the plan should now show that no changes are required
  - this confirms that you have successfully moved all your resources' states to their new positions as required by `v2.5.1`
- DONE
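As in the previous guide, a rough sketch of the state moves is shown below. The resource addresses and the `"default"` named index are hypothetical - copy the exact addresses from your own plan output and repeat for EACH ingress IP and namespace.

```sh
# Hypothetical addresses - use the exact addresses shown in your plan output.
terraform state mv \
  'module.gke_cluster.google_compute_address.static_ingress_ip[0]' \
  'module.gke_cluster.google_compute_address.static_ingress_ip["default"]'
terraform state mv \
  'module.gke_cluster.kubernetes_namespace.namespaces[0]' \
  'module.gke_cluster.kubernetes_namespace.namespaces["default"]'
```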
This upgrade process will:

- drop the use of auxiliary node pools (if any)
- create a new node pool under terraform's array structure
- migrate existing deployments/workloads from the old node pool to the new node pool
- delete the old standalone node pool as it's no longer required

Detailed steps are provided below:
- While on `v2.2.2`, remove the variables `create_auxiliary_node_pool` and `auxiliary_node_pool_config`.
  - run `terraform plan` & `terraform apply`
  - this will remove any `auxiliary_node_pool` that may have been there
- Upgrade the gke_cluster module to `v2.3.1` and set the variable `node_pools` with its required params.
  - the value of `node_pool_name` for the new node pool must be different from the name of the old node pool
  - run `terraform plan` & `terraform apply`
  - this will create a new node pool as per the specs provided in `node_pools`
- Migrate existing deployments/workloads from the old node pool to the new node pool.
  - check status of nodes: `kubectl get nodes`
    - confirm that all nodes from all node pools are shown
    - confirm that all nodes have status `Ready`
  - check status of pods: `kubectl get pods -o=wide`
    - confirm that all pods have status `Running`
    - confirm that all pods are running on nodes from the old node pool
  - cordon the old node pool: `for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=<OLD_NODE_POOL_NAME> -o=name); do kubectl cordon "$node"; done`
    - replace `<OLD_NODE_POOL_NAME>` with the correct value
  - check status of nodes: `kubectl get nodes`
    - confirm that all nodes from the old node pool have status `Ready,SchedulingDisabled`
    - confirm that all nodes from the new node pool have status `Ready`
  - check status of pods: `kubectl get pods -o=wide`
    - confirm that all pods still have status `Running`
    - confirm that all pods are still running on nodes from the old node pool
  - initiate a rolling restart of all deployments: `kubectl rollout restart deployment <DEPLOYMENT_1_NAME> <DEPLOYMENT_2_NAME> <DEPLOYMENT_3_NAME>`
    - replace `<DEPLOYMENT_*_NAME>` with the correct names of existing deployments
  - check status of pods: `kubectl get pods -o=wide`
    - confirm that some pods have status `Running` while some new pods have status `ContainerCreating`
    - confirm that the new pods with status `ContainerCreating` are running on nodes from the new node pool
    - repeat the status checks until all pods have status `Running` and all pods are running on nodes from the new node pool only
  - drain the old node pool: `for node in $(kubectl get nodes -l cloud.google.com/gke-nodepool=<OLD_NODE_POOL_NAME> -o=name); do kubectl drain --force --ignore-daemonsets --delete-local-data --grace-period=10 "$node"; done`
    - replace `<OLD_NODE_POOL_NAME>` with the correct value
    - confirm that the response says `evicting pod` or `evicted` for all remaining pods in the old node pool
    - this step may take some time
  - Migration complete
- Upgrade the gke_cluster module to `v2.4.2` and remove the use of any obsolete variables.
  - remove standalone variables such as `machine_type`, `disk_size_gb`, `node_count_initial_per_zone`, `node_count_min_per_zone`, `node_count_max_per_zone`, `node_count_current_per_zone` from the module, as these are no longer used for a standalone node pool
  - run `terraform plan` & `terraform apply`
  - this will remove the old node pool completely
- DONE
- While at `v1.2.9`, set `create_auxiliary_node_pool` to `true` - this will create a new additional node pool according to the values of `var.auxiliary_node_pool_config` before proceeding with the breaking change. See the sketch after this list.
  - Run `terraform apply`
- Migrate all workloads from the existing node pool to the newly created auxiliary node pool
  - Follow these instructions
- Upgrade the `gke_cluster` module to `v1.3.0` - this will destroy and recreate the GKE node pool while the auxiliary node pool from step 1 continues to serve the requests of the GKE cluster
  - Run `terraform apply`
- Migrate all workloads back from the auxiliary node pool to the newly created node pool
  - Follow these instructions
- While at `v1.3.0`, set `create_auxiliary_node_pool` to `false` - this will destroy the auxiliary node pool that was created in step 1 as it is no longer needed
  - Run `terraform apply`
- Done
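A minimal sketch of step 1 is shown below, assuming the module block is named `gke_cluster`; the module source and the attributes of `auxiliary_node_pool_config` are placeholders - check the module's `variables.tf` for the authoritative shape.

```hcl
module "gke_cluster" {
  source  = "airasia/gke_cluster/google" # placeholder - keep whatever source your configuration already uses
  version = "1.2.9"

  # ...existing cluster arguments remain unchanged...

  # Step 1: temporarily create an auxiliary node pool to hold workloads
  # while the main node pool is destroyed and recreated by v1.3.0.
  create_auxiliary_node_pool = true
  auxiliary_node_pool_config = { # placeholder attributes
    machine_type = "e2-standard-4"
    disk_size_gb = 100
    node_count   = 3
  }
}
```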