Adding EKS Cluster support & updated documentation (litmuschaos#22)
* Adding AWS configuration

Signed-off-by: Jonsy13 <[email protected]>

* Adding AWS configuration

Signed-off-by: Jonsy13 <[email protected]>

* Adding AWS configuration

Signed-off-by: Jonsy13 <[email protected]>

* Adding AWS configuration

Signed-off-by: Jonsy13 <[email protected]>
Jonsy13 authored Nov 30, 2021
1 parent c3be547 commit 22461ed
Showing 17 changed files with 258 additions and 162 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/test.yml
@@ -26,7 +26,7 @@ jobs:
- name: Deploy a sample application for chaos injection
run: |
kubectl apply -f https://raw.githubusercontent.com/litmuschaos/chaos-ci-lib/master/app/nginx.yml
kubectl wait --for=condition=Ready pods --all --namespace default --timeout=60s
kubectl wait --for=condition=Ready pods --all --namespace default --timeout=120s
- name: Setting up kubeconfig ENV for Github Chaos Action
run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)
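The `::set-env` workflow command used in this step is deprecated by GitHub and only works when `ACTIONS_ALLOW_UNSECURE_COMMANDS` is enabled. A minimal sketch of the equivalent using the `GITHUB_ENV` file, assuming a Linux runner with GNU `base64`; the function name `export_kubeconfig` is hypothetical and not part of this commit:

```shell
#!/bin/sh
# Hypothetical helper (not from this commit): append KUBE_CONFIG_DATA to the
# file named by GITHUB_ENV, which GitHub Actions reads to set environment
# variables for the steps that follow.
export_kubeconfig() {
  kubeconfig="${1:-$HOME/.kube/config}"
  # -w 0 disables line wrapping (GNU coreutils), so the value stays on one line.
  echo "KUBE_CONFIG_DATA=$(base64 -w 0 "$kubeconfig")" >> "$GITHUB_ENV"
}
```

In a workflow step this would be called as `export_kubeconfig ~/.kube/config`, with no `ACTIONS_ALLOW_UNSECURE_COMMANDS` override needed.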
14 changes: 9 additions & 5 deletions Dockerfile
@@ -16,20 +16,24 @@ ARG HELM_VERSION=3.2.3
ARG RELEASE_ROOT="https://get.helm.sh"
ARG RELEASE_FILE="helm-v${HELM_VERSION}-linux-amd64.tar.gz"

ARG KUBECTL_VERSION=1.17.0
ARG KUBECTL_VERSION=1.22.0
ADD https://storage.googleapis.com/kubernetes-release/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl /usr/local/bin/kubectl
RUN chmod +x /usr/local/bin/kubectl

RUN apt-get update && apt-get install -y git && \
apt-get install -y ssh && \
apt-get install curl -y && \
apt install ssh rsync
RUN apt-get update && apt-get install -y git \
curl \
unzip \
&& apt-get clean

RUN apt-get update && \
curl -L ${RELEASE_ROOT}/${RELEASE_FILE} |tar xvz && \
mv linux-amd64/helm /usr/bin/helm && \
chmod +x /usr/bin/helm

RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip" && \
unzip awscliv2.zip && \
./aws/install

COPY README.md /
COPY entrypoint.sh /entrypoint.sh
COPY experiments ./experiments
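
Since the image now bundles `kubectl`, `helm`, and the AWS CLI, a quick smoke check after building can confirm each binary is on `PATH`. A minimal sketch, assuming a POSIX shell inside the image; the function name `missing_tools` is hypothetical and not part of this commit:

```shell
#!/bin/sh
# Hypothetical smoke check (not from this commit): print every named tool that
# is not found on PATH and return non-zero if at least one is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example invocation inside the built image:
#   missing_tools kubectl helm aws git curl unzip
```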
156 changes: 123 additions & 33 deletions README.md
@@ -4,15 +4,15 @@ This action provides a way to perform different chaos experiments on the Kuberne

## Pre-requisites

Kubernetes 1.11 or later.
Kubernetes 1.16 or later.

## Overview.

A number of chaos experiments can be performed using `github-chaos-actions`. Select the one you want to run; for more details about each experiment, see the <a href="https://docs.litmuschaos.io/docs/getstarted">experiment docs</a> section.

## Run a chaos experiment using this action

We just need to follow these simple steps to run a chaos experiment using this action:
We just need to follow these simple steps to run a chaos experiment using this action:

- **Deploy Application**: We need to have an application running on which the chaos will be performed. The user has to create an application and pass the application details through the action's ENV. The details include the application kind (deployment, statefulset, daemonset), application label, and namespace.

@@ -22,13 +22,13 @@ We just need to follow these simple steps to run a chaos experiment using this

**The different experiments that can be performed using `github-chaos-actions` are:**

- **Pod Delete**: This chaos action causes random (forced/graceful) pod delete of application deployment replicas. It tests deployment sanity (high availability & uninterrupted service) and recovery workflows of the application pod. Check a sample usage of <a href="https://github.com/litmuschaos/github-chaos-actions/blob/master/experiments/pod-delete/README.md"> pod delete chaos action</a> and for more details about the experiment please visit <a href="https://docs.litmuschaos.io/docs/pod-delete"> pod delete docs</a>.
- **Pod Delete**: This chaos action causes random (forced/graceful) pod delete of application deployment replicas. It tests deployment sanity (high availability & uninterrupted service) and recovery workflows of the application pod. Check a sample usage of <a href="https://github.com/litmuschaos/github-chaos-actions/blob/master/experiments/pod-delete/README.md"> pod delete chaos action</a> and for more details about the experiment please visit <a href="https://docs.litmuschaos.io/docs/pod-delete"> pod delete docs</a>.

- **Container Kill**: This chaos action executes SIGKILL on the container of random replicas of application deployment. It tests the deployment sanity (high availability & uninterrupted service) and recovery workflows of an application. Check a sample usage of <a href="https://github.com/litmuschaos/github-chaos-actions/blob/master/experiments/container-kill/README.md"> container kill chaos action </a>and for more details about the experiment please visit <a href="https://docs.litmuschaos.io/docs/container-kill"> container kill docs</a>.

- **Node CPU Hog**: This chaos action causes CPU resource exhaustion on the Kubernetes node. The experiment aims to verify the resiliency of applications that operate under resource constraints wherein replicas may sometimes be evicted on account of nodes turning unschedulable (Not Ready) due to lack of CPU resources. Check a sample usage of <a href="https://github.com/litmuschaos/github-chaos-actions/blob/master/experiments/node-cpu-hog/README.md">node cpu hog chaos action </a>and for more details about the experiment please visit <a href="https://docs.litmuschaos.io/docs/node-cpu-hog"> node cpu hog docs</a>.

- **Node Memory Hog**: This chaos action causes Memory exhaustion on the Kubernetes node. The experiment aims to verify the resiliency of applications that operate under resource constraints wherein replicas may sometimes be evicted on account of nodes turning unschedulable due to lack of Memory resources. Check a sample usage of <a href="https://github.com/litmuschaos/github-chaos-actions/blob/master/experiments/node-memory-hog/README.md"> node memory hog chaos action </a>and for more details about the experiment please visit <a href="https://docs.litmuschaos.io/docs/node-memory-hog"> node memory hog docs</a>.
- **Node Memory Hog**: This chaos action causes Memory exhaustion on the Kubernetes node. The experiment aims to verify the resiliency of applications that operate under resource constraints wherein replicas may sometimes be evicted on account of nodes turning unschedulable due to lack of Memory resources. Check a sample usage of <a href="https://github.com/litmuschaos/github-chaos-actions/blob/master/experiments/node-memory-hog/README.md"> node memory hog chaos action </a>and for more details about the experiment please visit <a href="https://docs.litmuschaos.io/docs/node-memory-hog"> node memory hog docs</a>.

- **Pod CPU Hog**: This chaos action causes CPU resource consumption on specified application containers by starting one or more md5sum calculation processes on the special file /dev/zero. It can test the application's resilience to potential slowness/unavailability of some replicas due to high CPU load. Check a sample usage of <a href="https://github.com/litmuschaos/github-chaos-actions/blob/master/experiments/pod-cpu-hog/README.md"> pod cpu hog chaos action </a>and for more details about the experiment please visit <a href="https://docs.litmuschaos.io/docs/pod-cpu-hog"> pod cpu hog docs</a>.

@@ -42,7 +42,6 @@ We just need to follow these simple steps to run a chaos experiment using this

- **Pod Network Loss**: This chaos action injects chaos to disrupt network connectivity to Kubernetes pods. The application pod should be healthy once chaos is stopped. It causes loss of access to application replica by injecting packet loss. Check a sample usage of <a href="https://github.com/litmuschaos/github-chaos-actions/blob/master/experiments/pod-network-loss/README.md">pod network loss chaos action </a>and for more details about the experiment please visit <a href="https://docs.litmuschaos.io/docs/pod-network-loss"> pod network loss docs</a>


- **Pod Network Duplication**: This chaos action (pod-network-duplication) injects chaos to disrupt network connectivity to Kubernetes pods. The application pod should be healthy once chaos is stopped. Service requests should be served despite chaos. Check a sample usage of <a href="https://github.com/litmuschaos/github-chaos-actions/blob/master/experiments/pod-network-duplication/README.md">pod network duplication chaos action </a>and for more details about the experiment please visit <a href="https://docs.litmuschaos.io/docs/pod-network-duplication"> pod network duplication docs</a>

- **Pod Autoscaler**: This chaos action can be used for other scenarios as well, such as for checking the Node auto-scaling feature. For example, check if the pods are successfully rescheduled within a specified period in cases where the existing nodes are already running at the specified limits. Check a sample usage of <a href="https://github.com/litmuschaos/github-chaos-actions/blob/master/experiments/pod-autoscaler/README.md">pod autoscaler chaos action </a>and for more details about the experiment please visit <a href="https://docs.litmuschaos.io/docs/pod-autoscaler"> pod autoscaler docs</a>
@@ -56,39 +55,114 @@ A sample pod delete experiment workflow:
`.github/workflows/main.yml`

```yaml
name: CI
name: chaos-pipeline
#events can be modified as per requirements
on:
workflow_dispatch:

jobs:
chaos-action:
runs-on: ubuntu-latest
steps:
# KUBE_CONFIG_DATA is required env for litmuschaos/github-chaos-actions.
- name: Setting up kubeconfig ENV for Github Chaos Action
run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)
env:
ACTIONS_ALLOW_UNSECURE_COMMANDS: true

- name: Setup Litmus
uses: litmuschaos/github-chaos-actions@master
env:
INSTALL_LITMUS: true

- name: Running Litmus pod delete chaos experiment
uses: litmuschaos/github-chaos-actions@master
env:
EXPERIMENT_NAME: pod-delete
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: latest
JOB_CLEANUP_POLICY: delete
APP_NS: default
APP_LABEL: run=nginx
APP_KIND: deployment
IMAGE_PULL_POLICY: Always
TOTAL_CHAOS_DURATION: 30
CHAOS_INTERVAL: 10
FORCE: false

- name: Uninstall Litmus
if: always()
uses: litmuschaos/github-chaos-actions@master
env:
LITMUS_CLEANUP: true
```
#### For EKS Clusters
A sample pod delete experiment workflow for EKS Clusters:
`.github/workflows/main.yml`

```yaml
name: chaos-pipeline
#events can be modified as per requirements
on:
push:
branches: [ master ]
workflow_dispatch:
jobs:
build:

chaos-action:
runs-on: ubuntu-latest

- name: Running Litmus pod delete chaos experiment
uses: litmuschaos/[email protected]
env:
##Pass kubeconfig data from secret in base 64 encoded form
KUBE_CONFIG_DATA: ${{ secrets.KUBE_CONFIG_DATA }}
##If litmus is not installed
INSTALL_LITMUS: true
##Give application info under chaos
APP_NS: default
APP_LABEL: run=nginx
APP_KIND: deployment
EXPERIMENT_NAME: pod-delete
##Custom image can also be used
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: latest
IMAGE_PULL_POLICY: Always
TOTAL_CHAOS_DURATION: 30
CHAOS_INTERVAL: 10
FORCE: false
##Select true if you want to uninstall litmus after chaos
LITMUS_CLEANUP: true
# AWS secrets are required to configure & run chaos
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION }}
steps:
- name: Configure AWS Credentials
uses: aws-actions/configure-aws-credentials@v1
with:
aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
aws-region: ${{ secrets.AWS_REGION }}
# Optionally kubeconfig can be passed from github secrets in base64 encoded form as mentioned above.
- name: Writing kubeconfig for eks cluster
run: |
aws eks --region ${{ secrets.AWS_REGION }} update-kubeconfig --name <eks_cluster_name>
- name: Setting up kubeconfig ENV for Github Chaos Action
run: echo ::set-env name=KUBE_CONFIG_DATA::$(base64 -w 0 ~/.kube/config)
env:
ACTIONS_ALLOW_UNSECURE_COMMANDS: true
- name: Setup Litmus
uses: litmuschaos/github-chaos-actions@master
env:
INSTALL_LITMUS: true
- name: Running Litmus pod delete chaos experiment
uses: litmuschaos/github-chaos-actions@master
env:
EXPERIMENT_NAME: pod-delete
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: latest
JOB_CLEANUP_POLICY: delete
APP_NS: default
APP_LABEL: run=nginx
APP_KIND: deployment
IMAGE_PULL_POLICY: Always
TOTAL_CHAOS_DURATION: 30
CHAOS_INTERVAL: 10
FORCE: false
- name: Uninstall Litmus
if: always()
uses: litmuschaos/github-chaos-actions@master
env:
LITMUS_CLEANUP: true
```

Get the details of the chaos action tunables for pod delete (above example) <a href="https://github.com/litmuschaos/github-chaos-actions/blob/master/experiments/pod-delete/README.md">here</a>

## Secrets
@@ -103,7 +177,6 @@ cat $HOME/.kube/config | base64

Some common environment variables used for running the `github-chaos-actions` are:


<table>
<tr>
<th> Variables </th>
@@ -166,3 +239,20 @@ Some comman environment variables used for running the `github-chaos-actions` ar
<td> Default value is Always </td>
</tr>
</table>

#### For EKS Cluster

Set up AWS credentials using [GitHub secrets](https://docs.github.com/en/actions/security-guides/encrypted-secrets). The secrets should then be passed to the action as ENVs.

```yaml
jobs:
chaos-action:
runs-on: ubuntu-latest
# AWS secrets are required to configure & run chaos
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
AWS_REGION: ${{ secrets.AWS_REGION }}
```

> Note: These secrets can either be set at the job level or must be provided in every chaos-action step.
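
The `entrypoint.sh` change in this commit reads `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_REGION`, so those are the names a job or step needs to export. A small sketch of a pre-flight check that reports any that are unset; the function name `missing_aws_vars` is hypothetical and not part of this commit:

```shell
#!/bin/sh
# Hypothetical pre-flight check (not from this commit): print the name of each
# required AWS variable that is unset or empty, and return non-zero if any is
# missing. The variable names match the ones tested in entrypoint.sh.
missing_aws_vars() {
  status=0
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}
```

Running it as an early step makes a misnamed variable fail loudly, where the entrypoint would otherwise skip `aws configure` silently.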
16 changes: 13 additions & 3 deletions entrypoint.sh
@@ -7,15 +7,25 @@ TEST_TIMEOUT=$((600 + $TOTAL_CHAOS_DURATION))
PARALLEL_EXECUTION=${PARALLEL_EXECUTION:=1}

##Extract the base64 encoded config data and write this to the KUBECONFIG
mkdir -p ${HOME}/.kube
echo "$KUBE_CONFIG_DATA" | base64 --decode > ${HOME}/.kube/config
export KUBECONFIG=${HOME}/.kube/config
if [ ! -z "$KUBE_CONFIG_DATA" ]
then
mkdir -p ${HOME}/.kube
echo "$KUBE_CONFIG_DATA" | base64 --decode > ${HOME}/.kube/config
export KUBECONFIG=${HOME}/.kube/config
fi

##Setup
mkdir -p $HOME/go/src/github.com/litmuschaos
cd ${GOPATH}/src/github.com/litmuschaos/
dir=${GOPATH}/src/github.com/litmuschaos/chaos-ci-lib

if [[ ! -z $AWS_ACCESS_KEY_ID ]] && [[ ! -z $AWS_SECRET_ACCESS_KEY ]] && [[ ! -z $AWS_REGION ]]
then
aws configure set default.region ${AWS_REGION}
aws configure set aws_access_key_id ${AWS_ACCESS_KEY_ID}
aws configure set aws_secret_access_key ${AWS_SECRET_ACCESS_KEY}
fi

if [ ! -d $dir ]
then
git clone https://github.com/litmuschaos/chaos-ci-lib.git
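
The guard added in this hunk writes a kubeconfig only when `KUBE_CONFIG_DATA` is non-empty and skips the write otherwise. A minimal sketch of the same logic as a function with quoted expansions, assuming `base64 --decode` is available; `write_kubeconfig` is a hypothetical name, not taken from `entrypoint.sh`:

```shell
#!/bin/sh
# Hypothetical rewrite of the kubeconfig guard (not the script's own code).
write_kubeconfig() {
  # Nothing to do when no encoded kubeconfig was supplied.
  if [ -z "${KUBE_CONFIG_DATA:-}" ]; then
    return 0
  fi
  mkdir -p "${HOME}/.kube"
  # printf avoids echo's platform-dependent handling of backslashes and flags.
  printf '%s' "$KUBE_CONFIG_DATA" | base64 --decode > "${HOME}/.kube/config"
  export KUBECONFIG="${HOME}/.kube/config"
}
```

Quoting `${HOME}` and `$KUBE_CONFIG_DATA` keeps the function safe when a path contains spaces, which the unquoted form in the hunk does not guarantee.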
55 changes: 27 additions & 28 deletions experiments/container-kill/README.md
@@ -1,51 +1,50 @@
# Container Kill Experiment

This experiment executes SIGKILL on container of random replicas of an application deployment. It tests the deployment sanity (replica availability & uninterrupted service) and recovery workflows of an application. Check <a href="https://docs.litmuschaos.io/docs/container-kill/">container kill docs</a> for more info. To know more and get started with chaos-actions visit <a href="https://github.com/litmuschaos/github-chaos-actions/blob/master/README.md">github-chaos-actions</a>.
This experiment executes SIGKILL on container of random replicas of an application deployment. It tests the deployment sanity (replica availability & uninterrupted service) and recovery workflows of an application. Check <a href="https://docs.litmuschaos.io/docs/container-kill/">container kill docs</a> for more info. To know more and get started with chaos-actions visit <a href="https://github.com/litmuschaos/github-chaos-actions/blob/master/README.md">github-chaos-actions</a>.

#### Sample workflow
#### Sample workflow

A Sample workflow to run the container-kill experiment:


`.github/workflows/main.yml`

```yaml
name: CI

on:
push:
branches: [ master ]
branches: [master]

jobs:
build:

runs-on: ubuntu-latest
- name: Running container kill chaos experiment
uses: litmuschaos/github-chaos-actions@v0.3.1
env:
KUBE_CONFIG_DATA: ${{ secrets.KUBE_CONFIG_DATA }}
##If litmus is not installed
INSTALL_LITMUS: true
##Give application info under chaos
APP_NS: default
APP_LABEL: run=nginx
APP_KIND: deployment
EXPERIMENT_NAME: container-kill
##Custom images can also be used
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: latest
IMAGE_PULL_POLICY: Always
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 20
CHAOS_INTERVAL: 10
CONTAINER_RUNTIME: docker
##Select true if you want to uninstall litmus after chaos
LITMUS_CLEANUP: true
steps:
- name: Running container kill chaos experiment
uses: litmuschaos/github-chaos-actions@v0.4.0
env:
KUBE_CONFIG_DATA: ${{ secrets.KUBE_CONFIG_DATA }}
##If litmus is not installed
INSTALL_LITMUS: true
##Give application info under chaos
APP_NS: default
APP_LABEL: run=nginx
APP_KIND: deployment
EXPERIMENT_NAME: container-kill
##Custom images can also be used
EXPERIMENT_IMAGE: litmuschaos/go-runner
EXPERIMENT_IMAGE_TAG: latest
IMAGE_PULL_POLICY: Always
TARGET_CONTAINER: nginx
TOTAL_CHAOS_DURATION: 20
CHAOS_INTERVAL: 10
CONTAINER_RUNTIME: docker
##Select true if you want to uninstall litmus after chaos
LITMUS_CLEANUP: true
```
## Environment Variables
The application pod for container-kill will be identified with the app info variables.
The application pod for container-kill will be identified with the app info variables.
**Supported Chaos Action Tunables**