Introduction

When we talk about writing tests, the first thing that usually comes to mind is some kind of static test: you send an input and you expect an output. For example, you send a wrong parameter to a REST service and expect it to return an error message or status code.

But applications usually run far longer than the time it takes to execute all the tests, probably days or months until you update them. During this time things happen: the network starts to slow down, an unknown process eats all the CPU or, if you are using Docker, a container dies. When an application runs for a long time, some kind of chaos might appear.

The question is: are you sure your application deals correctly with these situations? You can test it manually, but if you want to apply a CI/CD approach you need an automated way to run these tests.

This is where Arquillian Cube Q helps you. Arquillian Cube Q is an extension of Arquillian Cube (https://github.com/arquillian/arquillian-cube) that allows you to write chaos tests. Since Arquillian Cube Q is an extension of Cube, it relies on Docker to execute them.

1. Chaos

There are several levels of chaos that you might test, from network chaos (latency, bandwidth limitation, …) to operating system chaos (CPU burn, IO burn, fill disk, …).

Like every Arquillian project, Arquillian Q reuses existing chaos frameworks by integrating them into the Arquillian philosophy. Let’s see what is supported:

1.1. Network Chaos

For network chaos, Arquillian Q integrates with the Toxiproxy project (https://github.com/Shopify/toxiproxy). Toxiproxy is a framework for simulating network conditions. It is a TCP proxy that intercepts communication between two endpoints and adds some chaos before the traffic reaches the real endpoint.

Toxiproxy supports the following toxics:

latency

Add a delay to all data going through the proxy. The delay is equal to latency +/- jitter.

down

Bring a service down.

bandwidth

Limit a connection to a maximum number of kilobytes per second.

slow close

Delay the TCP socket from closing until delay has elapsed.

timeout

Stops all data from getting through, and closes the connection after timeout. If timeout is 0, the connection won’t close, and data will be delayed until the toxic is removed.

slicer

Slices TCP data up into small bits, optionally adding a delay between each sliced "packet".
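
The "latency +/- jitter" rule of the latency toxic can be pictured with a small Java sketch. It assumes the offset is picked uniformly inside the jitter window; the class and method names are hypothetical and belong neither to Toxiproxy nor to Arquillian Q.

```java
import java.util.Random;

public class LatencyJitterSketch {

    // Returns a delay in [latency - jitter, latency + jitter], never below zero.
    // Assumption: the offset is uniformly distributed inside the jitter window.
    static long delayMillis(long latency, long jitter, Random random) {
        // nextDouble() is in [0, 1); scale it to [-jitter, +jitter).
        long offset = Math.round((random.nextDouble() * 2 - 1) * jitter);
        return Math.max(0, latency + offset);
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so runs are repeatable
        for (int i = 0; i < 5; i++) {
            System.out.println(delayMillis(4000, 500, random) + " ms");
        }
    }
}
```

With a latency of 4000 and a jitter of 500, every printed delay falls between 3500 and 4500 milliseconds.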

Arquillian Q supports Toxiproxy by registering a Toxiproxy Docker container, inspecting the cube definitions (in Cube format or docker-compose format) and automatically redirecting links through Toxiproxy.

As an example:

ContainerA -link→ ContainerB

is converted to:

ContainerA -link→ ToxiproxyContainer -link→ ContainerB

After this you can program toxics into Toxiproxy and execute the test.
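
The link redirection described above can be sketched in plain Java. This is only an illustration under stated assumptions: the map-based model of links, the class name and the proxy name `toxiproxy` are hypothetical and do not reflect how Arquillian Q is implemented internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LinkRewriteSketch {

    static final String PROXY = "toxiproxy"; // hypothetical proxy container name

    // links maps a container name to the names of the containers it links to.
    static Map<String, List<String>> routeThroughProxy(Map<String, List<String>> links) {
        Map<String, List<String>> rewritten = new LinkedHashMap<>();
        List<String> proxyTargets = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : links.entrySet()) {
            if (entry.getValue().isEmpty()) {
                // Containers without links are left untouched.
                rewritten.put(entry.getKey(), new ArrayList<>());
                continue;
            }
            // The container now links only to the proxy...
            rewritten.put(entry.getKey(), Collections.singletonList(PROXY));
            // ...and the proxy links to the original targets.
            for (String target : entry.getValue()) {
                if (!proxyTargets.contains(target)) {
                    proxyTargets.add(target);
                }
            }
        }
        rewritten.put(PROXY, proxyTargets);
        return rewritten;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = new LinkedHashMap<>();
        links.put("ContainerA", Arrays.asList("ContainerB"));
        links.put("ContainerB", new ArrayList<>());
        System.out.println(routeThroughProxy(links));
    }
}
```

Because every link now passes through the proxy, a toxic added on the proxy affects the traffic between ContainerA and ContainerB without changing either container.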

1.1.1. Adding Dependency

Arquillian Q Toxiproxy is just a JAR deployed to Maven Central:

pom.xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.jboss.arquillian</groupId>
      <artifactId>arquillian-bom</artifactId>
      <version>${version.arquillian_core}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>org.arquillian.cube.q</groupId>
    <artifactId>arquillian-cube-q-toxic</artifactId>
    <scope>test</scope>
    <version>${version.arquillian_q}</version>
  </dependency>
  <dependency>
    <groupId>org.jboss.arquillian.junit</groupId>
    <artifactId>arquillian-junit-standalone</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
  </dependency>
</dependencies>
Important
Notice that instead of registering arquillian-junit-container, as you usually do in an Arquillian test, you are using arquillian-junit-standalone. The micro-deployment feature (methods annotated with @Deployment) makes no sense in this kind of test.

1.1.2. Configuration

From the point of view of Q, you don’t need to configure anything apart from the Cube configuration file.

arquillian.xml
<?xml version="1.0" encoding="UTF-8"?>
<arquillian xmlns="http://jboss.org/schema/arquillian"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="
        http://jboss.org/schema/arquillian
        http://jboss.org/schema/arquillian/arquillian_1_0.xsd">

  <extension qualifier="docker">
    <property name="machineName">dev</property>
    <property name="dockerContainers">
        hw:
          image: lordofthejars/helloworld
          env: ["CATALINA_OPTS=-Djava.security.egd=file:/dev/./urandom"]
          portBindings: [8081->8080/tcp]
          links:
            - pingpong:pingpong

        pingpong:
          image: jonmorehouse/ping-pong
          exposedPorts: [8080/tcp]
    </property>
  </extension>

</arquillian>

In this case the hw container (built from the helloworld image) links to the pingpong container.

1.1.3. Test

Then the test looks like:

@RunWith(Arquillian.class)
public class ToxicFuntionalTestCase {

  @ArquillianResource
  private NetworkChaos networkChaos; // (1)

  @HostIp
  private String ip;

  @Test
  public void shouldAddLatency() throws Exception {
    networkChaos.on("pingpong", 8080).latency(latencyInMillis(4000)) // (2)
      .exec(() -> { // (3)

        URL url = new URL("http://" + ip + ":" + 8081 + "/hw/HelloWorld");
        final long l = System.currentTimeMillis();
        String response = IOUtil.asString(url.openStream());
        System.out.println(response);
        System.out.println("Time:" + (System.currentTimeMillis() - l));
        // assertions

    }); // (4)
  }
}
  1. Enriches the test with a NetworkChaos instance to communicate with Toxiproxy.

  2. Adds a latency of 4 seconds to communication with the pingpong container through port 8080.

  3. Executes the test logic. Notice that the execution time will be greater than 4 seconds.

  4. After the callback executes, the toxics are reset.

Tip
The exec method also lets you pass how many times you want to execute the test, for example networkChaos.on("pingpong", 8080).latency(latencyInMillis(4000)).exec(times(2), () -> {}), or the amount of time you want to keep executing it, for example exec(during(15, TimeUnit.SECONDS), () -> {}).
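
To make the difference between the two modes concrete, here is a minimal plain-Java sketch of "run N times" versus "keep running for a duration". It only illustrates the idea; `runTimes` and `runDuring` are hypothetical helpers, not the Arquillian Q `times` and `during` implementations, and it assumes the callback is simply repeated until the time budget is used up.

```java
import java.util.concurrent.TimeUnit;

public class RepeatSketch {

    // Runs the body a fixed number of times and returns the number of runs.
    static int runTimes(int times, Runnable body) {
        for (int i = 0; i < times; i++) {
            body.run();
        }
        return Math.max(times, 0);
    }

    // Keeps running the body until the time budget is used up.
    static int runDuring(long duration, TimeUnit unit, Runnable body) {
        long start = System.nanoTime();
        long budget = unit.toNanos(duration);
        int runs = 0;
        while (System.nanoTime() - start < budget) {
            body.run();
            runs++;
        }
        return runs;
    }

    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int fixed = runTimes(2, () -> System.out.println("test body"));
        int timed = runDuring(50, TimeUnit.MILLISECONDS, () -> sleepQuietly(10));
        System.out.println(fixed + " fixed runs, " + timed + " timed runs");
    }
}
```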

1.1.4. Adding some randomness

Some of the discrete values set in toxics, such as latency, slowClose, bandwidth, timeout or slicer, can be randomized using mathematical distributions. At this time two distributions are supported.

For example, this is how you can randomize the latency:

networkChaos.on("pingpong", 8080)
            .latency(logNormalLatencyInMillis(2000, 0.3))
            .exec(times(2), () -> {

     URL url = new URL("http://" + ip + ":" + 8081 + "/hw/HelloWorld");
     final long l = System.currentTimeMillis();
     String response = IOUtil.asString(url.openStream());
     System.out.println(response);
     System.out.println("Time:" + (System.currentTimeMillis() - l));

});

In the example above, latency values follow a log-normal distribution with a median of 2 seconds and a sigma of 0.3. For each iteration of the test, a new value is calculated and sent to Toxiproxy.
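
For readers unfamiliar with the log-normal distribution, the following Java sketch shows one way such a value can be drawn from a median and a sigma. It relies on the standard identity that `median * exp(sigma * Z)` is log-normal when `Z` is standard normal; the class and method names are hypothetical and this is not the code Arquillian Q uses.

```java
import java.util.Random;

public class LogNormalLatencySketch {

    // Draws one latency value (in milliseconds) from a log-normal distribution.
    // If Z is standard normal, median * exp(sigma * Z) has the requested median.
    static long sampleMillis(double medianMillis, double sigma, Random random) {
        return Math.round(medianMillis * Math.exp(sigma * random.nextGaussian()));
    }

    public static void main(String[] args) {
        Random random = new Random();
        // One fresh value per test iteration, mirroring exec(times(2), ...).
        for (int iteration = 1; iteration <= 2; iteration++) {
            long latency = sampleMillis(2000, 0.3, random);
            System.out.println("iteration " + iteration + ": " + latency + " ms");
        }
    }
}
```

With a sigma of 0.3 most values stay reasonably close to 2000 ms, but occasional larger delays appear, which is closer to real network behaviour than a fixed latency.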

1.1.5. Binding Ports Chaos

Sometimes you don’t want to add chaos between containers but on the bound ports, that is, on the communication between the host and the containers. This is really useful when you want to test what happens to your frontend (JavaScript) application when there is some chaos.

Assuming that A has a port binding, something like:

A → B

is converted to:

Proxy → A → B

Here A no longer has a port binding, only exposed ports, and it is the Proxy that holds the port binding.

To use this, set the following property in the arquillian.xml file:

arquillian.xml
<extension qualifier="networkChaos">
    <property name="toxifyPortBinding">true</property>
</extension>
Important
By default this flag is false. If you set it to true, chaos can no longer be applied between containers, only between the host and the containers.

1.2. Container Chaos

For container chaos, Arquillian Q integrates with the Pumba project (https://github.com/alexei-led/pumba). Pumba is an application that you run on every Docker host in your cluster; once in a while it "randomly" stops running containers that match the specified names or name patterns. You can even specify the signal that is sent to “kill” the container.

It supports:

  • Stop a container.

  • Remove a container.

  • Kill a container process with signal.

Arquillian Q registers a Pumba container inside the configured Docker host.

1.2.1. Adding Dependency

Arquillian Q Pumba is just a JAR file deployed to Maven Central.

pom.xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.jboss.arquillian</groupId>
      <artifactId>arquillian-bom</artifactId>
      <version>${version.arquillian_core}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>org.arquillian.cube.q</groupId>
    <artifactId>arquillian-cube-q-pumba</artifactId>
    <scope>test</scope>
    <version>${version.arquillian_q}</version>
  </dependency>
  <dependency>
    <groupId>org.jboss.arquillian.junit</groupId>
    <artifactId>arquillian-junit-standalone</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
  </dependency>
</dependencies>
Important
Notice that instead of registering arquillian-junit-container, as you usually do in an Arquillian test, you are using arquillian-junit-standalone. The micro-deployment feature (methods annotated with @Deployment) makes no sense in this kind of test.

1.2.2. Configuration

From the point of view of Q, you don’t need to configure anything apart from the Cube configuration file.

arquillian.xml
<?xml version="1.0" encoding="UTF-8"?>
<arquillian xmlns="http://jboss.org/schema/arquillian"
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="
        http://jboss.org/schema/arquillian
        http://jboss.org/schema/arquillian/arquillian_1_0.xsd">

  <extension qualifier="docker">
    <property name="machineName">dev</property>
    <property name="dockerContainers">
      pingpong:
        image: jonmorehouse/ping-pong
        exposedPorts: [8080/tcp]

      pingpong2:
        image: jonmorehouse/ping-pong
        exposedPorts: [8080/tcp]
    </property>
  </extension>

</arquillian>

In this case we are defining two instances of the same image.

1.2.3. Test

Then the test looks like:

@RunWith(Arquillian.class)
public class PumbaFunctionalTestCase {

  @ArquillianResource // (1)
  ContainerChaos containerChaos;

  @ArquillianResource
  DockerClient dockerClient; // (2)

  @Test
  public void shouldKillContainers() throws Exception {
    containerChaos
        .onCubeDockerHost()
        .killRandomly( // (3)
            ContainerChaos.ContainersType.regularExpression("^pingpong"), // (4)
            ContainerChaos.IntervalType.intervalInSeconds(4), // (5)
            ContainerChaos.KillSignal.SIGTERM
        )
        .exec(); // (6)

    final List<Container> containers = dockerClient.listContainersCmd().exec();
    // Pumba does not kill its own container, so exactly one container remains
    assertThat(containers).hasSize(1);
  }
}
  1. Enriches the test with container chaos

  2. Enriches the test with a DockerClient to communicate with the Docker host in the test

  3. Kills containers randomly, one by one

  4. Kills only containers whose name starts with pingpong

  5. Time to wait between killing one container and the next

  6. Starts Pumba. In this case no callback is used.

As with network chaos, you can also pass the test as a callback and specify how many times, or for how long, to execute it.
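
The selection logic behind a random kill, filtering container names with a regular expression and then picking victims one by one, can be sketched as follows. This is an illustration only: the class and method names are hypothetical, the matching uses Java's `java.util.regex` rather than Pumba's own pattern engine, and nothing is actually killed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.regex.Pattern;

public class RandomKillSketch {

    // Keeps only the container names in which the regular expression finds a match.
    static List<String> matching(List<String> names, String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).find()) {
                result.add(name);
            }
        }
        return result;
    }

    // Picks one candidate at random; the list must not be empty.
    static String pickVictim(List<String> candidates, Random random) {
        return candidates.get(random.nextInt(candidates.size()));
    }

    public static void main(String[] args) {
        List<String> running = Arrays.asList("pingpong", "pingpong2", "pumba");
        List<String> alive = matching(running, "^pingpong");
        Random random = new Random();
        while (!alive.isEmpty()) {
            String victim = pickVictim(alive, random);
            alive.remove(victim);
            System.out.println("SIGTERM -> " + victim);
            // A real run would wait for the configured interval here
            // (4 seconds in the test above) before the next kill.
        }
    }
}
```

The pumba name does not match `^pingpong`, which mirrors the assertion in the test that one container is still listed after the chaos has run.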

1.3. Operating System Chaos

For operating system chaos, Arquillian Q uses modified versions of some scripts from the Netflix Simian Army project. Some scripts have been modified to make sense in the Docker world instead of the AWS world.

It supports:

  • Block a port using iptables command.

  • Burn CPU using dd command, that is, drive the CPU to 100%.

  • Burn IO using dd command.

  • Fill disk with dd command.

  • Kill process using pkill command.

  • Null Route using ip command.

Important
Scripts are executed inside the container. This means that the commands used in a script must be installed inside the container. Some images might contain them, others might not.
Tip
Creating chaos with scripts opens up a whole new range of possibilities, since the only limit is the commands you need to execute. Please feel free to contribute your own scripts.

1.3.1. Adding Dependency

Arquillian Simian Army is just a JAR file deployed to Maven Central.

pom.xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.jboss.arquillian</groupId>
      <artifactId>arquillian-bom</artifactId>
      <version>${version.arquillian_core}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>org.arquillian.cube.q</groupId>
    <artifactId>arquillian-cube-q-simianarmy</artifactId>
    <scope>test</scope>
    <version>${version.arquillian_q}</version>
  </dependency>
  <dependency>
    <groupId>org.jboss.arquillian.junit</groupId>
    <artifactId>arquillian-junit-standalone</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
  </dependency>
</dependencies>
Important
Notice that instead of registering arquillian-junit-container, as you usually do in an Arquillian test, you are using arquillian-junit-standalone. The micro-deployment feature (methods annotated with @Deployment) makes no sense in this kind of test.

1.3.2. Test

Then the test looks like:

@RunWith(Arquillian.class)
public class SimianArmyFunctionalTestCase {

    @ArquillianResource // (1)
    OperativeSystemChaos operativeSystemChaos;

    @HostIp
    String dockerHost;

    @HostPort(containerName = "pingpong", value = 8080)
    int port;

    // Ignored because burning the CPU on the machine that also runs the test disrupts everything
    @Test(expected = Exception.class) @Ignore
    public void shouldExecuteBurnCpuChaos() throws Exception {
        operativeSystemChaos.on("pingpong") // (2)
            .burnCpu(singleCpu()) // (3)
            .exec(); // (4)

        //.....

    }
}
  1. Enriches the test with operating system chaos

  2. Sets the container on which to apply the chaos

  3. Sets the burn CPU chaos as if the system had only one CPU

  4. Starts the burn CPU script. In this case no callback is used.

As with network chaos, you can also pass the test as a callback and specify how many times, or for how long, to execute it.