
Testing


The decision to discuss documentation right before testing is intentional. Just like documentation, testing is a crucial part of making your code more understandable to others. Having tests for your code is non-negotiable. But what exactly are tests? What types of tests are there? And why is it non-negotiable?

General types

Before I wrote my first code test, the only other place where I had encountered anything similar was statistics and econometrics, where one would typically use tests to ensure that models are not junk, that coefficients can be trusted, and so on. Code tests are slightly different, though: they are more deterministic. In general, a test is a way to check that a part of your code, or the code in its entirety, functions as you expect it to. That means that to write a test you have to specify how the code should function.

One can do this specification at different levels. If you are testing the expected behaviour of a single function, the tests you write are called unit tests. If you are testing the functioning of an application as a whole, or of your application together with external components, it is typically called integration testing. Good, maintainable code should have solid unit and integration test coverage.
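
To make the distinction concrete, here is a minimal sketch of a unit test (the function and names here are purely illustrative, not from any particular code base). The test specifies the expected behaviour of one small unit, a single function, in isolation:

def add(a, b):
    """The unit under test: one small, well-defined piece of behaviour."""
    return a + b

def test_add():
    """A unit test: states exactly what we expect from the function."""
    assert add(2, 3) == 5
    assert add(-1, 1) == 0

An integration test, by contrast, would exercise several such units together, for example a whole data pipeline talking to a real database.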

Benefits of testing

Testing is a huge subject with many different aspects. It is so huge that some even claim one should write tests before writing the code that they test. Others try to organise the entire development process around testing, often called test-driven development or TDD. In any case, here is a short list of some of the main benefits of writing tests (I'm sure one can add more):

  • builds confidence in code base
  • helps you and others understand what the code does
  • makes the code easier to maintain over time
  • clarifies the purpose of units of code
  • encourages writing testable code, which is often easier to understand
  • gets you thinking about what exactly you are trying to do

When I see new code I often read the tests first, just to see how it is supposed to function before I try to understand how it actually works. Any sane software company will also have high standards that require solid test coverage before accepting new code. So learning to write tests as part of what you're doing, rather than as a nice-to-have afterthought, is essential.

Unit testing in Python

The unittest framework is included in the standard library and for many purposes will be good enough. Others like to use nose, which is a wrapper around and extension of unittest; some would say it is simpler to write tests with nose. A good introduction to nose is worth a read.
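
To get a feel for the standard library style first, here is a minimal unittest sketch (the test case is hypothetical, just to show the shape of a test class):

import unittest

class TestArithmetic(unittest.TestCase):
    """A minimal unittest-style test case."""

    def test_addition(self):
        # assertEqual fails the test, with a helpful message, if the
        # two arguments are not equal.
        self.assertEqual(1 + 1, 2)

if __name__ == '__main__':
    unittest.main()

Running this file directly (or via nosetests, which also collects unittest-style tests) reports how many tests passed. Now let's look at a fuller, nose-style example.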

Suppose we have a function, in a file called cut_off.py, that adds a binary variable to a pandas DataFrame based on a cutoff value:

import pandas as pd

def add_binary_var(df, cutoff_val):
    """Adds a binary variable to the DataFrame based on a cutoff value.
    :param df: a pandas DataFrame
    :param cutoff_val: an integer or float
    """
    df['binary_var'] = [1 if val <= cutoff_val else 0 for val in df['x']]
    return df
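
As a quick sanity check before writing the test, the function behaves like this (an illustrative session; the expected values follow directly from the cutoff logic above):

import pandas as pd
from cut_off import add_binary_var

df = pd.DataFrame({'x': [10, 2, 20, 30, 11, -10, 90]})
df = add_binary_var(df, 10)
# Values less than or equal to 10 are coded 1, the rest 0, so
# df['binary_var'] is now [1, 1, 0, 0, 0, 1, 0].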

We could then test that this function works as expected by writing a very simple test, kept in a file called test_cut_off.py:

import pandas as pd
from cut_off import add_binary_var

def setup_func():
    """Gives the full DataFrame with correct results."""
    return pd.DataFrame({
                'x': [10, 2, 20, 30, 11, -10, 90],
                'binary_var': [1, 1, 0, 0, 0, 1, 0]})

def test_add_binary_var():
    """Test creation of a binary variable with a given cutoff value."""
    correct_df = setup_func()
    test_df = pd.DataFrame({'x': [10, 2, 20, 30, 11, -10, 90]})
    test_df_with_binary = add_binary_var(test_df, 10)
    # Compare the named columns so that column order does not matter.
    cols = ['x', 'binary_var']
    assert (correct_df[cols] == test_df_with_binary[cols]).all().all()

The test asserts that every cell of the expected and actual DataFrames is equal. We can then run the test with nose: nosetests test_cut_off.py. This will display information on whether the test passed and, if not, why not. Needless to say, test cases can become much more complex. The official documentation for the testing libraries gives detailed examples and explanations and is well worth exploring.
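
For instance, one might also pin down boundary and degenerate cases. Here is a hypothetical pair of extra tests in the same style (they assume the imports already at the top of test_cut_off.py):

def test_add_binary_var_boundary():
    """A value exactly equal to the cutoff should be coded as 1."""
    test_df = pd.DataFrame({'x': [10]})
    result = add_binary_var(test_df, 10)
    assert result['binary_var'][0] == 1

def test_add_binary_var_empty():
    """An empty DataFrame should come back empty, with the new column added."""
    test_df = pd.DataFrame({'x': []})
    result = add_binary_var(test_df, 10)
    assert len(result) == 0
    assert 'binary_var' in result.columns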

Unit testing in R

Writing unit tests with the testthat package is wonderful. Let's look at an example. Suppose we had a function that checks whether a name is present in a list, allowing for partial matching, like eon with leon. Suppose that function lives in a file called checker.r:

#' Find a name in a list with fuzzy matching
#'
#' \code{name_in_list} returns a boolean value TRUE/FALSE
#'
#' @param name a string to search for
#' @param list a list of strings to search through
#'
#' @examples
#' name_list <- c('eee', 'moar', 'blaurgh', 'leon')
#' name_in_list('eon', name_list)
#'

name_in_list <- function(name, list) {
  # grepl() returns one TRUE/FALSE per element of the list; any()
  # collapses that to a single TRUE/FALSE for the whole list.
  any(grepl(name, list))
}

We can write a test and place it in a file called test_checker.r:

library(testthat)
source('checker.r')
context('testing that we find a name in a list')

test_that('name is detected when there', {
  list <- c('whammy', 'phteven', 'markus', 'mucus')
  expect_that(name_in_list('mark', list), is_true())
})

test_that('name is not detected when not there', {
  list <- c('whammy', 'phteven', 'markus', 'mucus')
  expect_that(name_in_list('mmmmmmmmmanny', list), is_false())
})

The tests can be run from within the R console like this: test_file('test_checker.r'). This will provide informative output on any test that fails. To cement these ideas and practices further, it is a great idea to read the chapter on testing in Hadley Wickham's Advanced R book.
