P.Mean: Testing an R function (created 2013-11-02).


I am working on a grant resubmission, and one of the things I need to do is write up more details about the process we will use to develop the programs that we need to monitor patient accrual. I write a lot of programs, but almost all of them are programs that are run once, in a very specialized and tightly controlled setting. If you develop programs that other people will use, you need to test them against a range of inputs to make sure that they do what you want them to do. This is one of the topics covered in a short course I took at the Joint Statistical Meetings, "Practical Software Engineering for Statisticians," taught by Murray Stokely of Google.

I have done something like this informally for a few of the programs that I use. For example, I have a function, zpad, which takes a number and pads it to the left with a "0" if the number is smaller than 10.

zpad <- function(x) {
  ifelse(x<10,paste("0",as.character(x),sep=""),as.character(x))
}

and I checked that it worked with the following code:

zpad(8:11)

which produces the output

[1] "08" "09" "10" "11"

Now, if I wanted to improve this function to anticipate a wider range of inputs, I could use several functions in base R: stop, warning, message, try, and tryCatch.

If you place a stop function in the middle of a function, it will print a custom error message and quit further computations within that function.

zpad <- function(x) {
  if(is.numeric(x)==FALSE) stop("Non-numeric input")
  ifelse(x<10,paste("0",as.character(x),sep=""),as.character(x))
}

Now, if you test the function with letters, the function will fail.

> zpad(c("H","I","J","K"))
Error in zpad(c("H", "I", "J", "K")) : Non-numeric input

You might also want to check for non-integer values. These, you might want to let through, but with a warning message.

zpad <- function(x) {
  if(is.numeric(x)==FALSE) stop("Non-numeric input")
  if(any(x!=floor(x))) warning("Non-integer input")
  ifelse(x<10,paste("0",as.character(x),sep=""),as.character(x))
}

Now your function will pad the non-integer values, but will send you a warning message as well.

> zpad(c(8.5,9.5,10.5,11.5))
[1] "08.5" "09.5" "10.5" "11.5"
Warning message:
In zpad(c(8.5, 9.5, 10.5, 11.5)) : Non-integer input

You can also put in the option to suppress the warning message.

zpad <- function(x,warn=TRUE) {
  if(is.numeric(x)==FALSE) stop("Non-numeric input")
  if(any(x!=floor(x))&warn==TRUE) warning("Non-integer input")
  ifelse(x<10,paste("0",as.character(x),sep=""),as.character(x))
}
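As a quick check of the warn option, here is a minimal sketch. It repeats the function so that it runs on its own, uses the "Non-integer input" wording from the output shown earlier, and is written with ! and && in place of ==FALSE and &. The variable names loud and quiet are just for illustration.

```r
# zpad with an optional warning for non-integer input
zpad <- function(x, warn = TRUE) {
  if (!is.numeric(x)) stop("Non-numeric input")
  if (warn && any(x != floor(x))) warning("Non-integer input")
  ifelse(x < 10, paste("0", as.character(x), sep = ""), as.character(x))
}

# Default behavior: the padded values come back, along with a warning
loud <- zpad(c(8.5, 11.5))

# warn = FALSE: the same padded values, but no warning is issued
quiet <- zpad(c(8.5, 11.5), warn = FALSE)
print(quiet)
```

Both calls return the same character vector. The only difference is whether R reports the warning.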

Another thing you might want to change is how the function handles numbers that appear as strings. You could coerce strings to a numeric format, but you need to do it carefully. This is where the try function helps. You can try a computation, and if it generates an error that would otherwise stop further calculations, try instead returns the error message as an object of class "try-error". You normally do this with the silent=TRUE option (lowercase, since R is case sensitive) so that the error is not printed and you can write the code you need to handle the exception. Here's a simple example:

> x <- try(sqrt(2),silent=TRUE)
> print(x)
[1] 1.414214
> x <- try(sqrt(letters),silent=TRUE)
> print(x)
[1] "Error in sqrt(letters) : non-numeric argument to mathematical function\n"
attr(,"class")
[1] "try-error"
attr(,"condition")
<simpleError in sqrt(letters): non-numeric argument to mathematical function>

Notice the timing. The error does not print when it occurs, and only appears when we intentionally decide to print it. This allows us to customize the error message and/or to run some alternate commands.
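One way to act on a trapped error is to test the result with inherits. The sketch below is only an illustration: safe_sqrt is a hypothetical helper, and returning NA is just one possible fallback.

```r
# Hypothetical helper (for illustration only): returns NA instead of
# stopping when sqrt() is given input that it cannot handle.
safe_sqrt <- function(x) {
  result <- try(sqrt(x), silent = TRUE)
  if (inherits(result, "try-error")) {
    # Replace R's error with our own note and a fallback value
    message("Could not take the square root; returning NA")
    return(NA_real_)
  }
  result
}

safe_sqrt(2)    # the usual square root
safe_sqrt("A")  # prints the note, then returns NA
```

The message function used here sends a note to the console without raising a warning or an error.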

Unfortunately, the try function does not trap warnings. So if you tried to convert strings to numeric, the warning message would slip right by.

> y <- try(as.numeric(c("h","i","j")))
Warning message:
In doTryCatch(return(expr), name, parentenv, handler) :
NAs introduced by coercion

You need a more complicated function, tryCatch. It lets you supply a handler function that runs when an error occurs, and it also lets you supply a separate handler that runs when a warning occurs. For this simple example, you should just create the simplest of handlers, one that returns the warning itself.

> y <- tryCatch(as.numeric(c("h","i","j")),warning=function(w) {w})
> print(y)
<simpleWarning in doTryCatch(return(expr), name, parentenv, handler): NAs introduced by coercion>

If you want to catch both warnings and errors, supply both a warning= handler and an error= handler in the same tryCatch call.
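Here is a small sketch of a tryCatch call with both handlers. The helper name outcome is hypothetical, and it only reports how the expression ended. The exact wording of R's messages can vary with your locale, so the comments describe just the prefix.

```r
# Hypothetical helper (for illustration only): reports whether an
# expression finished cleanly, raised a warning, or raised an error.
outcome <- function(expr) {
  tryCatch(
    {
      force(expr)  # the expression is evaluated here, inside tryCatch
      "ok"
    },
    warning = function(w) paste("warning:", conditionMessage(w)),
    error   = function(e) paste("error:", conditionMessage(e))
  )
}

outcome(as.numeric("8"))  # "ok"
outcome(as.numeric("h"))  # a string starting with "warning:"
outcome(sqrt("A"))        # a string starting with "error:"
```

The first condition that is raised decides which handler runs, and the handler's return value becomes the value of the tryCatch call.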

So here's a function that will take string or numeric vectors. It quietly tries to convert the string to numeric, and if there are problems, it stops with a custom error message.

zpad <- function(x,warn=TRUE) {
  y <- tryCatch(as.numeric(x),warning=function(w) {w})
  if(is.numeric(y)==FALSE) stop("Non-numeric input")
  if(any(y!=floor(y))&warn==TRUE) warning("Non-integer input")
  ifelse(y<10,paste("0",as.character(y),sep=""),as.character(y))
}

Here is what this function will produce for a variety of test cases.

> zpad(8:11)
[1] "08" "09" "10" "11"
> zpad(c("H","I","J","K"))
Error in zpad(c("H", "I", "J", "K")) : Non-numeric input
> zpad(c(8.5,9.5,10.5,11.5))
[1] "08.5" "09.5" "10.5" "11.5"
Warning message:
In zpad(c(8.5, 9.5, 10.5, 11.5)) : Non-integer input
> zpad(c("8","9","10","11"))
[1] "08" "09" "10" "11"

It's interesting to compare this function with and without the tryCatch function. Without tryCatch, as.numeric only raises a coercion warning and returns NA values, so bad input is not rejected with the clear "Non-numeric input" error message that we prefer.

This sort of thing can turn into a never-ending task. What should your zpad function do when it is fed a date as input? Do I really have to test this case, or can I assume that the user is reasonably competent?

You have to decide how much and what types of things to check for. For something that you are only using yourself, a minimal amount of testing should be enough. A function that you intend to share with others, however, should try to anticipate any reasonable and even some unreasonable input values. Try the weird values first, and you may be okay with how R handles them. If you don't like the errors and warnings that R produces, you should handle these yourself using the stop, warning, try, and tryCatch functions.

For this to make sense, you need to write simple functions with simple outputs. This means lots of functions, but the alternative, writing only a few functions with lots of complex outputs, gives you something that is difficult to test well.

Now that you have a bunch of test cases, you want to package them up and run them as a group. This will allow you to run your test suite on a variety of different platforms (e.g., Windows, Linux). It also allows you to test whether your function works with the latest version of R, when it comes out.

To do this, you need a special package, testthat, which you load with library(testthat). This package has various functions, such as expect_error. When you expect an error, and you get it, nothing happens. But when you expect an error, and you don't get it, you are told so.

> expect_error(zpad(c("H","I","J","K")))
> expect_error(zpad(c("8","9","10","11")))
Error: zpad(c("8", "9", "10", "11")) code did not generate an error

It sounds a bit circular, doesn't it? It's an error that your code did not generate an error. That's life in the world of software testing.

So here's our series of tests. If they run correctly, then you see nothing. But if one or more of them fail, you are told which tests failed.

expect_equal(zpad(8:11),c("08","09","10","11"))
expect_error(zpad(c("H","I","J","K")))
expect_warning(zpad(c(8.5,9.5,10.5,11.5)))
expect_equal(zpad(c(8.5,9.5,10.5,11.5),warn=FALSE),c("08.5","09.5","10.5","11.5"))

expect_equal(zpad(c("8","9","10","11")),c("08","09","10","11"))

Notice that you should test the case with non-integers twice, the second time with the warn flag set to FALSE. When you run this set of commands, you see nothing. Here no news is good news. The testthat package has extra features, such as the ability to run your suite of tests whenever your code changes.
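One way to package these checks is with testthat's test_that function, which groups related expectations under a description. The sketch below assumes that testthat is installed. It repeats the final version of zpad so that it runs on its own, the grouping and descriptions are my own choices, and the second argument to expect_error and expect_warning is a pattern that the message must match.

```r
library(testthat)

# The final version of zpad, repeated so this block is self-contained
zpad <- function(x, warn = TRUE) {
  y <- tryCatch(as.numeric(x), warning = function(w) w)
  if (!is.numeric(y)) stop("Non-numeric input")
  if (warn && any(y != floor(y))) warning("Non-integer input")
  ifelse(y < 10, paste("0", as.character(y), sep = ""), as.character(y))
}

test_that("zpad pads numbers below 10", {
  expect_equal(zpad(8:11), c("08", "09", "10", "11"))
  expect_equal(zpad(c("8", "9", "10", "11")), c("08", "09", "10", "11"))
})

test_that("zpad rejects or flags questionable input", {
  # Strings that cannot be converted should stop with our own message
  expect_error(zpad(c("H", "I", "J", "K")), "Non-numeric input")
  # Non-integers pass through, with a warning unless warn = FALSE
  expect_warning(zpad(c(8.5, 9.5)), "Non-integer input")
  expect_equal(zpad(c(8.5, 9.5), warn = FALSE), c("08.5", "09.5"))
})
```

If an expectation fails, the report includes the description of the group it belongs to, which makes the failure easier to find.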

This page was written by Steve Simon and is licensed under the Creative Commons Attribution 3.0 United States License.