R Tips

Writing your own functions

12 October 2009, latest revision: 12 September 2023

Writing functions in R is an important time-saver, and it is a skill you should learn early. As an example of how functions work, we can write a function to calculate the coefficient of variation, which is the standard deviation of a data set divided by its mean. Such a function could be defined like this:

coeffVar <- function(x) {
  CV <- sd(x) / mean(x)
  CV
}

Functions are objects, so in the first line, the function is given a name, coeffVar. Names should be descriptive and intuitive, so coeffVar would be a good name, but cv147smh would not.

The word function is a reserved keyword in R that creates our function object. The result of the function() expression is assigned to the name of the function object, here coeffVar.

Functions are often defined with parameters, objects that are used in the function’s code. When the function is called, arguments to the function are supplied, giving the actual values used when the function executes. These parameters are listed in parentheses after the word function. In this example, all the coeffVar() function needs to work is one parameter, a vector of data called x inside the function. Any name for this parameter could have been chosen, but x is used here because that matches R’s convention.

Following the list of parameters is a pair of curly braces, {}, which enclose all the statements that the function will execute. Each statement goes on its own line, typically starting on the line following the opening curly brace. For clarity, statements within the function are indented, which helps a reader to identify the contents of a function.

If a function returns a value, the last statement in that function will usually be the name of the object that is returned, because R returns the value of the last expression evaluated. In some cases, a function may return a value from within the body of the function, and in that case, the object is wrapped in the return() function. You could use return() at the end of a function, but it is redundant there, so it is usually omitted. In this example, we calculate the coefficient of variation and assign it to an object called CV. The final line of the function is simply CV, because we want the function to return that value.
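As a minimal sketch of returning from within the body, the hypothetical function safeCoeffVar() below (not part of the original example) returns NA as soon as it finds that the mean is zero. It assumes the coefficient of variation is taken as the standard deviation divided by the mean. In every other case it falls through to the last line, where no return() is needed.

```r
# safeCoeffVar is a hypothetical name used only for illustration
safeCoeffVar <- function(x) {
  if (mean(x) == 0) {
    # Early exit from within the body: return() is required here
    return(NA)
  }
  CV <- sd(x) / mean(x)
  CV   # last statement: its value is returned without return()
}

safeCoeffVar(c(-1, 1))           # mean is zero, so the early return gives NA
safeCoeffVar(c(2, 4, 6, 8, 10))  # normal case: the value of CV is returned
```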

In short, we supply one argument (x) to this particular function, and it returns the coefficient of variation.
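A short usage sketch may help here. The data values are made up for illustration, and the function is defined with the coefficient of variation taken as the standard deviation divided by the mean.

```r
coeffVar <- function(x) {
  CV <- sd(x) / mean(x)
  CV
}

# Hypothetical measurements, invented for this example
measurements <- c(2, 4, 6, 8, 10)

# The argument (measurements) is matched to the parameter (x)
result <- coeffVar(measurements)
result
```

Note that the object passed in is called measurements, not x; the name x matters only inside the function.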

A function is best treated as a black box, isolated from the rest of the world. In ordinary use, a function creates no objects outside itself other than what it returns, and it does not modify objects outside of itself. A good function uses only those objects supplied as arguments or created inside the function. The parameters defined for a function exist only inside that function, so the names do not need to match any objects outside the function.
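This isolation can be seen in a small sketch. The object and function names here are hypothetical, chosen only to show that changes made inside a function stay inside it.

```r
# Hypothetical data and function names, for illustration only
x <- c(1, 2, 3)

doubleValues <- function(x) {
  x <- x * 2                # changes only the copy of x inside the function
  localOnlyTotal <- sum(x)  # exists only while the function runs
  x
}

doubled <- doubleValues(x)

x                         # still 1 2 3: the outside object is untouched
doubled                   # 2 4 6: the returned value
exists("localOnlyTotal")  # FALSE: the inner object never reached the outside
```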

Converting existing code to a function

We often write code and later realize that we will have to use that same block of code repeatedly. These situations are ripe for creating functions because they eliminate redundant code.

For example, consider this block of code, which makes a particular type of plot:

plot(Ca, Mg, pch=16, las=1, xlab="Ca (ppt)", ylab="Mg (ppt)")
abline(h=2.5, col="gray", lty="dotted")
abline(v=3.1, col="gray", lty="dotted")

To convert this into a function, we need to identify what this code needs. It requires objects for the Ca and Mg values and a symbol for the plotting character. Also, let’s assume that we know that the data will always be calcium and magnesium and that the positions, colors, and types of the lines will never need to be changed.

With that, we can define a function with clear, explanatory parameter names:

CaMgPlot <- function(calcium, magnesium, pch) {}

Next, we copy our existing code as is into the body of the function:

CaMgPlot <- function(calcium, magnesium, pch) {
  plot(Ca, Mg, pch=16, las=1, xlab="Ca (ppt)", ylab="Mg (ppt)")
  abline(h=2.5, col="gray", lty="dotted")
  abline(v=3.1, col="gray", lty="dotted")
}

Last, we change any of the objects in the body of our function to match the names of the parameters. For example, Ca will become calcium, Mg will become magnesium, and the value of pch will be the pch parameter. By doing this, everything the function needs will be passed to it as an argument; it will not need anything outside of the function for it to work. The function now becomes:

CaMgPlot <- function(calcium, magnesium, pch) {
  plot(calcium, magnesium, pch=pch, las=1, xlab="Ca (ppt)", ylab="Mg (ppt)")
  abline(h=2.5, col="gray", lty="dotted")
  abline(v=3.1, col="gray", lty="dotted")
}

At this point, all objects inside the function are either defined parameters to the function (calcium, magnesium, pch) or are defined inside the function (such as the axis labels and the styling of the horizontal and vertical lines).
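One way to check this claim, offered as a sketch rather than a required step, is findGlobals() from the codetools package, which is normally installed alongside R. It lists the names a function uses but does not define itself. The two function names below are hypothetical, chosen so that the pasted and renamed versions can be compared side by side.

```r
# Version with the code pasted in unchanged: still refers to outside objects
CaMgPlotPasted <- function(calcium, magnesium, pch) {
  plot(Ca, Mg, pch=16, las=1, xlab="Ca (ppt)", ylab="Mg (ppt)")
  abline(h=2.5, col="gray", lty="dotted")
  abline(v=3.1, col="gray", lty="dotted")
}

# Version after renaming: uses only its own parameters
CaMgPlotRenamed <- function(calcium, magnesium, pch) {
  plot(calcium, magnesium, pch=pch, las=1, xlab="Ca (ppt)", ylab="Mg (ppt)")
  abline(h=2.5, col="gray", lty="dotted")
  abline(v=3.1, col="gray", lty="dotted")
}

# merge=FALSE separates the functions called from the variables used
globalsPasted <- codetools::findGlobals(CaMgPlotPasted, merge=FALSE)$variables
globalsRenamed <- codetools::findGlobals(CaMgPlotRenamed, merge=FALSE)$variables

globalsPasted   # expected to list Ca and Mg, which live outside the function
globalsRenamed  # expected to be empty: nothing outside the function is needed
```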

We can now use our new function to create our original plot:

CaMgPlot(calcium=Ca, magnesium=Mg, pch=16)

Following convention, we could also supply the calcium and magnesium arguments by position:

CaMgPlot(Ca, Mg, pch=16)

This is much more compact. It also hides the details of how the plot is made, making the main body of the code simpler. If we called this function in multiple places, we could change our function and the change would propagate to all those places. If we didn’t use a function, we would have to make the same changes everywhere in our code, an error-prone venture because we are likely to miss some of them.
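To try these calls, some data are needed. The sketch below uses made-up concentrations for Ca and Mg (they are not from any real data set) and shows that matching arguments by name and by position gives the same plot.

```r
CaMgPlot <- function(calcium, magnesium, pch) {
  plot(calcium, magnesium, pch=pch, las=1, xlab="Ca (ppt)", ylab="Mg (ppt)")
  abline(h=2.5, col="gray", lty="dotted")
  abline(v=3.1, col="gray", lty="dotted")
}

# Hypothetical concentrations (ppt), invented so the example can run
Ca <- c(2.1, 2.8, 3.4, 3.9, 4.5)
Mg <- c(1.9, 2.2, 2.6, 2.4, 3.0)

CaMgPlot(calcium=Ca, magnesium=Mg, pch=16)  # all arguments matched by name
CaMgPlot(Ca, Mg, pch=16)                    # first two matched by position
CaMgPlot(magnesium=Mg, calcium=Ca, pch=16)  # named arguments may come in any order
```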

We can make one final change to our function to allow for default values. For example, perhaps we know that we usually want the plotting symbols to be small filled circles. We can set the default value of pch to make this so. The first line of our function is all that needs to change, and it becomes:

CaMgPlot <- function(calcium, magnesium, pch=16) {

(Note that you would have to repeat the lines of code for the remainder of the function; you cannot type only the first line to update the function.)

Now, when we call our function, the points will be small filled circles by default. Our function call becomes very simple:

CaMgPlot(calcium=Ca, magnesium=Mg)

which is even simpler if we supply our arguments by position (just remember to put these in the correct order):

CaMgPlot(Ca, Mg)
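Putting the pieces together, the complete function with its default value might look like the sketch below. The Ca and Mg values are made up for illustration. The last call shows that a default can still be overridden when a different symbol is wanted.

```r
CaMgPlot <- function(calcium, magnesium, pch=16) {
  plot(calcium, magnesium, pch=pch, las=1, xlab="Ca (ppt)", ylab="Mg (ppt)")
  abline(h=2.5, col="gray", lty="dotted")
  abline(v=3.1, col="gray", lty="dotted")
}

# Hypothetical concentrations (ppt), invented so the example can run
Ca <- c(2.1, 2.8, 3.4, 3.9, 4.5)
Mg <- c(1.9, 2.2, 2.6, 2.4, 3.0)

CaMgPlot(Ca, Mg)         # pch is not supplied, so the default of 16 is used
CaMgPlot(Ca, Mg, pch=1)  # default overridden: open circles instead
```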

Advantages of using functions

Functions have many advantages:

Functions reduce the amount of duplicated code, lessening the chances of errors.
Functions shorten the length of code, making it easier to read, especially if functions are given well-chosen names.
Blocks of related code are easier to isolate and test, making the overall code more robust.
Functions make code more portable, as a useful function can easily be used for other projects, often with little or no modification.

Good coders use functions extensively, and you should start to look for opportunities to create a library of your own.
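One simple way to start such a library, sketched here under assumptions, is to keep function definitions in a plain text file and load them with source(). The file name myFunctions.R is hypothetical. In practice you would save the file with a text editor; it is written to a temporary folder here only so that the sketch is self-contained.

```r
# Hypothetical file name, placed in a temporary folder for this example
functionFile <- file.path(tempdir(), "myFunctions.R")

# Stand-in for saving the file with an editor
writeLines(c(
  "coeffVar <- function(x) {",
  "  CV <- sd(x) / mean(x)",
  "  CV",
  "}"
), functionFile)

# Any project can now load the functions defined in that file
source(functionFile)

coeffVar(c(2, 4, 6, 8, 10))
```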

Data Analysis in the Geosciences

GEOL 8370