R Tips

Steven Holland

Magic numbers and strings

20 September 2019

Suppose you wanted to make a plot of magnesium versus calcium, and that you wanted to add boxes around particular parts of it. It’s common to see something like this:

plot(Ca, Mg, pch=16)
rect(0, 0, 2.3, 5.7, border="blue")
rect(5, 7, 7.8, 11.2, border="red")

Experienced R users will recognize that each call to rect() specifies the position of the left, bottom, right, and top of the box, and the color of the border. What is not clear is what those numbers represent, specifically in terms of the underlying variables, calcium and magnesium.

Such numerical constants embedded in code without explanation are sometimes called magic numbers: they “make it work”, but nothing in the code says why these particular values are used.

One approach would be to explain the values with comments. Suppose they reflect the ranges for clean and contaminated samples:

plot(Ca, Mg, pch=16)
# The cutoff for a clean level of calcium is 2.3; 5.7 for magnesium
rect(0, 0, 2.3, 5.7, border="blue")
# The values defining the zone of contaminated calcium are 5–7.8; 7–11.2 for magnesium
rect(5, 7, 7.8, 11.2, border="red")

Although this makes your intent clear, it raises a new problem: if these cutoffs change, you have to change them in the code and in the comments. It is common to forget to change the value in the comments, leaving a comment that no longer matches the code. It would be better to have the code be self-commenting.

Embedded constants can be avoided by defining the values using objects, with names that reflect their meaning. For example, we could change the code as follows:

plot(Ca, Mg, pch=16)
CaClean <- 2.3
MgClean <- 5.7
CaContaminatedLower <- 5
CaContaminatedUpper <- 7.8
MgContaminatedLower <- 7
MgContaminatedUpper <- 11.2
rect(0, 0, CaClean, MgClean, border="blue")
rect(CaContaminatedLower, MgContaminatedLower, CaContaminatedUpper, MgContaminatedUpper, border="red")

Although the code is longer, the intent is clearer. If the values of what is considered clean or contaminated change, it is now obvious which value to change. Even better, if we use these values later in the code, we now have to change them in only one place, and the rest of the code will work automatically.
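As a rough illustration of using these values later in the code, the sketch below counts the samples that fall in each zone. The data vectors and the names isClean and isContaminated are hypothetical, invented for this example. Treating the cutoffs as inclusive is also an assumption. Only the cutoff values themselves come from the text above.

```r
# Hypothetical measurements, invented only for this illustration
Ca <- c(1.2, 2.0, 5.5, 6.9, 3.4)
Mg <- c(3.1, 6.0, 8.2, 10.5, 4.0)

# Named cutoffs, as defined in the text
CaClean <- 2.3
MgClean <- 5.7
CaContaminatedLower <- 5
CaContaminatedUpper <- 7.8
MgContaminatedLower <- 7
MgContaminatedUpper <- 11.2

# The same names that position the boxes also classify the samples
# (assumption: a sample on a cutoff counts as inside the zone)
isClean <- Ca <= CaClean & Mg <= MgClean
isContaminated <- Ca >= CaContaminatedLower & Ca <= CaContaminatedUpper &
  Mg >= MgContaminatedLower & Mg <= MgContaminatedUpper

# Number of samples in each zone
sum(isClean)
sum(isContaminated)
```

If a cutoff is revised, editing the one assignment updates both the boxes and these counts.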

Another lurking problem is making sure that these values are used in the correct place in the rect() call. This is solved by naming the arguments in the call:

rect(xleft=0, ybottom=0, xright=CaClean, ytop=MgClean, border="blue")
rect(xleft=CaContaminatedLower, ybottom=MgContaminatedLower, xright=CaContaminatedUpper, ytop=MgContaminatedUpper, border="red")

Now the code is substantially less error-prone, because it is clear that calcium is on the x-axis and magnesium is on the y-axis.
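One consequence of naming the arguments, sketched here under a few assumptions, is that their order in the call no longer matters, so a value cannot slip silently into the wrong position. The blank plotting region and the null PDF device are stand-ins chosen so the example runs without data or a display, and rectPositions is a hypothetical name.

```r
# Cutoffs from the text
CaClean <- 2.3
MgClean <- 5.7

# Null PDF device: an assumption so the sketch runs without a display;
# it is not needed in an interactive session
pdf(NULL)

# Empty plotting region standing in for the real Ca/Mg plot
plot(c(0, 12), c(0, 12), type="n", xlab="Ca", ylab="Mg")

# Arguments are matched by name, so they may be listed in any order
rect(ytop=MgClean, xright=CaClean, ybottom=0, xleft=0, border="blue")
invisible(dev.off())

# For comparison, the positional order that unnamed arguments must follow
rectPositions <- names(formals(rect))[1:4]
rectPositions
```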

The same approach works for embedded strings, such as the names of the colors. Define each color name as an object whose name describes its function:

cleanColor <- "blue"
contaminatedColor <- "red"
rect(0, 0, CaClean, MgClean, border=cleanColor)
rect(CaContaminatedLower, MgContaminatedLower, CaContaminatedUpper, MgContaminatedUpper, border=contaminatedColor)

As before, the only cost to this is somewhat longer code. The benefit is that the colors are clearly matched to the correct box. If you have multiple plots that use these same colors, the benefits grow: if you change the color in one place (where it is named), the changes propagate automatically through the rest of your code.
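To suggest how that propagation might look in practice, the sketch below reuses the color names for the points, the boxes, and a legend. The data, otherColor, pointColor, and the zone tests are hypothetical additions for illustration, and the null PDF device is only there so the example runs without a display.

```r
# Hypothetical measurements, invented only for this illustration
Ca <- c(1.2, 2.0, 5.5, 6.9, 3.4)
Mg <- c(3.1, 6.0, 8.2, 10.5, 4.0)

# Named cutoffs and colors from the text
CaClean <- 2.3
MgClean <- 5.7
CaContaminatedLower <- 5
CaContaminatedUpper <- 7.8
MgContaminatedLower <- 7
MgContaminatedUpper <- 11.2
cleanColor <- "blue"
contaminatedColor <- "red"
otherColor <- "gray"  # hypothetical: color for samples in neither zone

# Assumption: a sample on a cutoff counts as inside the zone
isClean <- Ca <= CaClean & Mg <= MgClean
isContaminated <- Ca >= CaContaminatedLower & Ca <= CaContaminatedUpper &
  Mg >= MgContaminatedLower & Mg <= MgContaminatedUpper

# Each point takes the color of the zone it falls in
pointColor <- ifelse(isClean, cleanColor,
  ifelse(isContaminated, contaminatedColor, otherColor))

pdf(NULL)  # null device so the sketch runs without a display
plot(Ca, Mg, pch=16, col=pointColor)
rect(xleft=0, ybottom=0, xright=CaClean, ytop=MgClean, border=cleanColor)
rect(xleft=CaContaminatedLower, ybottom=MgContaminatedLower,
  xright=CaContaminatedUpper, ytop=MgContaminatedUpper,
  border=contaminatedColor)
legend("topleft", legend=c("clean", "contaminated"), pch=16,
  col=c(cleanColor, contaminatedColor))
invisible(dev.off())
```

Changing cleanColor in its one assignment recolors the points, the box, and the legend entry together.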

Name your constants, whether they are numbers or strings. Your code will become self-commenting and less error-prone. It is also future-proofed if you need to change any values.