For a binomial distribution, variance is a function of the mean: it reaches its maximum at a proportion of 0.5 and declines to zero at proportions of zero and one. This is a problem for methods that assume constant variance. Variance-stabilizing transformations are used to correct this problem in binomial data, and two of the most common are the logit and arcsine transformations. These transformations are also used for percentage data that may not follow a binomial distribution.
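As a quick check of this mean-variance relationship, the following sketch computes the variance of a binomial proportion, p(1-p)/n, over a grid of proportions. The sample size of 20 and the names nTrials, pGrid, and propVariance are assumptions chosen only for illustration.

```r
# Variance of a sample proportion from n binomial trials: p(1-p)/n
nTrials <- 20                       # hypothetical sample size
pGrid <- seq(0, 1, 0.1)             # proportions from 0 to 1
propVariance <- pGrid * (1 - pGrid) / nTrials

# Variance is zero at p = 0 and p = 1, and largest at p = 0.5
pGrid[which.max(propVariance)]
range(propVariance)
```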

The logit transformation is the log of the odds (the log-odds), that is, the log of the proportion divided by one minus the proportion. The base of the logarithm isn’t critical, and e is a common base.

logitTransform <- function(p) { log(p/(1-p)) }

The effect of the logit transformation is primarily to pull out the ends of the distribution. Over a broad range of intermediate values of the proportion (p), the relationship of logit(p) and p is nearly linear. One way to think of this is that the logit transformation expands the ends of the scale, such that a small difference in p near the ends (say, going from 0.98 to 0.99) corresponds to a much larger difference on the logit scale than the same difference near the middle.

p <- seq(0.001, 0.999, 0.001)
pLogit <- logitTransform(p)
plot(p, pLogit, type='l', lwd=2, col='red', las=1, xlab='p', ylab='logit(p)')
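To put numbers on how much the logit expands the ends of the scale, this short sketch compares a step of 0.01 near the middle with the same step near the upper end. The names midStep and endStep are illustrative; the function is repeated so the block runs on its own.

```r
# Same definition as in the text, repeated so this block is self-contained
logitTransform <- function(p) { log(p/(1-p)) }

# A step of 0.01 near the middle of the scale versus near the upper end
midStep <- logitTransform(0.51) - logitTransform(0.50)
endStep <- logitTransform(0.99) - logitTransform(0.98)

midStep             # about 0.04
endStep             # about 0.70
endStep / midStep   # how many times larger the step near the end is
```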

The arcsine transformation (also called the arcsine square root transformation, or the angular transformation) is calculated as two times the arcsine of the square root of the proportion. In some cases, the result is not multiplied by two (Sokal and Rohlf 1995). Multiplying by two makes the arcsine scale go from zero to pi; not multiplying by two makes the scale stop at pi/2. The choice is arbitrary. The function below does not multiply by two, so its values run from zero to pi/2.

asinTransform <- function(p) { asin(sqrt(p)) }

The effect of the arcsine transformation is similar to the logit, in that it pulls out the ends of the distribution, but not to the extent that the logit does.

pAsin <- asinTransform(p)
plot(p, pAsin, type='l', lwd=2, col='blue', las=1, xlab='p', ylab='arcsine(p)')
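The variance-stabilizing effect of the arcsine transformation can be seen in a small simulation. This is only a sketch: the seed, the number of trials, the number of simulated samples, and the chosen proportions are all arbitrary assumptions. On the arcsine square root scale (without multiplying by two), the variance of a transformed binomial proportion is approximately 1/(4n), whatever the true proportion.

```r
set.seed(1)                  # arbitrary seed, for reproducibility
nTrials <- 50                # hypothetical number of trials per sample
nSamples <- 10000            # hypothetical number of simulated samples
trueP <- c(0.2, 0.5, 0.8)    # hypothetical true proportions

# Variance of the raw and the transformed proportions for each true p
simVariance <- sapply(trueP, function(p) {
  pHat <- rbinom(nSamples, nTrials, p) / nTrials
  c(raw = var(pHat), arcsine = var(asin(sqrt(pHat))))
})
colnames(simVariance) <- trueP
simVariance

# Approximate variance expected on the arcsine scale
1 / (4 * nTrials)
```

The raw variances should differ with p, while the arcsine variances should all be close to the approximate value.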

To gauge the different effects of these two transformations, it’s helpful to show them on the same scale. To do this, I will rescale both transformations to have the same range, from zero to one.

rangeScale <- function(x) { (x-min(x)) / (max(x)-min(x)) }

pAsin.scaled <- rangeScale(pAsin)
pLogit.scaled <- rangeScale(pLogit)

# Plot the arcsine curve, then add the logit curve to the same axes
plot(p, pAsin.scaled, las=1, type='l', lwd=2, col='blue', xlab='p', ylab='p transformed')
lines(p, pLogit.scaled, lwd=2, col='red')
# Label each curve
text(0.8, 0.8, 'asin', col='blue')
text(0.8, 0.5, 'logit', col='red')

Both transformations are essentially linear over the range of 0.3–0.7, with more curvature near the ends. The curvature of the logit transformation is much more pronounced, so the logit transformation has a much stronger effect than the arcsine transformation.
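One rough way to check this is to fit a straight line to each transformation and compare the R-squared values over the middle of the scale and over nearly the whole scale. The helper linearity() and the names pMiddle and pAll are hypothetical, and R-squared is only one of several possible measures of linearity.

```r
# Definitions repeated from the text so this block is self-contained
logitTransform <- function(p) { log(p/(1-p)) }
asinTransform <- function(p) { asin(sqrt(p)) }

# R-squared of a straight-line fit of the transformed values against p
linearity <- function(p, transformFun) {
  summary(lm(transformFun(p) ~ p))$r.squared
}

pMiddle <- seq(0.3, 0.7, 0.001)      # the middle of the scale
pAll <- seq(0.001, 0.999, 0.001)     # nearly the whole scale

linearity(pMiddle, logitTransform)
linearity(pMiddle, asinTransform)
linearity(pAll, logitTransform)
linearity(pAll, asinTransform)
```

Values close to one indicate a nearly linear relationship. Over the full range, the logit should have the lower value, reflecting its stronger curvature.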

For regression, the logit transformation is preferred for three reasons (Warton and Hui 2011). First, the logit scale covers all of the real numbers instead of being limited to a particular range. For example, just as proportion is limited to 0–1, the arcsine square root scale is limited to 0 to pi (or 0 to pi/2 if the result is not multiplied by two). In contrast, the limits of the logit scale are negative infinity and positive infinity. This is particularly important where prediction is needed, as predictions on a bounded scale can fall outside its limits and give nonsensical results (e.g., more than 100% or less than 0%). Second, the logit scale is more intuitive in that it is the log-odds. This is particularly useful in interpreting slopes from a logistic regression, in which the logit transformation is central. Third, the logit scale correctly models the relationship between the mean and variance in binomial data, where variance is p(1-p)/n.
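The first point can be illustrated with a small sketch. The predicted values here are made up rather than taken from a fitted model, and inverseLogit() is a hypothetical helper (base R's plogis() computes the same quantity). Any value on the logit scale back-transforms to a proportion between zero and one, whereas a value beyond the upper limit of the arcsine square root scale does not correspond to any proportion.

```r
# Inverse of the logit: maps any real number back to a proportion
inverseLogit <- function(x) { exp(x) / (1 + exp(x)) }

# Hypothetical predictions on the logit scale, such as from a fitted line
logitPredictions <- c(-8, -2, 0, 2, 8)
inverseLogit(logitPredictions)       # all strictly between 0 and 1

# Hypothetical predictions on the arcsine square root scale (not doubled),
# which ends at pi/2; the last value is off the end of the scale
arcsinePredictions <- c(0.2, 0.8, 1.4, 1.8)
arcsinePredictions > pi/2
```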

In multivariate studies, like ordination or cluster analysis, the arcsine transformation is preferred. For ecological data, proportions of 0% are common, such as when a species doesn’t occur in a sample. Values of 100% are also possible, such as when only a single species is present in a sample. In these cases, the range of the logit scale becomes a problem because values of negative infinity will occur whenever a species is absent from a sample and values of positive infinity will arise in any monospecific sample. Although one could add a small value to prevent a zero proportion or subtract a small value to prevent a proportion of one, such values are arbitrary and the effect of the chosen value would have to be evaluated. An arcsine square root transformation would be more straightforward for these types of problems. Finally, because both transformations are essentially linear over the range of 0.3–0.7, neither transformation is necessary if all of your data fall in this range.
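The problem with zeros and ones is easy to demonstrate. In this sketch, the proportions are made up, and adjustedLogit() is a hypothetical helper showing one possible form of the small-value adjustment: it moves proportions of zero and one inward by epsilon before taking the logit. The two values of epsilon are arbitrary, which is the point.

```r
# Definitions repeated from the text so this block is self-contained
logitTransform <- function(p) { log(p/(1-p)) }
asinTransform <- function(p) { asin(sqrt(p)) }

# Hypothetical proportions of one species in five samples
speciesProportion <- c(0, 0.05, 0.4, 0.9, 1)

logitTransform(speciesProportion)   # -Inf where absent, Inf where monospecific
asinTransform(speciesProportion)    # finite everywhere, from 0 to pi/2

# Hypothetical adjustment: move 0 and 1 inward by epsilon before the logit
adjustedLogit <- function(p, epsilon) {
  logitTransform(pmin(pmax(p, epsilon), 1 - epsilon))
}
adjustedLogit(speciesProportion, 0.01)
adjustedLogit(speciesProportion, 0.0001)
```

The transformed values for the absent and monospecific samples change substantially with the choice of epsilon, while the other samples are unaffected.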

Sokal, R.R., and F.J. Rohlf. 1995. Biometry. Freeman, New York, 887 p.

Warton, D.I., and F.K.C. Hui. 2011. The arcsine is asinine: the analysis of proportions in ecology. Ecology 92:3–10.
