Analytic Rarefaction 1.3
October 2003
Steven M. Holland
You are free to use this program as you wish. All I ask is that if you use it for a published paper or talk, please acknowledge me in your paper and mention in your acknowledgments that the program is available from this web site.
Running the program
Rarefaction estimates the diversity that would have been observed, and places confidence limits on that estimate, had the sample size been smaller than it actually was. In ecology or paleoecology, one might count all of the individuals in a sampling area and assign each of those individuals to a species. For that sample, one would then have a tally of the number of individuals found for each species:
A 13
B 45
C 2
D 1
E 18
F 99
G 174
To estimate the number of species that would have been found had a smaller number of individuals been counted, one could use rarefaction. The data file would be set up simply as a list of the numbers of individuals for each species:
13
45
2
1
18
99
174
Note that this list does not specify which abundance corresponds with which species. Also, note that this list does not have to be in any particular order. For example, if one rearranged the list and placed the species in order of abundance, one would get exactly the same results from the rarefaction program.
To use Analytic Rarefaction, generate this list of species abundances as a text file. On a Mac, TextEdit or SimpleText are probably the easiest programs to use; NotePad is fine for Windows (but see Error Messages below). The data file should consist of a vertical string of integers, with each value representing the abundance of a given species. Be sure not to enter real (decimal) values of abundance; zeroes do not appear to be a problem. The abundances do not need to be ordered or ranked in any way.
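If the tally already exists in a script, the data file can be generated rather than typed by hand. The following Python lines are only a sketch of the format described above, not part of Analytic Rarefaction; the function name write_abundance_file and the tally dictionary are hypothetical, and rejecting negative counts is an added assumption.

```python
from pathlib import Path


def write_abundance_file(tally, path):
    """Write one integer abundance per line; species names are dropped."""
    lines = []
    for species, count in tally.items():
        # Decimal abundances such as 13.5 are not allowed in the data file.
        if isinstance(count, bool) or not isinstance(count, int):
            raise ValueError(f"abundance for {species!r} must be an integer")
        # Assumption: a negative count is treated as a data-entry error.
        if count < 0:
            raise ValueError(f"abundance for {species!r} is negative")
        lines.append(str(count))
    Path(path).write_text("\n".join(lines) + "\n")
    return len(lines)


# The example tally from the section above.
tally = {"A": 13, "B": 45, "C": 2, "D": 1, "E": 18, "F": 99, "G": 174}
```

Calling write_abundance_file(tally, "rarefaction.dat") would produce the seven-line file shown earlier. The order of the lines follows the order of the dictionary, which does not affect the rarefaction results.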
Save your file and place it in the same folder as the Analytic Rarefaction program. On a Macintosh, save the file as rarefaction.dat; on Windows/DOS, save the file as rarefact.txt.
Double-click on the Analytic Rarefaction icon. The program will automatically read the data, perform the rarefaction calculations, and write the results both to the screen and to the file rarefaction.res.
You will be asked how frequently the program should make the rarefaction calculations, for example, every 10 specimens.
When you quit the program, you will be asked if you want to save the file. You do not need to save it, as the results have already been saved as rarefaction.res. NOTE for Windows/DOS users: the results will be saved in a file named rareres.txt.
Viewing the results
Examine the screen output from the program. The first item listed is the number of individuals and the number of species read. The number of individuals should equal the sum of all of the values in the file rarefaction.dat and the number of species should be the number of values entered in rarefaction.dat. Check these values to make sure all of your data has been read.
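The two totals can be computed independently and compared with the program's screen output. This is a minimal sketch in Python, assuming the data file holds one integer per line as described above; summarize_abundances is a hypothetical helper, not something the program provides.

```python
def summarize_abundances(text):
    """Return (number of individuals, number of species) for a data file's text."""
    # split() tolerates blank lines and both Mac and Windows line endings.
    values = [int(token) for token in text.split()]
    # Every value entered counts as one species, including zeroes.
    return sum(values), len(values)


# The example data file from the section above.
sample = "13\n45\n2\n1\n18\n99\n174\n"
individuals, species = summarize_abundances(sample)
print(individuals, species)  # 352 7
```

For a file on disk, the text could be obtained with open("rarefaction.dat").read() and passed to the same function.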
For a range of rarefied sample sizes (n), the program will then list the expected number of species (E) and the variance of the expected number of species (Var). It will also list the upper and lower 95% and 99% confidence limits. The upper and lower 95% confidence limits are calculated as E +/- 1.96 * sqrt(Var); the 99% confidence limits are calculated as E +/- 2.58 * sqrt(Var).
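The confidence limits can be reproduced from E and Var with a few lines of Python. This sketch simply restates the two formulas above; the function name confidence_limits and the example values of E and Var are made up for illustration.

```python
from math import sqrt

Z_95 = 1.96  # multiplier for the 95% confidence limits
Z_99 = 2.58  # multiplier for the 99% confidence limits


def confidence_limits(expected, variance, z):
    """Return (lower, upper) limits as E -/+ z * sqrt(Var)."""
    half_width = z * sqrt(variance)
    return expected - half_width, expected + half_width


# Illustrative values only: E = 5.0 species, Var = 4.0.
lower_95, upper_95 = confidence_limits(5.0, 4.0, Z_95)
lower_99, upper_99 = confidence_limits(5.0, 4.0, Z_99)
```

With these illustrative values the 95% limits are 5.0 -/+ 3.92 and the 99% limits are 5.0 -/+ 5.16. Note that a lower limit computed this way can fall below zero when Var is large relative to E.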
Although Analytic Rarefaction does not plot the results, the file rarefaction.res can be read into any graphing program such as Excel, Deltagraph, or Kaleidagraph.
Method of calculation
The program uses the rarefaction equations for E given by Hurlbert (1971) and for Var given by Heck et al. (1975). These are the same equations used by Raup (1975) and Tipper (1979). In particular, this program uses the formulation of Tipper (1979), as his equations (1) and (2) are easy to program and avoid the overflow errors associated with the large combinatorials.
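For readers who want to check values independently, the following Python sketch computes E and Var directly from binomial coefficients. It is not the program's code and it does not reproduce Tipper's equations (1) and (2); it treats each species as present or absent in a random subsample of n individuals drawn without replacement, which gives Hurlbert's expression for E and the corresponding variance of a sum of indicator variables. The function name rarefy is hypothetical. Python integers do not overflow, so the large combinatorials are computed exactly here, at the cost of speed for large data sets.

```python
from math import comb


def rarefy(abundances, n):
    """Return (E, Var) for a subsample of n individuals."""
    total_individuals = sum(abundances)
    if not 0 < n <= total_individuals:
        raise ValueError("n must be between 1 and the number of individuals")
    all_subsamples = comb(total_individuals, n)

    # absent[i] is the probability that species i is missing from the subsample.
    absent = [comb(total_individuals - a, n) / all_subsamples for a in abundances]

    # Expected number of species: sum of the probabilities of being present.
    expected = sum(1.0 - p for p in absent)

    # Variance: single-species terms plus twice the pairwise covariances.
    variance = sum(p * (1.0 - p) for p in absent)
    for i in range(len(abundances)):
        for j in range(i + 1, len(abundances)):
            remaining = total_individuals - abundances[i] - abundances[j]
            both_absent = comb(remaining, n) / all_subsamples
            variance += 2.0 * (both_absent - absent[i] * absent[j])

    # Guard against tiny negative values caused by floating-point rounding.
    return expected, max(variance, 0.0)


# The example abundances from the first section.
counts = [13, 45, 2, 1, 18, 99, 174]
```

Two quick sanity checks follow from the definitions: rarefy(counts, 1) should give E = 1 with Var = 0, and rarefy(counts, 352), the full sample, should give E = 7 with Var = 0.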
The results of this program have been cross-checked with analytic solutions supplied by Mike Foote and with resampling calculations by Mike Foote and me. Note that if you test the results of this program by using Table 3 of Raup (1975), the values of Var will differ for low values of n. Dave Raup, Michael Foote, and I have found a coding error in Raup's original program that causes his published values of Var to be inflated at low values of n.
Error Messages
ERROR - could not locate the data file 'rarefaction.dat'
There are two likely causes of this message: (1) the rarefaction.dat file is not in the same folder or directory as the rarefaction program, or (2) the rarefaction.dat file was actually saved under a different name. In the first case, verify that the data file and the program are indeed in the same directory or folder. In the second case, verify that the data file is truly named "rarefaction.dat".
ERROR - could not locate the data file 'rarefact.txt'
Same as the previous message, but on Windows/DOS systems. There are two likely causes of this message: (1) the rarefact.txt file is not in the same folder or directory as the rarefaction program, or (2) the rarefact.txt file was actually saved under a different name. In the first case, verify that the data file and the program are indeed in the same directory or folder. In the second case, verify that the data file is truly named "rarefact.txt".
Insufficient memory
This error can occur for large problems on a Macintosh. If you see this message, quit the program, select the application icon in the Finder, and select File... Get Info... Memory. Try doubling the number in the Preferred Size box, which should solve the problem. Larger increases in preferred size may be needed for extraordinarily large problems.
Changes from version 1.1
Now able to handle more than 5000 individuals (the limit in version 1.1); the upper limit is set by the numeric capacity of the computer rather than by a fixed cutoff in the program. On a Power Macintosh 8600, I have been able to run rarefactions with over 100,000 individuals.
Now performs rarefaction calculations at user-specified intervals, rather than at program-defined intervals.
Version 1.2 makes the minimum number of calculations necessary, so the program is slightly faster than version 1.1 for small data sets.
Version 1.3 allows the user to specify the name of the data file.
References
Heck, K.L., Jr., G. Van Belle, and D. Simberloff, 1975. Explicit calculation of the rarefaction diversity measurement and the determination of sufficient sample size. Ecology 56: 1459–1461.
Hurlbert, S.H., 1971. The nonconcept of species diversity: a critique and alternative parameters. Ecology 52: 577–586.
Raup, D.M., 1975. Taxonomic diversity estimation using rarefaction. Paleobiology 1: 333–342.
Tipper, J.C., 1979. Rarefaction and rarefiction - the use and abuse of a method in paleontology. Paleobiology 5: 423–434.