Problem Set 8: Bootstrap

Suppose you have been studying surface groundwater, and you have learned that the ratio of sodium plus potassium to calcium plus magnesium is useful for inferring the geology of the water source.

$$\text{ratio} = \frac{\mathrm{Na} + \mathrm{K}}{\mathrm{Ca} + \mathrm{Mg}}$$

Ratios like this have an interesting but undesirable property. When Na+K equals Ca+Mg, the ratio is 1. As Na+K becomes smaller than Ca+Mg, the ratio approaches zero but never goes below it. As Na+K becomes larger than Ca+Mg, the ratio can grow without bound. In other words, one side of the distribution is limited to the 0–1 interval, but the other side ranges from 1 to ∞! As a result, the distribution of this ratio is right-skewed, with a long right tail, and is often approximately lognormal.

The usual way to fix this is to take the log of the ratio. The distribution will now approach normality, or at least symmetry. A value of zero means Na+K equals Ca+Mg, positive values mean there is more Na+K, and negative values mean there is more Ca+Mg. We will use the base-10 log in this problem set, and we will call this log ratio AR (for alkali ratio).

$$\mathrm{AR} = \log_{10}\!\left(\frac{\mathrm{Na} + \mathrm{K}}{\mathrm{Ca} + \mathrm{Mg}}\right)$$
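As a quick numerical check of the symmetry argument, the sketch below takes the base-10 log of a few made-up ratios. The values are chosen as reciprocal pairs purely for illustration and are not from any data set.

```r
# Hypothetical ratios of (Na + K) to (Ca + Mg), chosen as reciprocal pairs
ratio <- c(0.1, 0.5, 1, 2, 10)

# On the raw scale, the distances from 1 are lopsided
ratio - 1

# On the log10 scale, reciprocal ratios sit at equal distances on either side of zero
logRatio <- log10(ratio)
logRatio
```

A ratio of 0.1 and a ratio of 10 describe the same tenfold imbalance in opposite directions, and only the log scale treats them that way.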

Part 1

Write a function called AR to perform this calculation. It should take four arguments called Na, K, Ca, and Mg. None should have default values. It should return the base-10 logarithm of the ratio as shown above.
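One possible shape for such a function is sketched below. It is only an illustration of the description above, not the required solution, and the one-line body is an assumption about how you might choose to write it.

```r
# Sketch of AR: base-10 log of (Na + K) / (Ca + Mg)
# No defaults are given, so all four arguments must be supplied by the caller
AR <- function(Na, K, Ca, Mg) {
  log10((Na + K) / (Ca + Mg))
}
```

Because R arithmetic and log10() are vectorized, a function written this way accepts whole columns of concentrations as well as single values.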

Part 2

We will test this on some cases so that we know the output is correct. Check these values with a calculator, if necessary. For each of these tests, you will call the arguments by name, with hard-coded values. Although we normally avoid hard-coding values, here it will make the connection between the inputs and the result more obvious.

First, let’s cover the case where Na+K equals Ca+Mg. Call the AR function where Na, K, Ca, and Mg all equal 10.

Second, let's cover the case where Na+K is ten times greater than Ca+Mg. Call the AR function where Na and K are each 100 and Ca and Mg are both 10.

Third, let's cover the reverse case, where Ca+Mg is ten times greater than Na+K. Call the AR function where Na and K are both 10, and Ca and Mg are both 100.
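The three calls might look like the sketch below. The definition of AR is repeated only so that the block runs on its own, and the object names equalCase, alkaliRich, and alkaliPoor are hypothetical.

```r
# Assumed definition of AR from Part 1, repeated so the sketch is self-contained
AR <- function(Na, K, Ca, Mg) log10((Na + K) / (Ca + Mg))

# Case 1: Na + K equals Ca + Mg
equalCase <- AR(Na = 10, K = 10, Ca = 10, Mg = 10)

# Case 2: Na + K is ten times Ca + Mg
alkaliRich <- AR(Na = 100, K = 100, Ca = 10, Mg = 10)

# Case 3: Ca + Mg is ten times Na + K
alkaliPoor <- AR(Na = 10, K = 10, Ca = 100, Mg = 100)

c(equalCase, alkaliRich, alkaliPoor)
```

By hand, the ratios are 20/20, 200/20, and 20/200, so the base-10 logs should be 0, 1, and -1.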

Question 1: Does your function work correctly in these test cases? Explain within the normal length constraints of a question.

Part 3

We will apply this function to surface water data from the Uzon Caldera in Kamchatka, Russia. Download and import the data set uzon.txt, naming the data frame uzon.

Use the appropriate command to view the structure of this data frame.

Use the appropriate command that will allow you to call the variables by name without using dollar-sign notation.
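A minimal sketch of these three steps follows. It assumes that uzon.txt is tab-delimited with a header row, which may not match the real file, and it writes a small stand-in file of invented values so that the block runs without the download. read.table(), str(), and attach() are one reasonable reading of "the appropriate command"; the course notes may intend others.

```r
# Stand-in for uzon.txt so the sketch runs on its own. These values are invented,
# and the real file's delimiter and columns may differ.
path <- tempfile(fileext = ".txt")
writeLines(c("Na\tK\tCa\tMg",
             "120\t15\t40\t10",
             "300\t40\t25\t5",
             "80\t10\t60\t20"), path)

# With the real data, path would point to the downloaded uzon.txt instead
uzon <- read.table(path, header = TRUE, sep = "\t")

# View the structure of the data frame
str(uzon)

# Make the columns callable by name, without uzon$ notation
attach(uzon)
Na + K
```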

Part 4

Apply the AR function to the values of Na, K, Ca, and Mg, calling the arguments by name so that you are sure you are calling it correctly. Save the results to an object called alkaliRatio.

To visualize the distribution, run stripchart on alkaliRatio. Use a print character of 3 (a plus sign), which works well when many values are similar. Give the x-axis a short descriptive name (not the object name). This is plot 1.
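The sketch below shows how the call and the plot could be put together. The concentrations are invented stand-ins for the uzon data frame, the definition of AR is an assumed one-liner, and with() is used here in place of attach() only to keep the block self-contained.

```r
# Assumed definition of AR from Part 1
AR <- function(Na, K, Ca, Mg) log10((Na + K) / (Ca + Mg))

# Invented concentrations standing in for the uzon data frame
uzon <- data.frame(Na = c(120, 300, 80, 210, 150),
                   K  = c(15, 40, 10, 22, 18),
                   Ca = c(40, 25, 60, 35, 50),
                   Mg = c(10, 5, 20, 8, 12))

# Arguments are passed by name so each column goes to the intended parameter
alkaliRatio <- with(uzon, AR(Na = Na, K = K, Ca = Ca, Mg = Mg))

# pch = 3 draws plus signs; xlab describes the quantity rather than the object
stripchart(alkaliRatio, pch = 3, xlab = "Log alkali ratio")
```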

Part 5

We want to calculate the mean of alkaliRatio with a 95% confidence interval. Use t.test to do this.
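As a sketch of what this call returns, the block below runs a one-sample t.test on invented values standing in for alkaliRatio. The object name result is hypothetical.

```r
# Invented log alkali ratios standing in for alkaliRatio
alkaliRatio <- c(0.43, 0.51, 0.38, 0.62, 0.47, 0.55, 0.49)

# One-sample t-test; conf.level = 0.95 is the default but is stated for clarity
result <- t.test(alkaliRatio, conf.level = 0.95)

result$estimate   # sample mean
result$conf.int   # lower and upper 95% confidence limits
```

Printing result on its own shows the same numbers along with the t statistic, degrees of freedom, and p-value for the default null hypothesis of a mean of zero.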

Question 2: Report the mean and confidence interval as is usually done (see Stating Statistical Results).

Part 6

You are concerned that the standard assumptions of a t-test do not apply, particularly that the sample size and the distribution of the statistic do not ensure that the Central Limit Theorem applies. You decide to bootstrap the confidence interval, since the bootstrap does not hinge on these assumptions.

Following the example from the lecture notes on the bootstrap, bootstrap your AR() function. Be sure to calculate the mean of the alkali ratios across your observations for each replicate; in your bootstrap function, that will mean wrapping the mean() function around your AR() function. Your bootstrap should have 100,000 replicates. From your bootstrap, calculate an estimate of the alkali ratio for the population along with a 95% confidence interval on that estimate.

Display the estimate and the lower and upper confidence limits in one line of code, with an appropriate number of significant figures. Hint: use c() and round().
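The sketch below shows one common way to set this up, a nonparametric percentile bootstrap that resamples rows with replacement. It is not necessarily the approach in the lecture notes. The data are invented, the names bootMeanAR, trials, bootMeans, estimate, and limits are hypothetical, and the number of replicates is kept small so the example runs quickly.

```r
set.seed(1)  # only so that this sketch is reproducible

# Assumed definition of AR from Part 1
AR <- function(Na, K, Ca, Mg) log10((Na + K) / (Ca + Mg))

# Invented concentrations standing in for the uzon data frame
uzon <- data.frame(Na = c(120, 300, 80, 210, 150),
                   K  = c(15, 40, 10, 22, 18),
                   Ca = c(40, 25, 60, 35, 50),
                   Mg = c(10, 5, 20, 8, 12))

# One replicate: resample rows with replacement, then take the mean AR
bootMeanAR <- function(d) {
  rows <- sample(nrow(d), replace = TRUE)
  mean(AR(Na = d$Na[rows], K = d$K[rows], Ca = d$Ca[rows], Mg = d$Mg[rows]))
}

trials <- 2000  # kept small here; the problem set asks for 100,000
bootMeans <- replicate(trials, bootMeanAR(uzon))

# Point estimate and percentile-based 95% confidence limits
estimate <- mean(bootMeans)
limits <- quantile(bootMeans, c(0.025, 0.975))

# Estimate, lower limit, and upper limit in one line
round(c(estimate, limits), 2)
```

Resampling whole rows keeps each sample's Na, K, Ca, and Mg together, so every replicate is built from real combinations of measurements rather than from values mixed across samples.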

Question 3: Based on your bootstrap, state the mean AR, with the confidence interval in parentheses, as usual, using a reasonable number of significant figures. State whether it is significantly different from a null hypothesis of zero (no parenthetical numerical support is needed, because you will have given that in the previous sentence).

Question 4: How does the mean alkali ratio compare for the t-test approach versus the bootstrap?

Question 5: How does the 95% confidence interval on the mean alkali ratio compare for the t-test approach versus the bootstrap?

Question 6: Based on the assumptions underlying the t-test and the bootstrap, which confidence interval should you use?

Submitting your problem set

Format your commands file following the standard instructions. E-mail your commands file to stratum@uga.edu. The subject of your email should be 8370 problem set 8. Do not send me the data file, as I have it already. This problem set is due 12:00 PM (noon), Thursday, 9 November.

Data Analysis in the Geosciences

GEOL 8370