Steven Holland

Problem Set 4: Writing Functions

Important note about plotting in this and every subsequent problem set

Because RStudio cannot open multiple plot windows, all problem sets from here on will require you to save each plot to a pdf file, following the approach given in problem set two. The file name for each plot will always have the form xxxxPlot1.pdf, where xxxx is your last name in lowercase. For example, I would save plot 3 as hollandPlot3.pdf. Pay attention to the format, especially the capitalization and the lack of hyphens and underscores. The plot number will be made clear in each problem set. Use the default pdf() settings unless otherwise specified.

Even so, do not initially create your plot in the pdf file because you cannot see your progress as you issue multiple commands. Instead, initially create your plot in a window. Once you are satisfied with your code, open the pdf file with pdf(), construct your plot, then close the pdf file with dev.off(), like this:

pdf(file="hollandPlot5.pdf")
# ... plotting commands
dev.off()

Do not turn in the code for creating the plot in a window, but do include the code for generating the plot in the pdf file. Do not turn in the pdf files; your code will generate them for me. Be sure to verify that your code correctly generates these pdf files.
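One way to verify that a pdf file was actually generated is to check for it from R after closing the device. The following is only a minimal sketch: the file name lastnamePlot0.pdf and the plotted values are hypothetical placeholders, not one of the assigned plots.

```r
# Hypothetical file name for illustration; substitute the name required for each plot
plotFile <- "lastnamePlot0.pdf"

pdf(file = plotFile)    # open the pdf device with the default settings
plot(1:10, (1:10)^2, xlab = "x", ylab = "x squared")    # placeholder plotting command
dev.off()               # close the device so the file is written to disk

# TRUE if the file now exists in the working directory
file.exists(plotFile)
```

Opening the resulting file in a pdf viewer is still the surest check that the plot looks as intended.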

Part 1

Download the Nashville limestone geochemistry dataset from the 8370 website (under Data, on the left side) and assign it to a data frame named nashville. Notice that we typically name a data frame for what it is, which will usually be similar to the file name. By convention, we generally give objects a lowercase name. Do not change the file name or modify the file’s contents in any way. Do not use attach() in this problem set because we want to build our comfort with dollar sign notation.

Run str() to verify that the data frame was imported correctly.
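As a rough illustration of importing and then checking a data frame, the sketch below writes a small stand-in file and reads it back. Everything about the stand-in is an assumption made for illustration: the values are invented, the column layout is made up, and the real Nashville file may need different read.table() arguments.

```r
# Stand-in file with invented values, used only so this example runs on its own;
# the real data file comes from the course website and must not be modified
standInFile <- tempfile(fileext = ".txt")
writeLines(c("sample Ca Si", "a 38.1 2.4", "b 36.5 3.9", "c 39.0 1.7"), standInFile)

# header=TRUE takes column names from the first line;
# row.names=1 uses the first column as row names (an assumption about the layout)
standIn <- read.table(file = standInFile, header = TRUE, row.names = 1)

# str() reports the number of rows and columns and the type of each column
str(standIn)
```

In the output of str(), check that the number of observations and variables is what you expect and that numeric columns were not read as text.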

Part 2

Prepare a scatterplot of Si vs. Ca with these specifications:

Part 3

You discover that you must construct this same Si vs. Ca plot for many other sites. You realize this is a good opportunity to write a function to construct this plot so that you can make each plot with a single line of code, a call to your function.

Read Writing Functions carefully. You will likely be unable to complete the following steps without it.

Write your function; it will need to accept the following arguments and only these arguments:

There are two other constraints on your function:

Be certain that the only objects the function uses are the four that are supplied as arguments or objects created within the function. Give the function a short but meaningful name of your choosing. Remember to indent the lines within the body of your function; either one tab or four spaces is fine, as I’m not a zealot about this. Precede and follow your function definition with one blank line.

Assign your data frame to a new object with a different name of your choosing. Delete (remove) the original data frame. Precede and follow these two lines of code with one blank line. Doing this will help test whether your function is truly self-contained or using objects not passed in through the function call.
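To see why copying and then removing the original object is a useful test, consider this small sketch. The object and function names (original, renamed, leaky, contained) and the values are all hypothetical, and the functions compute a mean rather than draw a plot.

```r
# Invented values for illustration
original <- data.frame(Ca = c(10, 20, 30), Si = c(1, 2, 3))

# Not self-contained: reaches outside the function for an object named 'original'
leaky <- function() {
    mean(original$Ca)
}

# Self-contained: uses only the object passed in as an argument
contained <- function(calcium) {
    mean(calcium)
}

renamed <- original
rm(original)

# Still works, because everything it needs arrives through its argument
contained(renamed$Ca)

# Fails, because 'original' no longer exists; try() lets the script keep running
leakyResult <- try(leaky(), silent = TRUE)
inherits(leakyResult, "try-error")
```

A function that silently depended on the original data frame would have appeared to work until that object was removed.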

Call your function on this new data frame to produce a plot identical to your plot from Part 2, in one simple line of code. Use $ notation as necessary when assigning arguments to your function call.

Part 4

Your function should work on any data set. To make sure this works, you should test your function. Good programmers always test their code.

For the three tests, we will use simulated data. First, create a vector of Ca values using the rlnorm() function; create 100 values using the defaults for the distribution, multiply the random numbers by 50, and store them in an object called randomCa, all in one line of code. Do the same for Si, but multiply the values by 7, and store them in an object called randomSi, all in one line of code.
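As a quick sketch of how rlnorm() behaves, the lines below draw a few values and rescale them. The object name, the count, and the multiplier are hypothetical and differ from those required above; set.seed() is an optional addition that makes the random values repeatable and is not part of the assignment.

```r
# Optional: fix the random seed so the same values are drawn each time
set.seed(8370)

# rlnorm(n) draws n lognormal values; the defaults are meanlog = 0 and sdlog = 1
simulated <- rlnorm(5) * 10

simulated
length(simulated)     # number of values drawn
all(simulated > 0)    # lognormal values are always positive
```

Because lognormal values are strictly positive and right-skewed, they resemble concentration data more closely than normally distributed values would.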

For the first test, call your custom plot function on these data. Specify “brown3” for the color and “Watkinsville” for the locality name. Supply all four arguments to your function by name, not position. This is plot 3.

For the second test, use the same simulated data, and call your custom plot function with all arguments supplied by position, not name. This plot should be identical to the one from your first test. This is plot 4.

For the third test, again use the same simulated data. Call your custom plot function as you did in the second test, but do not specify the color or the locality name. Your plot should be identical to your second test, but it shouldn’t have a main title, and the points should all be in the default color. This is plot 5.
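The difference between calling by name, calling by position, and relying on defaults can be sketched with a small function that is unrelated to plotting. The function labelPoint(), its argument names, and its default values are hypothetical and are not the ones your plot function must use.

```r
# Hypothetical function with two required arguments and two optional ones
labelPoint <- function(x, y, color = "black", place = "") {
    paste0(place, "(", x, ", ", y, ") in ", color)
}

# By name: the order of the arguments does not matter
byName <- labelPoint(y = 2, x = 1, place = "A", color = "red")

# By position: the order must match the function definition
byPosition <- labelPoint(1, 2, "red", "A")

identical(byName, byPosition)    # TRUE

# Optional arguments omitted, so the defaults are used
labelPoint(1, 2)                 # "(1, 2) in black"
```

Giving the optional arguments default values in the function definition is what allows them to be left out of a call.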

I will run a fourth test, calling your function on a data set you cannot see. Your function should run correctly when I supply it values for calcium, silicon, and optionally color and locality name. You do not need to do anything for this test.

Submitting your problem set

When your code runs, it should create five pdf files. Make sure that your code names these correctly. Do not turn in the pdf files; your code will construct them when I run it.

Format your commands file following the standard instructions. E-mail your commands file to stratum@uga.edu. The subject of your email should be 8370 problem set 4. Do not send me the data file, as I have it already. This problem set is due on 21 September.