Steven Holland

Problem Set 4: Writing Functions

In this and all subsequent problem sets, save every plot to a PDF file following the approach given in problem set two. The PDF file must be created in code, not through a menu in R.app or RStudio. The file name for each plot will always be something like xxxxPlot1.pdf, where xxxx is your last name, lowercase. For example, plot 3 for me would be saved as hollandPlot3.pdf. Pay attention to the format, especially the capitalization and lack of hyphens and underscores. The plot number will be specified in each problem set. Use the default pdf() settings unless otherwise specified.

Never try to initially create a plot in a PDF file because you cannot see your progress as you issue multiple commands. Instead, first create the plot in a window. Once you are satisfied with your code, open a PDF file with pdf(), run those same commands to make the plot, then close the PDF file with dev.off(), like this:

pdf(file="hollandPlot5.pdf")
# ... plotting commands
dev.off()

Do not turn in the code for creating the plot in a window; only include the code for generating the plot in the PDF file. Do not turn in the PDF files; your code will generate them for me. Be sure to verify that your code correctly generates these PDF files: quit R, restart it, and run all your commands. Doing this will often reveal problems that you are not aware of; not doing this tells me you skipped this crucial step.
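One way to check that a PDF was actually written is to ask R whether the file exists after closing the device. The following is a minimal sketch, not a required part of your commands file; the temporary directory and the placeholder plot are assumptions made only so the example is self-contained.

```r
# Hypothetical location: a temporary directory keeps this sketch out of your working directory
plotFile <- file.path(tempdir(), "hollandPlot5.pdf")

pdf(file=plotFile)        # open the PDF device with the default settings
plot(1:10, (1:10)^2)      # placeholder plotting commands
dev.off()                 # close the device so the file is written to disk

file.exists(plotFile)     # TRUE if the PDF was created
```

In your own work, the file would be written to your working directory, and the same file.exists() check applies there.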

Part 1

Download the Nashville limestone geochemistry dataset from the 8370 home page (under Data, on the left side) and assign it to a data frame named limestone. Notice that we typically name a data frame for what it represents, and short names are preferred. By convention, we generally give objects a lowercase name. Do not change the file name or modify the file’s contents in any way. Do not use attach() in this problem set.

Using the most appropriate command, display the structure of the data frame to verify that it was imported correctly.
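As a rough illustration of importing a table and inspecting it, the sketch below writes a tiny made-up table to a temporary file, reads it back, and displays its structure with str(). The file contents, column names, and the object name toy are all hypothetical; the real data file may need different read.table() arguments.

```r
# Hypothetical stand-in for a downloaded file: a tiny table in a temporary file
tmp <- tempfile(fileext=".txt")
writeLines(c("sample Ca Si", "A1 35.2 4.1", "A2 33.8 5.6", "A3 36.0 3.9"), tmp)

# Read the file, treating the first line as column names
toy <- read.table(tmp, header=TRUE)

# Display the structure: one line per column, showing its type and first values
str(toy)
```

Checking that numeric columns were read as numbers, not as text, is a quick way to catch an import problem.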

Part 2

Prepare a scatterplot of Si vs. Ca with these specifications:

Part 3

You discover that you must construct this same Si vs. Ca plot for many other sites. You realize this is a good opportunity to write a function to construct this plot so that you can make each plot with a single line of code, a call to your function.

Read Writing Functions carefully. You will likely not be able to do the following without this.

Write your function; it will need to include the following four arguments and only these four arguments:

There are two other constraints on your function:

Be certain that the only objects the function uses are the four that are supplied as arguments or objects created within the function. Give the function a short but meaningful name of your choosing. Remember to indent the lines within the body of your function; one tab or four spaces is fine, and I’m not a zealot about which you use. Precede and follow your function definition with one blank line.
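To illustrate the general shape of a self-contained function with optional arguments, here is a sketch that deliberately does something different from the assignment: it draws a histogram. The function name valueHistogram and its arguments are hypothetical; what matters is that the body uses only its arguments and objects created inside it, and that the optional arguments have default values.

```r

# Hypothetical function: histogram of deviations from the mean, with optional color and title
valueHistogram <- function(values, barColor="gray", titleText="") {
    # centered is created inside the function, so nothing outside is needed
    centered <- values - mean(values)
    hist(centered, col=barColor, main=titleText, xlab="Deviation from mean")
}

# Call with only the required argument; the defaults supply the color and an empty title
valueHistogram(c(2, 4, 4, 5, 7, 9))
```

Because barColor and titleText have defaults in the definition, a caller may leave them out.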

Assign your data frame to a new object with a different name of your choosing. Delete the original data frame. Precede and follow these two lines of code with one blank line. Deleting the original object will help test whether your function is truly self-contained or using objects not passed in through the function call.
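The copy-then-delete step can be sketched as follows, using a small made-up data frame. The names rocksOriginal and rocksCopy are hypothetical and chosen only for this illustration.

```r
# Hypothetical data frame standing in for the original
rocksOriginal <- data.frame(Ca=c(35.2, 33.8), Si=c(4.1, 5.6))

rocksCopy <- rocksOriginal    # assign the data frame to a new name
rm(rocksOriginal)             # delete the original object

exists("rocksOriginal")       # FALSE: the old name is gone
exists("rocksCopy")           # TRUE: the data remain under the new name
```

If a function secretly refers to the deleted name, calling it now produces an "object not found" error, which is exactly what this step is meant to expose.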

Call your function on this new data frame to produce a plot identical to your plot from Part 2, in one simple line of code. Use $ notation as necessary when assigning arguments to your function call. This is plot 2.

Part 4

Your function should work on any data set. To make sure this works, you should test your function. Good programmers always test their code, and you’ll learn three approaches in this section. If any of these tests fail to accomplish what they should, fix your function until these tests work correctly.

For the three tests, we will use simulated data. First, create a vector of Ca values using the rlnorm() function; create 100 values using the defaults for the distribution, multiply the random numbers by 50, and store them in an object called randomCa, all in one line of code. Do the same for Si, but multiply the values by 7, and store them in an object called randomSi, all in one line of code.
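For reference, rlnorm() draws from a lognormal distribution whose defaults are meanlog=0 and sdlog=1. The sketch below uses a different sample size and multiplier than the assignment; the object name simulated and the call to set.seed() are assumptions added for illustration, and set.seed() is not required by the problem set.

```r
set.seed(42)                   # optional: makes the random draws repeatable
simulated <- rlnorm(10) * 3    # 10 draws with the default distribution, then rescaled

length(simulated)              # 10
all(simulated > 0)             # TRUE: lognormal values are always positive
```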

For the first test, call your custom plot function on these data. Specify “brown3” for the color and “Watkinsville” for the locality name. Call all four arguments to your function by name, not position, and list the arguments in reverse order from how you defined your function. This is plot 3.

For the second test, use the same simulated data, and call your custom plot function with all arguments called by position, not name. This plot should be identical to the one from your first test. This is plot 4.

For the third test, again use the same simulated data. Call your custom plot function as you did in the second test, but do not specify the color or the locality name. Your plot should be identical to your second test, but it should lack a main title, and the points should all be in the default color. This is plot 5.
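The three calling styles used in these tests can be seen on a toy function that has nothing to do with plotting. The function labelPair and its arguments are hypothetical; it has two required arguments and two optional ones with defaults.

```r
# Hypothetical function with two required and two optional arguments
labelPair <- function(first, second, separator="-", suffix="") {
    paste0(first, separator, second, suffix)
}

# All arguments by name, listed in reverse order from the definition
byName <- labelPair(suffix="!", separator="+", second="b", first="a")

# All arguments by position, in the order of the definition
byPosition <- labelPair("a", "b", "+", "!")

# Optional arguments omitted, so the defaults are used
defaults <- labelPair("a", "b")

identical(byName, byPosition)    # TRUE
defaults                         # "a-b"
```

Named arguments are matched by name regardless of order, positional arguments are matched in the order of the definition, and any argument left out falls back to its default.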

I will run a fourth test, calling your function on a data set you cannot see. Your function should run correctly when I supply it values for calcium, silicon, and optionally color and locality name. You do not need to do anything for this test.

Submitting your problem set

When your code runs, it should create five PDF files. Make sure that your code names these PDF files correctly. Do not turn in the PDF files; your code will construct them when I run it.

Format your commands file following the standard instructions. E-mail your commands file to stratum@uga.edu. The subject of your email should be 8370 problem set 4. Do not send me the data file, as I have it already. This problem set is due on 16 September.