Problem Set 4: Writing Functions

Important note about plotting in this and every subsequent problem set

Because RStudio cannot open multiple plot windows, all problem sets from here on will require you to save each plot to a pdf file, following the approach given in problem set two. The file name for each plot will always have the form xxxxPlot1.pdf, where xxxx is your last name in lowercase. For example, plot 3 for me would be saved as hollandPlot3.pdf. Pay attention to the format, especially the capitalization and the lack of hyphens and underscores. The plot number will be made clear in each problem set. Use the default pdf() settings unless otherwise specified.

Even so, do not initially create your plot in the pdf file because you cannot see your progress as you issue multiple commands. Instead, initially create your plot in a window. Once you are satisfied with your code, open the pdf file with pdf(), construct your plot, then close the pdf file with dev.off(), like this:

pdf(file="hollandPlot5.pdf")
# ... plotting commands
dev.off()

Do not turn in the code for creating the plot in a window, but do include the code for generating the plot in the pdf file. Do not turn in the pdf files; your code will generate them for me. Be sure to verify that your code correctly generates these pdf files.
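One way to verify that your code really writes the file is to check for it after dev.off(). The following is only a minimal sketch: the file name smithPlot1.pdf and the use of tempdir() are placeholders for illustration, not the name or location required for this problem set.

```r
# Hypothetical file name and location, for illustration only
plotFile <- file.path(tempdir(), "smithPlot1.pdf")

pdf(file=plotFile)
plot(1:10, (1:10)^2)   # stand-in for your plotting commands
dev.off()

file.exists(plotFile)  # TRUE if the pdf file was written
```

If file.exists() returns FALSE, the usual culprits are a missing dev.off() or a working directory that is not where you expect.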

Part 1

Download the Nashville limestone geochemistry dataset from the 8370 website (under Data, on the left side) and assign it to a data frame named nashville. Notice that we typically name a data frame for what it is, which will usually be similar to the file name. By convention, we generally give objects a lowercase name. Do not change the file name or modify the file’s contents in any way. Do not use attach() in this problem set because we want to build our comfort with dollar sign notation.

Run str() to verify that the data frame was imported correctly.
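As a rough illustration of what a successful import looks like, the sketch below writes a tiny made-up file and reads it back. The file contents, the column names, and the object name demo are all assumptions made for this example; the real dataset's layout may differ, so choose the import function and arguments that match the actual file.

```r
# A tiny made-up data file; the real file's layout may differ
demoFile <- tempfile(fileext=".txt")
writeLines(c("Ca Si", "310.2 12.5", "287.9 20.1", "402.6 8.3"), demoFile)

demo <- read.table(demoFile, header=TRUE)

# str() should report a data frame whose columns have the types you expect
str(demo)
```

A column that shows up as character when you expected numbers is a common sign that the header or separator was not read correctly.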

Part 2

Prepare a scatterplot of Si vs. Ca with these specifications:

Ca and Si should be on the correct axes. Remember what “Si vs. Ca” implies about which variable goes on which axis.
The plot symbols should be solid (filled) blue circles; use the named color “dodgerblue”.
The axes should have meaningful names. Put the units (ppt) in parentheses after the element name.
Rotate the values along the y-axis so that they are horizontal.
The minimum values on both axes should be zero, and the maximum values should encompass the largest value of that variable. The maximum should not be hardcoded (i.e., it should be calculated from the data), but the minimum must be hardcoded as zero.
The main label for the plot should be the locality name, Hollis Creek. Always spell carefully; incorrect spelling on a plot will be treated as an error.
Add a rectangle from 0.0–30.0 on the Si axis and 0.0–220.0 on the Ca axis. You will find it helpful to define short well-named constants for each of these. The rectangle’s fill should be light gray; use the named color “lightgray”. The rectangle’s border should be dark gray (“darkgray”), with the default width and line type.
Calculate the vertical and horizontal center of the rectangle (based on the constants you defined above), and store these in well-named objects with names that aren’t too long. Use these to place a text label above this point, with the text “acceptable limits”. See the help page for text() for how to ensure that text appears above a specified point. The text should be 75% of the default size.
Your commands should be called in the correct order so that the rectangle does not obscure the data points. It is not a problem here if a label partially overlaps data points. Be sure that the points are added to the plot only once, that is, after the rectangle is drawn.
Your commands should not contain unused arguments or arguments set to default values. Remove unnecessary commands. This is plot 1.
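The layering described above (empty axes first, then the rectangle, then the points, then the label) can be sketched as follows. This is one possible approach under stated assumptions, not the required solution: the data are made up, and the object names, the locality title, and the file name are hypothetical.

```r
# Made-up values standing in for the real Ca and Si columns
set.seed(1)
ca <- runif(30, min=50, max=400)
si <- runif(30, min=2, max=60)

# Hypothetical constants for the rectangle limits and its center
caLimit <- 220
siLimit <- 30
caCenter <- caLimit / 2
siCenter <- siLimit / 2

pdf(file=file.path(tempdir(), "demoLayers.pdf"))   # placeholder file name
# type="n" sets up the axes without drawing any points yet
plot(ca, si, type="n", xlim=c(0, max(ca)), ylim=c(0, max(si)),
    xlab="Ca (ppt)", ylab="Si (ppt)", main="Demo locality", las=1)
rect(0, 0, caLimit, siLimit, col="lightgray", border="darkgray")
points(ca, si, pch=16, col="dodgerblue")   # drawn after, so on top of, the rectangle
text(caCenter, siCenter, labels="acceptable limits", pos=3, cex=0.75)
dev.off()
```

Here las=1 turns the axis values horizontal, pos=3 places the text above the given point, and cex=0.75 shrinks it to 75% of the default size.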

Part 3

You discover that you must construct this same Si vs. Ca plot for many other sites. You realize this is a good opportunity to write a function to construct this plot so that you can make each plot with a single line of code, a call to your function.

Read Writing Functions carefully. You will likely not be able to do the following without this.

Write your function; it will need to accept the following arguments and only these arguments:

A vector of Ca (calcium) values
A vector of Si (silicon) values
The locality name
A color for the points

There are two other constraints on your function:

The locality name should have a default value of an empty string ("").
The color of the plotting points should default to "darkgreen".

Be certain that the only objects the function uses are the four that are supplied as arguments or objects created within the function. Give the function a short but meaningful name of your choosing. Remember to indent the lines within the body of your function; one tab or four spaces is fine, and I’m not a zealot about which. Precede and follow your function definition with one blank line.
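As a minimal sketch of the pattern, not the required solution, the following defines a small plotting function whose last two arguments have default values. The name scatterDemo, its argument names, and the temporary file name are hypothetical.

```r
# Hypothetical function: everything it uses arrives through its arguments

scatterDemo <- function(x, y, title="", pointColor="darkgreen") {
    plot(x, y, main=title, col=pointColor, pch=16)
}

# Inspect the defaults without calling the function
formals(scatterDemo)$title
formals(scatterDemo)$pointColor

# Draw into a throwaway pdf file so the sketch runs anywhere
pdf(file=file.path(tempdir(), "scatterDemo.pdf"))
scatterDemo(c(1, 2, 3), c(2, 4, 8))
dev.off()
```

Because the body refers only to x, y, title, and pointColor, the function does not depend on any object in the workspace.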

Assign your data frame to a new object with a different name of your choosing. Delete (remove) the original data frame. Precede and follow these two lines of code with one blank line. Doing this will help test whether your function is truly self-contained or using objects not passed in through the function call.
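The copy-then-remove step can be sketched like this, using a made-up data frame; both object names here are hypothetical, so substitute your own.

```r
# Made-up data frame standing in for the imported data
nashvilleDemo <- data.frame(Ca=c(310.2, 287.9), Si=c(12.5, 20.1))

limestoneDemo <- nashvilleDemo
rm(nashvilleDemo)

exists("nashvilleDemo")   # FALSE: the original name is gone
exists("limestoneDemo")   # TRUE: the copy remains
```

If your function still refers to the original data frame by name, calling it after rm() will fail with an "object not found" error, which is exactly what this step is meant to reveal.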

Call your function on this new data frame to produce a plot identical to your plot from Part 2, in one simple line of code. Use $ notation as necessary when assigning arguments to your function call. This is plot 2.

Part 4

Your function should work on any data set. To make sure this works, you should test your function. Good programmers always test their code.

For the three tests, we will use simulated data. First, create a vector of Ca values using the rlnorm() function; create 100 values using the defaults for the distribution, multiply the random numbers by 50, and store them in an object called randomCa, all in one line of code. Do the same for Si, but multiply the values by 7, and store them in an object called randomSi, all in one line of code.
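For the Ca case, the one-line pattern might look like the sketch below. With no other arguments, rlnorm() uses its defaults of meanlog=0 and sdlog=1; the checks that follow are only for illustration and are not part of what you turn in.

```r
randomCa <- rlnorm(100) * 50

# Quick sanity checks on the simulated values
length(randomCa)    # number of values generated
summary(randomCa)   # lognormal values are positive and right-skewed
```

Because the values are random, your numbers will differ from run to run.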

For the first test, call your custom plot function on these data. Specify “brown3” for the color and “Watkinsville” for the locality name. Call all four arguments to your function by name, not position. This is plot 3.

For the second test, use the same simulated data, and call your custom plot function with all arguments called by position, not name. This plot should be identical to the one from your first test. This is plot 4.

For the third test, again use the same simulated data. Call your custom plot function as you did in the second test, but do not specify the color or the locality name. Your plot should be identical to the plot from your second test, except that it shouldn’t have a main title, and the points should all be in the default color. This is plot 5.
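The three calling styles used in these tests can be illustrated with a toy function that returns text instead of drawing a plot, which makes the results easy to compare. The function describeSite, its argument names, and their order are assumptions for this sketch only; your own function's argument order determines how positional calls behave.

```r
# Hypothetical toy function with the same kinds of arguments and defaults

describeSite <- function(ca, si, locality="", pointColor="darkgreen") {
    paste0(locality, ": ", length(ca), " Ca and ", length(si),
        " Si values, plotted in ", pointColor)
}

# By name: the order of the arguments does not matter
byName <- describeSite(pointColor="brown3", locality="Demo", si=c(1, 2), ca=c(10, 20))

# By position: the order must match the function definition
byPosition <- describeSite(c(10, 20), c(1, 2), "Demo", "brown3")

# Omitting the last two arguments falls back on the defaults
withDefaults <- describeSite(c(10, 20), c(1, 2))

identical(byName, byPosition)
withDefaults
```

Calling by name is more verbose but protects you from silently swapping two arguments of the same type, such as the two numeric vectors.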

I will run a fourth test, calling your function on a data set you cannot see. Your function should run correctly when I supply it values for calcium, silicon, and optionally color and locality name. You do not need to do anything for this test.

Submitting your problem set

When your code runs, it should create five pdf files. Make sure that your code names these correctly. Do not turn in the pdf files; your code will construct them when I run it.

Format your commands file following the standard instructions. Email your commands file to stratum@uga.edu. The subject of your email should be 8370 problem set 4. Do not send me the data file, as I have it already. This problem set is due on 21 September.

Data Analysis in the Geosciences

GEOL 8370