Generate 50000 random numbers from a lognormal distribution with a meanlog of 0.5 and a sdlog of 0.25, and assign them to an object with a simple appropriate name. The Simple R Tutorial we use in class describes how to make random numbers from a normal distribution; as always, if a function doesn’t quite do what you want, consult the help pages first, particularly the section called “See Also”.
Plot a frequency distribution (histogram) of these data. Suggest 100 breaks, color the bars blue (by name), rotate the y-axis labels, and do not display a main title.
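As a minimal sketch of these two steps, the code below assumes that rlnorm() is the lognormal counterpart of rnorm() and uses lognormal as a hypothetical object name; check the help pages to confirm that the arguments do what you expect.

```r
# Sketch only: 'lognormal' is a hypothetical object name.
set.seed(1)  # optional; makes the random numbers reproducible
lognormal <- rlnorm(n = 50000, meanlog = 0.5, sdlog = 0.25)

# breaks is a suggestion to hist(), las = 1 rotates the y-axis labels,
# and main = "" suppresses the main title.
hist(lognormal, breaks = 100, col = "blue", las = 1, main = "")
```

Because breaks is only a suggestion, R may draw a somewhat different number of bars than you requested.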
For the next plots, we will use Linda Kahmann’s paleosol data set. Paleosols are ancient (fossil) soils, and her data come from the Late Mississippian (323-328 Ma) Pennington Formation of eastern Kentucky. If you’re interested, you can read her paper. Download her data (kahmann.csv) from the Data page linked at the left. As always, leave the file name and its contents unchanged.
The first three columns are the soil order, soil drainage, and soil type, and the remaining columns are the elemental abundances of 23 elements. The elemental abundances are in ppm, except for Fe2O3, MnO, and Ti, which are expressed as weight %.
Examine the file in your text editor to see its structure. Import it into R, using one of the two commands covered in class. Name the object that you assign it to appropriately; the name should be short and descriptive.
Inspect the data in R using a single command that reports the class of the object (e.g., list, data frame, etc.), the number of cases (rows), the number of variables (columns), the type of each variable (e.g., numeric, character, logical), and the first few values for each variable. Use this output to verify for yourself that the data were correctly imported.
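The import-and-inspect pattern can be sketched on a toy file. Everything in the toy file is invented for illustration (the column names, the values, and the pedotype "Pound Gap"), and read.table() is assumed here to be one of the import commands; the real kahmann.csv may need different arguments.

```r
# Toy stand-in for a real data file; all names and values are invented.
toyFile <- tempfile(fileext = ".csv")
writeLines(c(
  "soilOrder,drainage,pedotype,Ba,Rb",
  "Vertisol,poorly drained,Pine Mountain,412,95",
  "Entisol,well drained,Pound Gap,388,102",
  "Histosol,poorly drained,Pine Mountain,150,40"
), toyFile)

# header = TRUE treats the first line as column names; sep = "," splits on commas.
toy <- read.table(toyFile, header = TRUE, sep = ",")

# str() reports the class, the dimensions, and the type and first values of each column.
str(toy)
```

Compare the output of str() with what you saw in your text editor: the number of rows and columns should match, and numeric columns should not have been read as character.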
Let’s practice accessing particular rows and columns.
In one line of code and using bracket notation and numbers, display the first two columns for all rows. Note that row names are not considered a column. If you’ve imported the data correctly, your command will display the soilOrder and drainage columns.
Do the same, but use bracket notation plus the names of the columns.
Do this a third way, this time using dollar-sign notation. This will require two lines of code.
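The three ways of selecting columns can be compared on a small invented data frame; the object name toy and its values are hypothetical and stand in for your own data frame.

```r
# Invented data frame for illustration only.
toy <- data.frame(
  soilOrder = c("Vertisol", "Entisol", "Histosol"),
  drainage  = c("poorly drained", "well drained", "poorly drained"),
  Ba        = c(412, 388, 150)
)

# Brackets with numbers: leaving the row position empty means "all rows".
toy[ , 1:2]

# Brackets with column names.
toy[ , c("soilOrder", "drainage")]

# Dollar-sign notation returns one column at a time, hence two lines.
toy$soilOrder
toy$drainage
```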
In one line of code, display the values of barium for the poorly drained paleosols. Use dollar-sign notation to get the barium column, and follow with a logical test to specify the rows where the drainage is poorly drained; you will need to use dollar-sign notation again to specify the drainage column. Check your results to see if they make sense. You will want to check your results this way until you become comfortable with this approach of accessing values.
In one line of code, display the values of rubidium for the Vertisols, using dollar-sign notation and a logical test.
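Here is a sketch of selecting values with a logical test, again on an invented data frame (toy, and the column name Ba, are assumptions; use the names in your own data).

```r
# Invented data frame for illustration only.
toy <- data.frame(
  drainage = c("poorly drained", "well drained", "poorly drained"),
  Ba       = c(412, 388, 150)
)

# The test inside the brackets is TRUE, FALSE, TRUE, so rows 1 and 3 are kept.
toy$Ba[toy$drainage == "poorly drained"]   # returns 412 150

# One way to check the result: view the two columns side by side.
toy[ , c("drainage", "Ba")]
```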
Using the name of the data frame and dollar-sign notation can become cumbersome, and extra typing raises the risk of errors. Since we will be using just this one data frame, we would save time if we could access the variables directly without using the name of the data frame and dollar-sign notation. Run the appropriate command for doing this.
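One base R command that makes the columns of a data frame directly accessible is attach(); the sketch below assumes that is the command intended here and uses an invented data frame called toy.

```r
# Invented data frame for illustration only.
toy <- data.frame(
  drainage = c("poorly drained", "well drained", "poorly drained"),
  Ba       = c(412, 388, 150)
)

attach(toy)                          # columns can now be used by name alone
Ba[drainage == "poorly drained"]     # same result as toy$Ba[toy$drainage == ...]
detach(toy)                          # undo the attach when you are finished
```

Attaching copies the columns onto the search path, so later changes to the data frame are not reflected in the attached copy; this is one reason to detach when you are done.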
We will make a new plot, but we want to make sure the old plot window stays open. Use the appropriate command for creating a new plot window (default size); the command you choose must work on any operating system.
In the plot window you just created, make a scatterplot of manganese oxide versus strontium in a single line of code, as follows:
We will follow these same conventions on every plot: normal-language labels, rotated y-axis labels, solid circles for plot symbols, and adding a title only when it conveys information not conveyed by the axes. Also, as we write code, we will avoid making objects unless they simplify the code, make it more readable, or reduce redundancy.
We often want to save our plots directly in code, and the best way to do this is to make a PDF file, as they are editable in Adobe Illustrator and similar applications. Often we will create a plot this way and perform final touch-ups in Illustrator. Avoid using the uneditable bitmap formats like .jpg, .gif, .tiff, and .png.
Create a 6"x6" pdf file with the pdf() function, and recreate the plot you made in Part 4. Save this plot as xxxxScatter.pdf, where xxxx is your last name, lowercase (e.g., hollandScatter.pdf). Remember to close the pdf file when you are finished. All this will take three lines of code. Save yourself time by using the up arrow to re-run the plot command from Part 4, rather than typing it from scratch.
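The three-line pattern (open the file, plot, close the file) can be sketched as follows. The data, axis labels, and file name are invented, and the file is written to a temporary directory only so that the sketch runs anywhere.

```r
# Invented data for illustration only.
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

# Hypothetical file name; tempdir() is used only to keep the sketch self-contained.
outFile <- file.path(tempdir(), "toyScatter.pdf")

pdf(file = outFile, width = 6, height = 6)   # width and height are in inches
plot(x, y, xlab = "x (units)", ylab = "y (units)", las = 1, pch = 16)
dev.off()                                    # closes the file; required to finish writing it
```

Nothing appears on screen while the pdf device is open, because the plot is drawn into the file instead of a plot window.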
In the next plots, we will select the data using the same logical tests repeatedly. We can make our code easier to read and less error-prone if we save those logical tests and reuse them. For example, rather than write:
plot(Sr[pedotype=="Pine Mountain"], Hf[pedotype=="Pine Mountain"], xlab="Sr", ylab="Hf", las=1)
plot(Ce[pedotype=="Pine Mountain"], Ge[pedotype=="Pine Mountain"], xlab="Ce", ylab="Ge", las=1)
plot(Nb[pedotype=="Pine Mountain"], Th[pedotype=="Pine Mountain"], xlab="Nb", ylab="Th", las=1)
We can instead write:
pineMtn <- pedotype=="Pine Mountain"
plot(Sr[pineMtn], Hf[pineMtn], xlab="Sr", ylab="Hf", las=1)
plot(Ce[pineMtn], Ge[pineMtn], xlab="Ce", ylab="Ge", las=1)
plot(Nb[pineMtn], Th[pineMtn], xlab="Nb", ylab="Th", las=1)
By doing this, we write the logical test once, get it right, then re-use it. This is a common technique for writing clean code, especially if you name your logical tests thoughtfully.
First, create in one line of code an object called poorlyDrained that stores a logical test (e.g., drainage=="poorly drained"). Do the same for an object called wellDrained.
Next, follow the same approach for three objects called mature, immature, and coal. The test for mature will be soil orders that are Vertisol or Oxisol. The test for immature will be those that are Inceptisol or Entisol. The test for coal is those that are Histosol. Consult the Simple R Tutorial for how to specify AND and OR in logical tests.
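A compound logical test can be sketched on a stand-alone vector. The vector below is invented; in your own code, soilOrder would come from the data frame.

```r
# Invented vector for illustration only.
soilOrder <- c("Vertisol", "Entisol", "Histosol", "Oxisol", "Inceptisol")

# | means OR: TRUE wherever either comparison is TRUE.
mature <- soilOrder == "Vertisol" | soilOrder == "Oxisol"
mature   # returns TRUE FALSE FALSE TRUE FALSE

# An equivalent test written with %in%, which scales better to long lists.
matureAlt <- soilOrder %in% c("Vertisol", "Oxisol")
```

Note that each side of | must be a complete comparison; soilOrder == "Vertisol" | "Oxisol" is an error.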
BONUS +1: The label for iron oxide will be more stylish if we use subscripts. Create an object called feLab, and following the example on page 21 of the Simple R Tutorial, create a label for iron oxide that includes its units in parentheses. See the help page for plotmath for one way to juxtapose (place together without a space) two items, such as the parts of the iron oxide formula. This can also be done with the paste() command. paste() is useful; experiment with it.
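To illustrate plotmath without working the bonus for you, here is a sketch that builds a label for a different compound, silica. The object names, the data, and the units are invented, and the Simple R Tutorial may show a different style.

```r
# In plotmath, [ ] makes a subscript, * juxtaposes two items with no space,
# and ~ inserts a space.
h2oLab <- expression(H[2]*O ~ "(weight %)")

# The same idea using paste() inside expression().
sio2Lab <- expression(paste(SiO[2], " (weight %)"))

# Invented data, used only to display the labels.
plot(1:5, c(2, 4, 3, 5, 6), xlab = sio2Lab, ylab = h2oLab, las = 1, pch = 16)
```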
Next we are going to make a grid of four plots. Read these instructions completely before starting.
First, create a new plot window of the default size. Next, use the mfrow argument for par() to create a 2x2 grid of plotting areas in this window.
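A minimal sketch of setting up a 2x2 grid, assuming dev.new() is the platform-independent command for opening a plot window; the four plots use invented data and serve only to show the order in which the panels fill.

```r
dev.new()               # new plot window of the default size
par(mfrow = c(2, 2))    # 2 rows by 2 columns, filled row by row

# Invented data for illustration only.
x <- 1:10
plot(x, x,       xlab = "x", ylab = "x",              las = 1, pch = 16)  # top left
plot(x, x^2,     xlab = "x", ylab = "x squared",      las = 1, pch = 16)  # top right
plot(x, sqrt(x), xlab = "x", ylab = "square root of x", las = 1, pch = 16)  # bottom left
plot(x, log(x),  xlab = "x", ylab = "log of x",       las = 1, pch = 16)  # bottom right
```

The mfrow setting belongs to the plot window in which par() was called, so call par() after opening the new window, not before.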
Make the following four plots in the order listed:
Each plot will be made in three steps:
Note that we are doing four things with color here. First, we use colors that correspond to the actual objects, which helps people draw connections more easily. Second, we use desaturated colors (i.e., not bright colors) because they generally look more professional. Third, we use colors that contrast in their overall tone (darkness), as that helps distinguish them, which is especially important for people with color blindness. Last, we make the plotting symbols larger than normal because that helps distinguish colors. Follow these principles in all plots you make.
BONUS: +2. All four plots share a common coloring, so we can add a legend to one of them. Pick a plot that has an empty corner and add a legend using the legend() command. Check the help page to find an easy way to specify the position of the legend with a keyword, instead of the more tedious approach of specifying the x and y coordinates. You will need to supply two vectors to specify the labels and colors. Specify the plot character to match, and specify its size using the pt.cex argument. Turn off the box around the legend with the bty argument. These are general principles for any legend; in particular, the symbols on the legend must match those on the plot in color, shape, and size.
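Here is a sketch of a legend that matches its plot, using invented data, invented group names, and two arbitrary colors (gray30 and tan) that are not the ones specified for this problem set.

```r
# Invented data and grouping for illustration only.
x <- 1:6
y <- c(2.0, 2.5, 3.1, 5.2, 5.9, 6.4)
groupA <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)

plot(x, y, type = "n", xlab = "x (units)", ylab = "y (units)", las = 1)
points(x[groupA],  y[groupA],  pch = 16, col = "gray30", cex = 1.5)
points(x[!groupA], y[!groupA], pch = 16, col = "tan",    cex = 1.5)

# "topleft" is a position keyword; pch, col, and pt.cex repeat the values
# used in points() so the legend symbols match the plot; bty = "n" removes the box.
legend("topleft", legend = c("group A", "group B"),
       col = c("gray30", "tan"), pch = 16, pt.cex = 1.5, bty = "n")
```

The labels and colors are given in the same order, so the first label is paired with the first color.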
The code for each plot is repetitive, so it will be simpler and less error-prone if you write the code for one plot, get it working correctly, then copy and paste it for the other plots (with small edits for the variables and axis labels). The completed plot will be a lot of code, but it shouldn’t involve much typing. Each of your plot() commands should have a similar structure (i.e., arguments in the same order), as should each of your points() commands. Doing this makes error-checking much easier.
For clarity, put a blank line before and after each 3-line block of code used to make a plot (i.e., the call to plot() and the two calls to points()).
Before you move to the next question, carefully examine your plots and the code that underlies them for consistency and correct as necessary. As you correct them, you should copy and paste whole blocks of code from your text editor into R rather than running the commands individually.
Do exactly what you did in Part 7, except with a new scheme for coloring points:
Note that we are doing something else with color: by choosing colors that differ from our other plot, we guide our readers away from inadvertently thinking the plots show the same thing. Use color to make connections, and change the colors to prevent the wrong connections from being made.
Again, if you are smart about using your text editor and its copy/replace functionality, this can be done with very little typing. This is a common technique that greatly cuts down your typing errors.
Carefully examine your plots and their coding for consistency and correct as necessary.
We make plots to understand relationships and patterns in our data. Answer each of the following questions in order as a separate comment. Your answers should stand on their own, that is, they should make sense without knowing the question. Each answer should be succinct: covering the main points, but no more than 2–3 sentences long. As in all writing, pay attention to spelling, grammar, and punctuation. Use the names of the elements, not their symbols. These instructions will also apply to all future labs, although I may not repeat them explicitly.
Are any of the variables strongly related to one another?
Do any of the variables have strong outliers?
Do any of the variables display differences between well drained and poorly drained paleosols?
Do any of the variables indicate differences among mature, immature, and coal-associated paleosols?
Undo the command you performed in the first step of Part 4. Always remember to do this step, even if not reminded.
Format your commands file following the standard instructions.
Quit R, restart it, and paste all your commands into R to confirm that they run without errors or warnings. Confirm that you have four open plot windows.
E-mail your commands file to stratum@uga.edu, following the standard instructions. The subject of your email should be 8370 problem set 2. This problem set is due 2 September at 2:00 PM.
Do not email the data file or the pdf file to me. I already have the data file, and your code will generate the pdf file.