Problem Sets

Problem Set 1: Using R

Part 1

Use the seq() command to generate a vector of numbers from -30 to 90, in steps of 3, and assign it to an object named rVector.
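For reference, seq() takes from, to, and by arguments. This minimal sketch uses made-up endpoints (0 to 20 in steps of 4) and a hypothetical object name, exampleVector, so it shows only the form of the call and not the vector this part asks for.

```r
# Illustrative values only; this is not the rVector the assignment asks for
exampleVector <- seq(from = 0, to = 20, by = 4)
exampleVector
# [1]  0  4  8 12 16 20
```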

In Excel, generate the same vector of numbers and export it as a CSV file. The file name should be xxxxVector.csv, where xxxx is your last name, lowercase (e.g., mine would be hollandVector.csv). Import this file into R and assign it to an object named excelVector.
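One way to get a plain vector from such a file is sketched below. It assumes the exported CSV holds a single column of numbers with no header row. Because read.csv() returns a data frame, [[1]] pulls out its one column. The stand-in file, its contents, and the names csvPath and importedVector are hypothetical and exist only so the sketch runs on its own.

```r
# Create a small stand-in CSV file (made-up contents) so the sketch is self-contained
csvPath <- tempfile(fileext = ".csv")
writeLines(c("0", "4", "8", "12"), csvPath)

# read.csv() returns a data frame; [[1]] extracts its single column as a vector
importedVector <- read.csv(csvPath, header = FALSE)[[1]]
importedVector
# [1]  0  4  8 12
is.vector(importedVector)
# [1] TRUE
```

If your file has a header line above the numbers, leave header at its default of TRUE instead.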

Subtract the first vector from the second, assigning the result to a third vector named vectorDifference. This should create a vector of the same length as the first two, in which each element is the difference between the corresponding elements of the two vectors: the first minus the first, the second minus the second, and so on.

Since the first two vectors should be identical, vectorDifference should be all zeroes. Use range() to determine whether this is true. If not all values are zero, fix your work. Always check your work in every problem set, and fix it if it is incorrect.
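Here is a short illustration of element-by-element subtraction and the range() check. The vectors a, b, and bOff are made up. range() returns the minimum and maximum, so a result of 0 0 means every element is zero. Any other result means at least one element differs.

```r
# Two made-up vectors of equal length
a <- c(2, 5, 9)
b <- c(2, 5, 9)

# Subtraction is element by element: b[1] - a[1], b[2] - a[2], b[3] - a[3]
d <- b - a
range(d)
# [1] 0 0

# A deliberately mismatched vector shows what a failed check looks like
bOff <- c(2, 6, 9)
range(bOff - a)
# [1] 0 1
```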

Delete all three vectors in one command, as you will not need them anymore.
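rm() accepts several object names separated by commas, so one call can remove them all. The names x1, x2, and x3 below are placeholders.

```r
# Placeholder objects standing in for the three vectors
x1 <- 1:3
x2 <- 4:6
x3 <- 7:9

# One command removes all three
rm(x1, x2, x3)
exists("x1")
# [1] FALSE
```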

Part 2

Create a new vector that goes from -60 to 160 in steps of 5. Name that vector celsius.

Generate a vector named fahrenheit that shows the equivalent Fahrenheit temperature for each of the values in celsius. Calculate these values by doing your own arithmetic conversion; do not use any built-in conversion tool you might find for R.
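Arithmetic on a vector is applied to every element at once, so a conversion formula only has to be written once. So that the Fahrenheit arithmetic is still left for you to work out, this sketch converts made-up Celsius values to kelvin (K = C + 273.15) instead. The names tempC and tempK are hypothetical.

```r
# Made-up Celsius values; the conversion shown is to kelvin, not Fahrenheit
tempC <- c(-40, 0, 25)

# The addition is applied to each element of tempC
tempK <- tempC + 273.15
tempK
# [1] 233.15 273.15 298.15
```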

In one line of code, find the value from the fahrenheit vector that corresponds to a celsius value of 100 degrees. Hint: write a logical test that finds which element of celsius is 100 degrees. Use that logical test to select the correct value of fahrenheit. Combine these into one simple line of code. This may take some experimentation, but you will want to ensure you know how to do this, as we will use this approach frequently.
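The general pattern is shown here with two made-up parallel vectors, depth and pressure, where element i of one corresponds to element i of the other. A logical test on one vector produces TRUE/FALSE values. Placing that test inside square brackets selects the matching element of the other vector.

```r
# Two made-up parallel vectors
depth <- c(0, 10, 20, 30)
pressure <- c(1, 2, 3, 4)

# The logical test alone
depth == 20
# [1] FALSE FALSE  TRUE FALSE

# The same test used as an index into the parallel vector
pressure[depth == 20]
# [1] 3
```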

In a one-sentence comment on the next line, state whether this value is the correct Fahrenheit temperature; be sure to state the two values in this sentence.

Do the same (code and comment), but for a celsius temperature of 0. Again, one line for the code and a one-sentence comment.

Part 3

Enter the following matrix directly into R (not via Excel) and name it myMatrix. Remember to always follow any comma with one space, just as you do in your writing.

5 6 7 7.8
7 2 8 1.3
9 6 17 6.0
7 7 1 2.5
6 4 8 6.8
2 3 2 7.7
3 4 5 3.4
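The sketch below shows how matrix() builds a matrix from a vector, using a small made-up example with a hypothetical name (exampleMatrix). By default matrix() fills column by column. Setting byrow = TRUE lets you type the values in the same order as they appear in a table, one row at a time.

```r
# A made-up 3 x 2 matrix, typed row by row
exampleMatrix <- matrix(c(1, 2,
                          3, 4,
                          5, 6), nrow = 3, byrow = TRUE)
exampleMatrix
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4
# [3,]    5    6
```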

Verify that the values in myMatrix are correct. Although it is laborious to enter your data twice and compare the results (as you did in Part 1), doing so will almost guarantee that you find any data-entry errors, because you are unlikely to make the same mistake twice. It is important to find data-entry errors early, because they can cause you to waste many hours analyzing incorrect data. Do not include this double entry when you turn in your work. When you write R code, you will frequently run checks like these, but always discard them once you are sure that everything is working.

Use the colnames() function to assign column names to myMatrix, from left to right: snakes, spiders, birds, phosphorous. Pay attention to spelling and capitalization in these column names; use exactly what I specify. Use the help() function if you are uncertain how to use colnames(); the examples on these help pages are often particularly instructive.

Likewise, use the rownames() function to consecutively label the rows of myMatrix from A to G.
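colnames() and rownames() are used on the left side of an assignment. This sketch uses a small made-up matrix and hypothetical column names (depth, mass). LETTERS is a built-in vector of capital letters, so LETTERS[1:3] is the same as typing c("A", "B", "C").

```r
exampleMatrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, byrow = TRUE)

# Hypothetical column names, assigned from left to right
colnames(exampleMatrix) <- c("depth", "mass")

# Row labels A, B, C from the built-in LETTERS vector
rownames(exampleMatrix) <- LETTERS[1:3]
exampleMatrix
#   depth mass
# A     1    2
# B     3    4
# C     5    6
```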

Display myMatrix to verify that it looks correct. In every problem set, if I ask you to display an object, include that line of code in your list of commands. Otherwise, delete any lines you added only to display an object and check on it.

Part 4

Enter the same matrix into Excel, with the row and column labels as above. Save this file as a CSV file named xxxxMatrix.csv, where xxxx is your last name in lowercase (e.g., mine would be hollandMatrix.csv). Import this file into R and name it excelDF, DF being a common abbreviation for data frame. When you import the data, note that the first column holds a unique identifier for each row, which should become the row names.
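One way to have read.csv() turn the first column into row names is the row.names = 1 argument, shown below. The stand-in file, with its made-up values and hypothetical column names, is written to a temporary path only so the sketch runs on its own. With your own file, you would pass its name instead of csvPath.

```r
# Write a small stand-in CSV with row labels in the first column (made-up values)
csvPath <- tempfile(fileext = ".csv")
writeLines(c(",depth,mass",
             "A,1,2",
             "B,3,4",
             "C,5,6"), csvPath)

# row.names = 1 tells read.csv() to use the first column as row names
exampleDF <- read.csv(csvPath, row.names = 1)
exampleDF
#   depth mass
# A     1    2
# B     3    4
# C     5    6
```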

Note that Excel sometimes adds extra empty rows, causing an error about duplicate row names. If this happens, fix the CSV file in a text editor and import it again. Likewise, Excel sometimes adds extra empty columns, which appear as commas at the end of every line; delete those if this happens.

Display excelDF for a quick check of correctness.

Part 5

We will often want to check the type of an object: whether it is a vector, matrix, data frame, list, or function. Matrices and data frames, for example, are often interchangeable, but some functions require one or the other. In two lines of code and using the class() command, verify that myMatrix is a matrix and that excelDF is a data frame.

We also need a more thorough way to evaluate whether the structure of the data is correct, particularly when importing data frames. In two lines of code, use the str() command to show the structure of myMatrix and excelDF. Notice the different types of information supplied by class() vs. str(). Use str() to verify that the numbers of rows and columns are correct and that each variable has the right type.
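The contrast between the two functions is shown below on made-up objects with hypothetical names. class() returns only the type. In R 4.0 and later, a matrix reports both "matrix" and "array". str() also lists the dimensions and the type of each column (for example num, int, or chr), which is what you need for checking an import.

```r
exampleMatrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, byrow = TRUE)
exampleDF <- data.frame(depth = c(1, 3, 5), mass = c(2, 4, 6))

class(exampleMatrix)
# [1] "matrix" "array"   (R 4.0 and later)
class(exampleDF)
# [1] "data.frame"

# str() reports rows (obs.), columns (variables), and each column's type
str(exampleMatrix)
str(exampleDF)
```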

Part 6

In one line of code, subtract excelDF from myMatrix and assign the result to an object called difference. Display difference and check by eye that all values are zero; this visual check is useful for small matrices or vectors, but it is prone to errors on larger objects and should be avoided there.

Use the appropriate function to determine whether difference is a matrix or data frame.

Use the range() function on difference to verify that all values equal zero.
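In the sketch below, a made-up matrix is subtracted from a data frame copy of itself. It shows that the two types can be subtracted element by element when their shapes match, and that range() works on the result as a whole. The names a, b, and d are hypothetical.

```r
a <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, byrow = TRUE)
b <- as.data.frame(a)

# Element-by-element subtraction of a same-shaped data frame from a matrix
d <- a - b

# c(0, 0) means every entry is zero
range(d)
# [1] 0 0
```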

Part 7

In one line of code and using column-number notation, add the snakes and spiders columns of excelDF, but do not assign the result to an object. Remember the space after every comma. Check the result to make sure it makes sense.

In one line of code, add the snakes and spiders columns using $ notation to access the columns of excelDF, but do not assign it to an object. The result should agree with the previous step.
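Both notations are compared below on a made-up data frame with hypothetical columns depth and mass. In [row, column] notation, leaving the row position blank selects all rows. $ picks a column by name. The two expressions give the same result.

```r
exampleDF <- data.frame(depth = c(1, 3, 5), mass = c(2, 4, 6))

# Column-number notation (note the space after the comma)
exampleDF[, 1] + exampleDF[, 2]
# [1]  3  7 11

# $ notation, using the column names
exampleDF$depth + exampleDF$mass
# [1]  3  7 11
```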

Part 8

Using the simplest built-in function in R, find the standard deviation of the phosphorous values in myMatrix. Do this in the simplest way that does not require phosphorous to be in column 4.

Do the same to find the mean phosphorous value in myMatrix.
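Selecting a matrix column by its name, rather than its number, keeps working even if the column order changes later. The sketch below applies sd() and mean() to a made-up matrix with a hypothetical column named mass (values 2, 4, 6).

```r
exampleMatrix <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, byrow = TRUE)
colnames(exampleMatrix) <- c("depth", "mass")

# Select the column by name inside [row, column] notation
sd(exampleMatrix[, "mass"])
# [1] 2
mean(exampleMatrix[, "mass"])
# [1] 4
```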

Part 9

In one line of code using row-column notation, show the number of spiders in sample F of excelDF; use row and column numbers.

Show the number of spiders in sample F, but use row names and column names in row-column notation. Remember that the row and column names are treated as strings, so they go in quotes here (but not in the next step).

Finally, in one line of code, show the number of spiders in sample F using $ notation to get the column, then bracket notation to get the row by number.

Do those same three steps on myMatrix. You should find that the first two approaches work equally well on matrices and data frames, but that the final approach (dollar-sign notation) does not work on matrices; it produces an error (read it, and remember it for when you encounter it later). Normally, you should not include a command that produces an error, but I want you to include this command here.
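All three access styles are shown below on a made-up data frame with hypothetical row names A to C and columns depth and mass. Each of the first three lines returns 4. The last line wraps the matrix $ call in try() only so the sketch keeps running after printing the error, which is "$ operator is invalid for atomic vectors". In your own commands, write the bare call as instructed.

```r
exampleDF <- data.frame(depth = c(1, 3, 5), mass = c(2, 4, 6),
                        row.names = c("A", "B", "C"))

exampleDF[2, 2]          # row and column numbers
exampleDF["B", "mass"]   # row and column names, quoted as strings
exampleDF$mass[2]        # $ for the column, then [ ] for the row by number

exampleMatrix <- as.matrix(exampleDF)
exampleMatrix[2, 2]
exampleMatrix["B", "mass"]

# $ does not work on a matrix, so this reports an error
try(exampleMatrix$mass)
```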

Submitting your problem set

For any work in R, group your code into logical chunks (like paragraphs) and include a few comments to help the reader see the structure of the code. For this problem set, begin each chunk with a comment naming the part (e.g., Part 1, Part 2, etc.). On the following lines, include each line of code for that part, with no blank lines in between. Follow that block of code with one blank line, then repeat this for the next part. Follow this structure on every problem set.

Follow the instructions for how to format and submit your problem set. E-mail your commands file and the two correctly named .csv files you created to stratum@uga.edu. The subject of your email should be 8370 problem set 1. This problem set is due at 2:00 PM, Thursday, 31 August.

Data Analysis in the Geosciences

GEOL 8370