Include only those steps that are necessary to generate the answers to the problems. You will need to edit your commands down to what is needed.
Don’t display the values of vectors or data frames in your answers unless I specifically ask for them. This is particularly true for large data frames and long vectors. It is fine to display them for yourself as you work (convincing yourself that your code works is an important part of learning R), but do not include them in the file you turn in or in files you share with others. If I do ask you to display a vector or data frame, though, you must show it.
If a function returns a vector, don’t wrap the result in c(). Likewise, if you have just one object, do not wrap it in c(), because it is already a vector. For example, c() is unnecessary in cases like c(8:12), c(rnorm(25)), c(seq(from=2, to=20, by=2)), and c(rep(25, 2)). If you are unsure whether c() is needed, delete it and see if the code still works correctly.
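A quick way to convince yourself: each pair below builds the same vector, which identical() confirms. The variable names are illustrative only. c() earns its place only when it combines several pieces into one vector, as in the last line.

```r
# Each pair produces identical vectors; the c() in the first of each pair does nothing
a1 <- c(8:12)
a2 <- 8:12

b1 <- c(seq(from=2, to=20, by=2))
b2 <- seq(from=2, to=20, by=2)

d1 <- c(rep(25, 2))
d2 <- rep(25, 2)

# c() is needed only when combining several pieces into one vector
combined <- c(8:12, rep(25, 2))
```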
When accessing a subset of a data frame with a logical test, don’t wrap the logical test in a which() statement. The which() command is needed when you specifically need the index of the matching value. If you are unsure if which() is necessary, delete it and see if the code works correctly.
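A minimal sketch, using a small hypothetical data frame named samples: the bare logical test and the which() version return the same rows. This assumes the column has no missing values; if it does, the two forms differ, because which() silently drops NA results, whereas a bare logical test returns a row of NAs for each one.

```r
samples <- data.frame(site=c("A", "B", "C", "D"), depth=c(2.5, 7.1, 4.0, 9.3))

# Preferred: the logical test selects the rows directly
deepSamples <- samples[samples$depth > 5, ]

# Redundant: which() only converts the logical vector to row positions first
deepSamplesWhich <- samples[which(samples$depth > 5), ]

# which() is the right tool when you want the positions themselves
deepRows <- which(samples$depth > 5)
```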
Use parentheses in equations only when necessary. Know the order of operations and rely on it to keep your code simple. If you are unsure whether parentheses are needed, delete them and see if the code works correctly (you should be seeing a pattern here…).
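A short illustration with arbitrary values: the first pair shows parentheses that add nothing, the next line shows parentheses that change the result, and the last two pairs show R precedence rules that commonly trip people up (R’s ?Syntax help page lists the full precedence table).

```r
a <- 3
b <- 4
n <- 5

# Multiplication happens before addition, so these parentheses add nothing
withParens <- (a * b) + 2
withoutParens <- a * b + 2

# Here the parentheses override the default order, so they are necessary
neededParens <- a * (b + 2)

# ^ binds more tightly than unary minus: -2^2 is -(2^2), which is -4
negSquare <- -2^2
squareOfNeg <- (-2)^2

# : binds more tightly than binary minus: 1:n - 1 is (1:n) - 1, which is 0 1 2 3 4
shifted <- 1:n - 1
shortened <- 1:(n - 1)
```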
Include only the comments that I ask for. Novice coders tend to include too many comments, particularly ones that state what the code does when the operation is already clear. In your own work, good comments should convey your intent: the why of the code, not the what.
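An illustration of the difference, using a hypothetical vector of Fe concentrations in ppm. Both comments describe the same line of code; only the second tells the reader something the code itself does not.

```r
feppm <- c(41200, 38750, 45010)

# Weak: the comment only restates the arithmetic
# Divide by 10000
feWtPct <- feppm / 10000

# Better: the comment explains why the step is there
# Convert ppm to weight percent so Fe can be compared with analyses reported in wt%
feWtPct <- feppm / 10000
```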
Do not include the command setwd(), even if you comment it out. You will likely use setwd() when you work, but just delete it before sharing your code.
Do not embed a file path in your code, such as in calls to scan() or read.table(); a path that exists on your computer will not exist on anyone else’s, so the code will fail for them.
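One portable pattern, sketched under the assumption that the data file sits in the reader’s working directory (the file name mgData.csv and its contents are hypothetical): refer to the file by name alone and let each person set their own working directory. To keep the sketch self-contained, it first writes the file and deletes it at the end.

```r
# Create a small example file in the working directory so the sketch runs anywhere
write.csv(data.frame(sample=c("S1", "S2"), mg=c(4.2, 5.8)), "mgData.csv", row.names=FALSE)

# Portable: the bare file name is resolved against the reader's own working directory
mgData <- read.csv("mgData.csv")

# Not portable: a hard-coded path like this (hypothetical) fails on every other computer
# mgData <- read.csv("/Users/yourName/Documents/R/mgData.csv")

# Clean up the example file
invisible(file.remove("mgData.csv"))
```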
We use spaces in our writing to make it more readable; good programmers do the same with their code. In particular, spaces help separate the elements of code, and a lack of spaces keeps related elements together. There are several places where you should use spaces:
There are several places where you should not put a space:
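A minimal before-and-after, assuming the conventions visible in the seq() assignment example at the end of this section: spaces around <- and after commas, none around = inside an argument list, and none between a function name and its opening parenthesis. Both lines run and produce the same vector; only readability differs.

```r
# Cramped: the elements run together
evenNumbers<-seq(from=0,to=100,by=2)

# Spaced: the assignment and each argument stand out
evenNumbers <- seq(from=0, to=100, by=2)
```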
Similarly, use blank lines to group related parts of your code. If several steps go together, do not separate them with blank lines. Instead, keep those lines of code together, but separate that block of code from preceding and following blocks of code by a single blank line. Multiple blank lines rarely help, and if you aren’t consistent about using them, they make your code look haphazard. One place to use multiple blank lines is if you are separating even larger-scale blocks of code (think of your code as being organized into sentences, paragraphs, chapters, and so on); just be consistent in the number of lines.
Use either single quotes or double quotes, but don’t switch between them in your code; mixing them makes the reader think you are trying to convey a distinction when you aren’t.
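In R, single and double quotes create exactly the same string, so switching between them carries no meaning; the sketch below checks this with identical(). The strings are just examples.

```r
# identical() returns TRUE: the quote style is not part of the string
sameString <- identical('Fe vs. Mg plot', "Fe vs. Mg plot")

# Having confirmed that, pick one style and use it throughout
plotTitle <- "Fe vs. Mg plot"
axisLabel <- "Mg (wt%)"
```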
When writing equations, include only those parentheses that are necessary. Including too many overcomplicates your code and makes it more error-prone.
Check the spelling in your comments; misspelled words convey carelessness.
Similarly, check grammar and punctuation in your comments.
Use comments where necessary to identify the intent behind a block of code or to explain a critical or confusing step. Avoid commenting every line of code, or even most lines. Also avoid comments when the purpose is obvious; for example, if you are importing data from a file, you do not need to say so in a comment. In these problem sets, include a comment marking every labeled part of the assignment (e.g., # Part 1).
Use blank lines to separate groups of related commands. Code that lacks blank lines is hard to read, and code with too many blank lines is just as hard. Think of blank lines in code as the paragraph breaks in your writing; they are there to help you read. You wouldn’t make every sentence in an essay its own paragraph, so don’t do the equivalent in code by surrounding every statement with blank lines. For the problem sets, treat each numbered part (e.g., “Part 1”) in the assignment as a block of code, and put one blank line before that block and one blank line after it.
Following these principles, your code should look like this:
# Part 1
someCommand
anotherCommand # aComment

# Part 2
aCommand # aComment
anotherCommand

# Part 3
...
In your own work, you would use descriptive comments instead of Part 1, Part 2, and so on, such as Read data sets, Cull the data, Fe vs. Mg plot, or Fe vs. Mg regression analysis.
The convention in R is to use <- for assignments at the beginning of a line rather than =, and we will adhere to that convention in this course. The only place you should use = for assignment is for assigning arguments in function calls. Here’s an example that illustrates both, as well as spacing around commas:
evenNumbers <- seq(from=0, to=100, by=2)