22 September 2009
When importing .csv files, single quotes (apostrophes) and double quotes can cause problems, because these symbols are often used to enclose a string. For example, suppose a variable name or a string (text) value contains a comma. In a comma-delimited file, that value would be split into two fields at the comma; enclosing it in quotes forces it to be read as a single string. Apostrophes, however, often appear inside variable names or sample names, and in that case you do not want them treated as string delimiters. You will need to take measures to prevent that.
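A small sketch of the comma problem, using made-up sample data. It passes the CSV lines inline through the `text` argument of read.csv() (available in recent versions of R) instead of reading a file; the data and variable names are hypothetical.

```r
# Hypothetical data: one sample name contains a comma, so it is
# wrapped in double quotes in the CSV text.
csv_lines <- c("sample,value",
               "\"Smith, J.\",1",
               "Jones,2")

# read.csv() treats the comma inside the double quotes as part of the
# string rather than as a field separator.
df <- read.csv(text = csv_lines, stringsAsFactors = FALSE)
print(df$sample)  # "Smith, J." "Jones"
print(ncol(df))   # 2
```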
If your .csv file contains quote symbols, you have two options. The first is to delete the quote symbols from the file. However, if you need to keep a string intact (for example, because it contains a comma), deleting the quotes is not acceptable.
Alternatively, you can set the quote parameter of read.table() or read.csv() to do exactly what you want. In read.table(), the default is quote = "\"'", which means that both double quotes and single quotes are treated as string delimiters. The backslash escapes the double quote, so R reads it as a literal double-quote character rather than as the end of the parameter value. In read.csv(), the default is quote = "\"", which means that only double quotes delimit strings; single quotes are treated as ordinary characters.
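One way to confirm these defaults is to look them up with formals(), which returns the default arguments of a function. Splitting the string into single characters shows that read.table()'s default really is two symbols, and that the backslash is only an escape, not part of the value.

```r
# Look up the default value of the quote argument of each function.
rt_quote  <- formals(read.table)$quote
csv_quote <- formals(read.csv)$quote

cat(rt_quote, "\n")    # prints the raw characters: a double quote, then a single quote
print(nchar(rt_quote)) # 2: the backslash is an escape, not a character
print(nchar(csv_quote))# 1: only the double quote

# Each element of this vector is one symbol treated as a string delimiter.
print(strsplit(rt_quote, "")[[1]])
```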
Because two of your sample names contained apostrophes (single quotes), read.table() treated everything between those two apostrophes, including the commas and line breaks in between, as a single string. read.csv() read the same file correctly only because its default quote parameter does not include the single quote.
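The sketch below reproduces the problem with hypothetical sample names (O'Brien and D'Arcy are made up for illustration). With read.table()'s default quote = "\"'", the apostrophe in the first name opens a quoted string that is only closed by the apostrophe in the second, so the rows in between collapse into one field. The exact shape of the damaged result may vary, so the code only checks that rows are lost.

```r
# Hypothetical sample names: two of them contain apostrophes.
csv_lines <- c("sample,value",
               "O'Brien,1",
               "Smith,2",
               "D'Arcy,3")

# read.table() with its default quote = "\"'": the apostrophes act as
# string delimiters, so the lines between them are swallowed.
bad <- suppressWarnings(
  read.table(text = csv_lines, sep = ",", header = TRUE,
             stringsAsFactors = FALSE)
)
print(nrow(bad))    # fewer than the 3 data rows in the file

# read.csv() only treats double quotes as delimiters, so every row survives.
good <- read.csv(text = csv_lines, stringsAsFactors = FALSE)
print(good$sample)  # "O'Brien" "Smith" "D'Arcy"
```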
The best solution is to set the quote parameter explicitly. For the homework, you want single quotes to be read as ordinary characters, so you should have used quote="\"". Some of you used quote="", which disables quoting altogether. That works for this file, but there will be times when you need quotes to delimit strings (for example, around a value that contains a comma), and quote="" will not do what you want.
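To see why quote="\"" is safer than quote="", here is a sketch with hypothetical data that combines both situations: apostrophes in sample names and a double-quoted name containing a comma. count.fields() reports how many fields each line would be split into under a given quote setting.

```r
# Hypothetical data mixing both problems: apostrophes in sample names
# and a double-quoted name that contains a comma.
csv_lines <- c("sample,value",
               "O'Brien,1",
               "\"Smith, J.\",2",
               "D'Arcy,3")

# quote = "\"": apostrophes are ordinary characters and double quotes
# still protect the embedded comma, so every line has 2 fields.
print(count.fields(textConnection(csv_lines), sep = ",", quote = "\""))

# quote = "": no quoting at all, so the "Smith, J." line splits into 3 fields.
print(count.fields(textConnection(csv_lines), sep = ",", quote = ""))

# Reading with only double quotes as delimiters keeps every name intact.
ok <- read.table(text = csv_lines, sep = ",", header = TRUE,
                 quote = "\"", stringsAsFactors = FALSE)
print(ok$sample)  # "O'Brien" "Smith, J." "D'Arcy"
```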