As part of this course, you will analyze a data set of your own choosing. The project is intentionally open-ended, and it is up to you to decide what to analyze and how to do it. We have examined several statistical methods in class and will study multivariate data analysis techniques after the next exam. You can analyze your data set with methods we cover in class or new techniques that you learn on your own, with my advice.
Your completed project, due at the end of the semester, will be a 3–6 page report, plus figures and references. Your final report will be evaluated on five equally weighted criteria:
The interpretation of your results. Is there a clearly defined scientific question to which statistics can be fruitfully applied? What is the larger significance of your results?
The appropriateness of your statistical techniques. Did you use the right tools to analyze the data?
The correctness of your results. Did you perform the calculations correctly?
The ambition of the project. I view more complicated analyses that involve methods we haven’t covered, and data sets collected specifically for this course, more favorably than trivially narrow analyses of small, easily available data sets. Even so, do not make a project large or complicated for its own sake. Also, remember that this is a course project, not a dissertation.
The presentation of your results. Are your figures well-designed and prepared? Is your report intelligently structured? Is your text concise and well-composed?
The first step in your project is to write a proposal describing your scientific question, the data, and how you plan to analyze the data.
Your proposal is not graded, so don’t worry unduly about getting the methods perfectly correct. Many students need some discussion to settle on the methods, and I expect that. Even so, the better your proposal and the more you have thought through your work, the more smoothly your project will go.
Begin by finding a data set to analyze. Start on this now if you haven’t already. You might analyze data from your research, as second-year and more advanced students usually do. First-year students often analyze published data or data collected specifically for the course; this data is often closely related to what will become their research project. I am less concerned about the data source than about the scientific problem you are addressing.
Once you have a data set, prepare a one-page proposal for your project. It should have the following sections, each beginning with a section heading (scientific problem, hypothesis, data, analysis):
1) Succinctly state the scientific problem to be solved. Clearly state the broader scientific question your study will address and why the study is scientifically relevant (often called the what and the so-what). You may need to add some background information, but get to the point. This section should set up the hypothesis.
2) Clearly state your hypothesis. A hypothesis is a one-sentence statement about the world that is testable; in other words, it could be shown to be false. If you are testing multiple hypotheses, list them on separate lines. Ensure your problem is scientifically important; do not pose a hypothesis simply because you can evaluate it statistically. The origin of each of your hypotheses should be clear from section #1. Each hypothesis should be specific about the variables; you will describe these in the next section. This and the statement of the scientific problem are the most important parts of the proposal, and you should establish these before you even begin to consider how to tackle the problem statistically.
3) Describe the data. State how many variables you have and what they are. State your sample size. For each variable, state its type (nominal, ordinal, etc.; open vs. closed; other considerations). Include a copy of the data, preferably as a text file. If your project examines only a portion of a larger data set, include only the data you will use. Label everything clearly. If you have not yet collected the data, show me the data set’s structure (the variables’ names, how many samples, etc.). The data file does not count toward the one-page limit. If your data set is large (i.e., >100 MB), submit a small portion that shows how the data is structured.
4) Outline your proposed analysis. Specify the tests or methods you will use and their rationale. State how you will justify the assumptions of your chosen tests. Describe the plots you will use to visualize the patterns. The steps in your analysis should be clearly connected to your hypotheses. Let’s talk if you think we have not discussed appropriate ways of examining your data.
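For the data description in section 3, a short script can report the variable names and sample size and can extract a small portion of a large file. The following is only a minimal sketch in Python, which this course does not require; the function names (describe_table, head) and the example data are hypothetical, and it assumes a comma-delimited text file with one header row. Its "numeric" and "text" labels describe only how values are stored; you still need to decide the measurement scale (nominal, ordinal, etc.) of each variable yourself.

```python
import csv
import io


def _is_number(value):
    """Return True if the string can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def describe_table(text, delimiter=","):
    """Summarize delimited text: variable names, sample size, storage kind.

    Assumes the first row holds the variable names (hypothetical layout).
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)
    names = list(reader.fieldnames or [])
    kinds = {}
    for name in names:
        # Ignore empty or missing cells when guessing the storage kind.
        values = [row[name] for row in rows if row[name] not in ("", None)]
        all_numeric = bool(values) and all(_is_number(v) for v in values)
        kinds[name] = "numeric" if all_numeric else "text"
    return {"variables": names, "n": len(rows), "kinds": kinds}


def head(text, n_rows=5):
    """Return the header line plus the first n_rows data lines."""
    lines = text.splitlines()
    return "\n".join(lines[: n_rows + 1])


# Hypothetical example data, for illustration only.
example = """site,depth_m,lithology
A,1.5,sand
B,2.0,mud
C,3.25,sand
"""

summary = describe_table(example)
print(summary["variables"], summary["n"], summary["kinds"])
print(head(example, n_rows=2))
```

To use it on a real file, read the file's contents into a string and pass that string to describe_table; the output of head can be saved as the small portion to submit.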
Do not include a references section with the proposal, as I want you to focus on the scientific problem and how to analyze it statistically. You will include references in your final report, however.
Put your name at the top of your proposal. Follow standard report guidelines: single-spacing, minimum 1" margins, 12-point font, etc. Don’t use creative ways to pack more text onto a page; edit your writing to fit instead. Please name your proposal file xxxxProposal.pdf and your data file xxxxData.txt (or whatever suffix is appropriate), where xxxx is your last name. There is no need to resubmit if you have already sent your files.
Email me this one-page write-up and the data by 12:00 PM (noon), Monday, 30 October. I will return your write-ups promptly so you can begin work immediately. Be sure that the subject line is 8370 proposal. Please attach any files; do not send OneDrive, Dropbox, or similar links.