A simple test of R Markdown and knitr

Simulate data that have a linear relationship.

First, it's good form to specify any constants, rather than embedding numbers in the code, so we'll specify the sample size (n), slope, and intercept, mean of x, and standard deviation of x explicitly.

Next, we generate the x-values from a normal distribution with a specified mean and standard deviation.

To have scatter around the line, we need to specify errors (deviations) for each of our x points. Again, we specify the mean and standard deviation of our errors explicitly, which makes our code self-commenting. The goal here is not to have any unexplained constants appear in our code.

Last, we generate our y-values as being our x-values multiplied by the slope, with the intercept and errors added to them.

n <- 50
slope <- 3.1
intercept <- 7.2

xMean <- 40
xSd <- 20
x <- rnorm(n, xMean, xSd)

errorMean <- 0
errorSd <- 20
errors <- rnorm(n, errorMean, errorSd)

y <- slope * x + intercept + errors

Plot the data.

This will be a simple bivariate plot. The symbols will be set to small black circles (pch=16) to make them easier to see. The y-axis labels will also be rotated (las=1) to make them easier to read.

plot(x, y, pch=16, las=1)

plot of chunk dataPlot

Perform a regression.

Regressions are done using the lm() function. The syntax of 'y~x' should be read as 'y as a function of x'.

myRegression <- lm(y~x)

View a simple summary of the regression by displaying the regression object you created.

myRegression
##
## Call:
## lm(formula = y ~ x)
##
## Coefficients:
## (Intercept)            x
##       1.769        3.195

The regression object is a list with several named elements. To see what these are, use names().

names(myRegression)
##  [1] "coefficients"  "residuals"     "effects"       "rank"
##  [5] "fitted.values" "assign"        "qr"            "df.residual"
##  [9] "xlevels"       "call"          "terms"         "model"

To view any of these, use $ notation. For example, you can display the regression coefficients.

myRegression$coefficients
## (Intercept)           x
##    1.768959    3.194754

Our calculated slope (3.2) is close to what we specified (3.1), but our calculated intercept (1.8) is rather different from what we set (7.2), which is not unusual when most of the data is far from the y-axis.

Use abline() to add the line of regression to an existing plot.

plot(x, y, pch=16, las=1)
abline(myRegression, lwd=2, col='blue')

plot of chunk regressionLine