Linear regression is used for predicting a quantitative response Y on the basis of a single predictor variable or multiple predictor variables X. It is a well-known and very popular method for modeling the statistical relationship between the response and the explanatory variables. When modeling this relationship, the values of the explanatory variables are known and they are used to describe the response variable as well as possible. A proper model that accurately captures the relationship between these variables can be used to make predictions on data that have not been used to build the model. Simple linear regression is a model that contains only one explanatory variable. The formula of simple linear regression is:

Y = β0 + β1 × X + e,

where
Y is the response variable,
X represents the predictor variable,
β0 is called the overall intercept or the overall population mean of the response variable,
β1 is the average effect on Y of a one-unit increase in X, holding all other predictors fixed; it is also called the slope term,
e is the error term that represents the deviation between the observed and predicted values.
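As a concrete illustration, the intercept and slope can be estimated from data with the closed-form least-squares formulas. The following is a minimal Python sketch (the surrounding text works in R; Python is used here purely for illustration) on made-up data generated with a known intercept of 2 and slope of 3:

```python
import numpy as np

# Hypothetical example data: Y = 2 + 3*X plus Gaussian noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=x.size)

# Closed-form least-squares estimates:
# slope  beta1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
# intercept beta0 = y_bar - beta1 * x_bar
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
```

Because the data were simulated with β0 = 2 and β1 = 3, the estimates land close to those values.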
β0 and β1 are the unknown constants that the model has to estimate based on the data (James et al., 2013). The goal is to obtain coefficient estimates β̂0 and β̂1 such that the linear model fits the available data well, so that ŷi ≈ β̂0 + β̂1 × xi for i = 1, . . . , n, where ŷi indicates the prediction for the i-th observation. In other words, the aim is to find an intercept β̂0 and slope β̂1 such that the resulting line is as close as possible to the data points. The most common approach to do this involves minimizing the least squares criterion (see Section 2.3.1).

Several assumptions are made in linear models: the residuals are independent, the residuals are normally distributed, the residuals have a mean of 0 at all values of X, the residuals have constant variance, and the model is linear in the parameters. When applying linear models it is important to make sure that these assumptions are met; otherwise the statistical inference based on the results may not be adequate. A variance-covariance matrix is a way to demonstrate the homogeneous variance and independence of the residuals:

V = cov(e) =
⎡ σ²  0   ···  0  ⎤
⎢ 0   σ²  ···  0  ⎥
⎢ ⋮   ⋮   ⋱   ⋮  ⎥
⎣ 0   0   ···  σ² ⎦

In the variance-covariance matrix the diagonal values are the variances, and if they are all the same then this is a representation of variance homogeneity. The zeros in the matrix represent that there is no correlation or dependence between the residuals. Although the standard linear model assumes independent errors and homoscedastic variance, among other assumptions, data cannot always satisfy these assumptions, and therefore more complex approaches are also available.
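The structure of this matrix, cov(e) = σ²I, can be made explicit in a few lines of Python (a minimal sketch with an assumed variance σ² = 4 and n = 5 observations):

```python
import numpy as np

sigma2 = 4.0   # assumed common residual variance (sigma^2)
n = 5          # number of observations

# Under homoscedasticity and independence, cov(e) = sigma^2 * I:
V = sigma2 * np.eye(n)

# Diagonal entries are the (equal) variances...
assert np.all(np.diag(V) == sigma2)
# ...and every off-diagonal entry is zero: residuals are uncorrelated.
assert np.all(V[~np.eye(n, dtype=bool)] == 0.0)
```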
In fact, it is possible to add the correct variance or correlation structure to the model in R. Next, we will introduce the mathematical notation of different variance and correlation structures and how to include them in linear models.
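As a preview of the idea, one simple way to accommodate a known non-constant variance is weighted least squares, where each observation is weighted by the inverse of its variance. The following Python sketch is purely illustrative (the assumed variance proportional to x² is made up for this example; the text itself develops the R machinery):

```python
import numpy as np

# Hypothetical heteroscedastic data: noise standard deviation grows with x.
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 200)
y = 2.0 + 3.0 * x + rng.normal(scale=0.3 * x)

# Weighted least squares: weights are the inverse of the assumed
# variances, here var(e_i) proportional to x_i^2.
w = 1.0 / x**2
X = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
W = np.diag(w)

# Solve the weighted normal equations (X' W X) beta = X' W y.
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)
```

Down-weighting the noisier observations recovers the generating intercept (2) and slope (3) despite the unequal variances.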