Statistics

Fitting Linear Regression Models Part 2

Part One suggested a process that may be adopted for model selection (choosing regressors by a number of criteria) and for checking model adequacy (residual analysis, diagnostic checks, and multicollinearity checks to validate the assumptions).

Part Two covers fitting the general linear model by least-squares estimation, using matrix algebra.

Fitting the Linear Model: Matrices

Recall the general form of the linear regression model:

$$ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \epsilon $$

Suppose we make n independent observations, y1, y2, ..., yn, on Y. The i-th observation can then be written as

$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i $$

where xij is the value of the j-th independent variable for the i-th observation, i = 1, 2, ..., n. In matrix form, with x0 = 1, this becomes

$$ \left[\begin{matrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{matrix} \right] = \left[\begin{matrix} 1 & x_{11} & \cdots & x_{1k} \\ 1 & x_{21} & \cdots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{matrix} \right] \left[\begin{matrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{matrix} \right] + \left[\begin{matrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{matrix} \right] $$

Thus, with matrix operations, the n equations representing each yi as a function of the x's, beta's, and epsilon's can be written simultaneously as

$$ { Y = X\beta +\epsilon } $$
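To make the notation concrete, here is a minimal NumPy sketch of the model in matrix form. All names and simulated values (the seed, the sample size, the chosen betas) are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)   # illustrative seed

n, k = 5, 2                             # 5 observations, 2 regressors (assumed)
x = rng.normal(size=(n, k))             # simulated regressor values
X = np.column_stack([np.ones(n), x])    # prepend the x0 = 1 column
beta = np.array([1.0, 2.0, -0.5])       # beta_0, beta_1, beta_2 (made up)
eps = rng.normal(scale=0.1, size=n)     # random errors

Y = X @ beta + eps                      # the model Y = X*beta + epsilon
print(Y)
```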

Suppose we have a simple linear model with n observations. $$ Y = \beta_0 + \beta_1 x + \epsilon $$ $$ Y=\left[\begin{matrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{matrix} \right] , X =\left[\begin{matrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{matrix} \right] , \epsilon = \left[\begin{matrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{matrix} \right] , \beta = \left[\begin{matrix} \beta_0 \\ \beta_1 \end{matrix} \right] $$
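In this simple case the design matrix X has just two columns: a column of ones for the intercept and a column of the x values. A short sketch, with hypothetical x values:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])         # hypothetical x values
X = np.column_stack([np.ones_like(x), x])  # columns: 1, x_i
print(X)
# [[1. 1.]
#  [1. 2.]
#  [1. 3.]
#  [1. 4.]]
```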

The least-squares equations for the coefficients are given by

$$ n \hat\beta_0 + \hat\beta_1 \sum_{i=1}^n x_i = \sum_{i=1}^n y_i , $$ $$ \hat\beta_0 \sum_{i=1}^n x_i + \hat\beta_1 \sum_{i=1}^n x_i^2 = \sum_{i=1}^n x_i y_i . $$
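These two normal equations form a 2x2 linear system in the estimates. A sketch of solving them directly in NumPy (the x and y values are made up for illustration), with a cross-check against a least-squares solver applied to the design matrix:

```python
import numpy as np

# Hypothetical data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Solve the two normal equations as a 2x2 linear system:
#   n*b0      + b1*sum(x)   = sum(y)
#   b0*sum(x) + b1*sum(x^2) = sum(x*y)
A = np.array([[n,       x.sum()],
              [x.sum(), (x**2).sum()]])
b = np.array([y.sum(), (x * y).sum()])
beta_hat = np.linalg.solve(A, b)
print(beta_hat)   # [beta0_hat, beta1_hat]

# Cross-check: least squares on the design matrix gives the same estimates
X = np.column_stack([np.ones(n), x])
print(np.linalg.lstsq(X, y, rcond=None)[0])
```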

TO BE CONTINUED...