Multivariate Regression


Introduction

For simple regression, the predicted value depends on only one predictor variable:

\[\hat{y} = b_0 + b_1 x\]

For multiple regression, we write the regression model with more predictor variables:

\[\hat{y} = b_0 + b_1 x_1 + \cdots +b_k x_k\]

Example 8

Home Price vs. Bedrooms, Saratoga Springs, NY. Random sample of 1057 homes. Can Bedrooms be used to predict Price?

[Figure: scatterplot of Price vs. Bedrooms for the Saratoga Springs sample]



Multivariate Linear Regression

A multivariate linear regression model assumes that the conditional expectation of the response is a linear function of the predictors:

\[E\{Y \mid X^{(1)}=x^{(1)}, \ldots, X^{(k)}=x^{(k)}\} = \beta_0 + \beta_1 x^{(1)} + \cdots + \beta_k x^{(k)}\]

This model defines a $k$-dimensional regression plane in a $(k + 1)$-dimensional space of $(X^{(1)}, \ldots, X^{(k)}, Y)$.

The intercept $\beta_0$ is the expected response when all predictors equal zero.

Each regression slope $\beta_j$ is the expected change of the response $Y$ when the corresponding predictor $X^{(j)}$ changes by 1 while all the other predictors remain constant.


Interpreting Multiple Regression Coefficients

NOTE: The meaning of the coefficients in multiple regression can be subtly different from their meaning in simple regression.

Price = 28,986.10 - 7,483.10 Bedrooms + 93.84 Living Area


Interpreting Multiple Regression Coefficients (cont.)

In a multiple regression, each coefficient takes into account all the other predictor(s) in the model.



Interpreting Multiple Regression Coefficients (cont.)

So, what's the correct answer to the question: do more bedrooms lower the price of a home?

Correct answer: among homes with the same living area, those with more bedrooms tend to sell for lower prices, on average. The coefficient compares different homes; it does not say what would happen if a bedroom were added to a given home.

Summarizing: multiple regression coefficients must be interpreted in terms of the other predictors in the model.


Example 9

On a typical night in New York City, about 25,000 people attend a Broadway show, paying an average of more than 75 dollars per ticket. Data for most weeks of 2006-2008 include the variables Paid Attendance, # Shows, and Average Ticket Price (dollars), used to predict Receipts. Consider the regression model for these variables.


Example 9 (cont.)

[Figure: multiple regression output for Receipts on Paid Attendance, # Shows, and Average Ticket Price]


Example 9 (cont.)

Write the regression model for these variables.

Receipts = -18.32 + 0.076 Paid Attendance + 0.007 # Shows + 0.24 Average Ticket Price

Interpret the coefficient of Paid Attendance.

Estimate receipts when paid attendance was 200,000 customers attending 30 shows at an average ticket price of $70.

Is this likely to be a good prediction?
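A quick check of the arithmetic (assuming, consistent with the coefficient magnitudes, that Paid Attendance is recorded in thousands and Receipts in millions of dollars):

\[\widehat{\text{Receipts}} = -18.32 + 0.076(200) + 0.007(30) + 0.24(70) = 13.89 \text{ (million dollars)}\]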


Assumptions and Conditions

Linearity Assumption

Independence Assumption

Equal Variance Assumption

Normality Assumption


Assumptions and Conditions (cont.)

Summary of Multiple Regression Model and Condition Checks:

  1. Check Linearity Condition with a scatterplot for each predictor. If necessary, consider data re-expression.

  2. If the Linearity Condition is satisfied, fit a multiple regression model to the data.

  3. Find the residuals and predicted values.

  4. Inspect a scatterplot of the residuals against the predicted values. Check for nonlinearity and non-uniform variation.

  5. Think about how the data were collected.

    • Do you expect the data to be independent?

    • Was suitable randomization utilized?

    • Are the data representative of a clearly identifiable population?

    • Is autocorrelation an issue?


Assumptions and Conditions (cont.)

  6. If the conditions check out, feel free to interpret the regression model and use it for prediction.

  7. Check the Nearly Normal Condition by inspecting a histogram of the residuals and a Normal probability plot. If the sample size is large, Normality is less important for inference. Watch for skewness and outliers.
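
A minimal sketch of this workflow in Python, on hypothetical data (statsmodels and matplotlib assumed available):

    import numpy as np
    import statsmodels.api as sm
    import matplotlib.pyplot as plt

    # Hypothetical data: n observations of k = 2 predictors and a response
    rng = np.random.default_rng(0)
    n = 200
    X = rng.normal(size=(n, 2))
    y = 3.0 + 1.5 * X[:, 0] - 0.8 * X[:, 1] + rng.normal(scale=2.0, size=n)

    # Fit the multiple regression model (add_constant supplies the intercept)
    model = sm.OLS(y, sm.add_constant(X)).fit()

    # Residuals vs. predicted values: check for nonlinearity and non-uniform variation
    plt.scatter(model.fittedvalues, model.resid, s=10)
    plt.axhline(0, linestyle="--")
    plt.xlabel("Predicted values")
    plt.ylabel("Residuals")
    plt.show()

    # Histogram of residuals: check the Nearly Normal Condition
    plt.hist(model.resid, bins=20)
    plt.xlabel("Residuals")
    plt.show()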


Least Squares Estimation

According to the method of least squares, we need to find the intercept $\beta_0$ and slopes $\beta_1, \ldots, \beta_k$ that minimize the sum of squared "errors"

\[Q = \sum^n_{i=1} ( y_i - \hat{y}_i)^2 = \sum^n_{i=1} (y_i - \beta_0 - \beta_1 x^{(1)}_i -\cdots - \beta_k x^{(k)}_i)^2\]

To minimize $Q$, we take partial derivatives of $Q$ with respect to all the unknown parameters and solve the resulting system of equations.
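
Setting each partial derivative to zero gives the system of normal equations which, in the matrix notation of the next section, reads

\[\mathbf{X}^T\mathbf{X}\,\mathbf{b} = \mathbf{X}^T\mathbf{y}\]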


Matrix Approach to Multivariate Linear Regression

\[\mathbf{Y} = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}, \qquad \mathbf{X} = \begin{bmatrix} 1 & \mathbf{X}_1 \\ \vdots & \vdots \\ 1 & \mathbf{X}_n \end{bmatrix} = \begin{bmatrix} 1 & X^{(1)}_1 & \cdots & X^{(k)}_1 \\ \vdots & \vdots & & \vdots \\ 1 & X^{(1)}_n & \cdots & X^{(k)}_n \end{bmatrix}\]

where $\mathbf{X}_i = (X^{(1)}_i, \ldots, X^{(k)}_i)$ is the row of predictor values for the $i$-th observation.

Then the multivariate regression model can be viewed as

\[E(\mathbf{Y}) = \mathbf{X}\beta\]

where $\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix} \in \, \mathbb{R}^{k+1}$


Matrix Approach (cont.)

Our goal is to estimate $\beta$ with a vector of sample regression slopes $\mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_k \end{bmatrix}$

Fitted values will then be computed as

\[\mathbf{\hat{y}} = \begin{bmatrix} \hat{y}_1 \\ \vdots \\ \hat{y}_n \end{bmatrix} = \mathbf{X}\mathbf{b}\]

Matrix Approach (cont.)

Thus, the least squares problem reduces to minimizing

\[Q = \sum^n_{i=1} ( y_i - \hat{y}_i)^2 = (\mathbf{y} - \mathbf{\hat{y}})^T (\mathbf{y} - \mathbf{\hat{y}}) = (\mathbf{y} -\mathbf{Xb})^T(\mathbf{y} -\mathbf{Xb})\]
\[= \mathbf{b}^T(\mathbf{X}^T\mathbf{X})\mathbf{b} - 2\mathbf{y}^T\mathbf{Xb} + \mathbf{y}^T\mathbf{y}\]

Setting the gradient $\nabla_{\mathbf{b}} Q = 2(\mathbf{X}^T\mathbf{X})\mathbf{b} - 2\mathbf{X}^T\mathbf{y}$ equal to zero yields the least squares estimator

\[\mathbf{b}=(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}\]
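
A minimal numerical sketch of this formula (hypothetical data; numpy assumed):

    import numpy as np

    # Hypothetical design matrix: n = 5 observations, intercept column plus k = 2 predictors
    X = np.array([[1.0, 2.0, 1.0],
                  [1.0, 3.0, 0.0],
                  [1.0, 5.0, 2.0],
                  [1.0, 7.0, 1.0],
                  [1.0, 9.0, 3.0]])
    y = np.array([4.0, 5.0, 9.0, 11.0, 15.0])

    # b = (X^T X)^{-1} X^T y; np.linalg.solve is preferred to an explicit inverse
    b = np.linalg.solve(X.T @ X, X.T @ y)
    y_hat = X @ b          # fitted values
    e = y - y_hat          # residuals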

Analysis of Variance

The total sum of squares is still

\[SS_{TOT} = \sum^n_{i=1}(y_i - \bar{y})^2 = (\mathbf{y} - \mathbf{\bar{y}})^T (\mathbf{y} - \mathbf{\bar{y}})\]

with $df_{TOT} = (n - 1)$ degrees of freedom, where

\[\mathbf{\bar{y}} = \begin{bmatrix} \bar{y} \\ \vdots \\ \bar{y} \end{bmatrix} = \bar{y} \begin{bmatrix} 1 \\ \vdots \\ 1 \end{bmatrix}\]

Analysis of Variance (cont.)

The regression sum of squares is

\[SS_{REG} = \sum^n_{i=1}(\hat{y}_i - \bar{y})^2 = (\mathbf{\hat{y}} - \mathbf{\bar{y}})^T (\mathbf{\hat{y}} - \mathbf{\bar{y}})\]

with $df_{REG} = k$ degrees of freedom.

The error sum of squares is

\[SS_{ERR} = \sum^n_{i=1}(y_i - \hat{y}_i)^2 = (\mathbf{y} - \mathbf{\hat{y}})^T (\mathbf{y} - \mathbf{\hat{y}}) = \mathbf{e}^T\mathbf{e}\]

with $df_{ERR} = n - k -1$ degrees of freedom.


Goodness of Fit

From these sums of squares we obtain the coefficient of determination, the F-statistic, and the regression standard error:

\[R^2 = \frac{SS_{REG}}{SS_{TOT}}\]
\[F = \frac{SS_{REG}/k}{SS_{ERR}/(n-k-1)}\]
\[s_e = \sqrt{MSE} = \sqrt{\frac{\sum^n_{i=1}(y_i-\hat{y}_i)^2}{n-k-1}}\]
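
Continuing the numpy sketch above (with $k = 2$ predictors there), these quantities are computed directly:

    # Sums of squares and the derived summary statistics
    n, k = X.shape[0], X.shape[1] - 1
    ss_tot = np.sum((y - y.mean()) ** 2)
    ss_reg = np.sum((y_hat - y.mean()) ** 2)
    ss_err = np.sum((y - y_hat) ** 2)

    r2 = ss_reg / ss_tot                          # R^2
    F = (ss_reg / k) / (ss_err / (n - k - 1))     # F-statistic
    s_e = np.sqrt(ss_err / (n - k - 1))           # regression standard error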

Variance Estimator

For inference about individual regression slopes $\beta_j$, we need the variances of their estimators $b_j$. The variance-covariance matrix of $\mathbf{b}$ is

\[Var(\mathbf{b}) = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}\]

Diagonal elements of this $(k+1) \times (k+1)$ matrix are the variances of the individual estimates,

\[\sigma^2(b_j) = \sigma^2 (\mathbf{X}^T\mathbf{X})^{-1}_{jj}\]

In practice, $\sigma^2$ is estimated by $s_e^2 = MSE$.
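
Continuing the sketch, the estimated standard errors follow from the MSE:

    # Estimated variance-covariance matrix of b and standard errors of b_j
    sigma2_hat = ss_err / (n - k - 1)              # MSE estimates sigma^2
    cov_b = sigma2_hat * np.linalg.inv(X.T @ X)    # estimated Var(b)
    se_b = np.sqrt(np.diag(cov_b))                 # SE(b_j), j = 0, ..., k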

Testing the Multiple Regression Model

The hypotheses for the slope coefficients:

\[H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0\]
\[H_A: \text{at least one } \beta \neq 0\]

Test the hypothesis with an F-test (a generalization of the t-test to more than one predictor).


Testing the Multiple Regression Model (cont.)

The F-distribution has two degrees of freedom: $k$ for the numerator and $n - k - 1$ for the denominator.

The F-test is one-sided, so bigger F-values mean smaller P-values.

If the null hypothesis is true, then F will be near 1.


Testing the Multiple Regression Model (cont.)

If a multiple regression F-test leads to a rejection of the null hypothesis, then check the t-test statistic for each coefficient:

\[t_{n-k-1} = \frac {b_j - 0}{SE(b_j)}\]

Note that the degrees of freedom for the t-test is $n - k - 1$.

Confidence interval:

\[b_j \pm t_{n-k-1}^* \times SE(b_j)\]
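
Continuing the sketch, the t-statistics, p-values, and confidence intervals can be computed with scipy:

    from scipy import stats

    # t-statistic and two-sided p-value for each coefficient
    t_stat = b / se_b
    p_val = 2 * stats.t.sf(np.abs(t_stat), df=n - k - 1)

    # 95% confidence intervals: b_j +/- t* x SE(b_j)
    t_crit = stats.t.ppf(0.975, df=n - k - 1)
    ci = np.column_stack((b - t_crit * se_b, b + t_crit * se_b))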

Testing the Multiple Regression Model (cont.)

In multiple regression, it looks like each $\beta_j$ tells us the effect of its associated predictor, $x_j$.

BUT each coefficient measures that effect only after allowing for the other predictors in the model.


Example 10

On a typical night in New York City, about 25,000 people attend a Broadway show, paying an average of more than 75 dollars per ticket. The variables Paid Attendance, # Shows, and Average Ticket Price (dollars) are used to predict Receipts.


Example 10 (cont.)

State the hypotheses for an F-test of the overall model.

\[H_0: \beta_1 = \beta_2 = \beta_3 = 0\]
\[H_A: \beta_1 \neq 0, \beta_2 \neq 0, \text{ or } \beta_3 \neq 0\]

State the test statistic and p-value.

[Figure: ANOVA table with the F-statistic and P-value for the overall model]


Example 10 (cont.)

Since the F-ratio suggests that at least one variable is a useful predictor, determine which predictors contribute in the presence of the others.

[Figure: table of coefficient estimates with t-statistics and P-values]


ANOVA F-test

The ANOVA F-test in multivariate regression tests the significance of the entire model: the model is significant if at least one slope is different from zero.

We compute the F-statistic

\[F = \frac{R^2/k}{(1-R^2)/(n-k-1)}\]

and check it against the F-distribution with $k$ and $(n - k - 1)$ degrees of freedom.

So, the F-test of the entire model is equivalent to testing whether $R^2 = 0$.
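
Continuing the sketch, the same F-statistic and its p-value can be obtained from $R^2$:

    # F-statistic computed from R^2; p-value from the F(k, n-k-1) distribution
    F_from_r2 = (r2 / k) / ((1 - r2) / (n - k - 1))
    p_value = stats.f.sf(F_from_r2, k, n - k - 1)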


Adjusted R-square

Adding new predictor variables to a model never decreases $R^2$ and may increase it.

Adjusted $R^2$ imposes a "penalty" on the correlation strength of larger models, depreciating their $R^2$ values to account for an undesired increase in complexity:

\[R_{adj}^2 = 1 - (1-R^2)\frac{n-1}{n-k-1}\]

Adjusted $R^2$ permits a more equitable comparison between models of different sizes.
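
Continuing the sketch, adjusted $R^2$ is a one-line computation:

    # Adjusted R^2 penalizes larger models, allowing fair comparisons across model sizes
    r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)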