Lecture 10


Outline


Inference for Regression


The Population and the Sample

We know that observations vary from sample to sample. So we imagine a true line that summarizes the relationship between $x$ and $y$ for the entire population,

\[\mu_y = \beta_0+\beta_1 x\]

where $\mu_y$ is the population mean of $y$ at a given value of $x$.


The Population and the Sample (cont.)

For a given value of $x$, the individual $y$ values vary about the mean $\mu_y$, so the model for an individual response includes an error term $\varepsilon$:

[Figure: the distribution of $y$ values at a given $x$.]

\[y = \beta_0+\beta_1 x + \varepsilon\]
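
For concreteness, here is a minimal Python sketch (not part of the lecture) that simulates data from this model; the values of $\beta_0$, $\beta_1$, and the error standard deviation are arbitrary choices for illustration.

```python
import numpy as np

# Simulate y = beta0 + beta1*x + eps for illustrative parameter values.
rng = np.random.default_rng(1)
beta0, beta1, sigma = 2.0, 0.5, 1.0

x = np.linspace(0, 10, 50)
eps = rng.normal(0, sigma, size=x.size)   # errors centered at 0
y = beta0 + beta1 * x + eps               # individual responses scatter about mu_y
```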

Regression Inference

We estimate the population regression line with the least squares line fit to the sample,

\[\hat{y} = b_0 + b_1 x\]

where $b_0$ estimates $\beta_0$, $b_1$ estimates $\beta_1$.
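
As a rough illustration (assuming NumPy; the helper name `least_squares` is mine, not from the lecture), the estimates $b_0$ and $b_1$ can be computed from a sample like this:

```python
import numpy as np

def least_squares(x, y):
    """Return (b0, b1) for the sample least squares line y-hat = b0 + b1*x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1
```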


Assumptions and Conditions

The inference methods are based on these assumptions:

Linearity Assumption: the relationship between $x$ and $y$ is straight (linear).

Independence Assumption: the errors are independent of one another.

Equal Variance Assumption: the variability of $y$ about the line is the same for all values of $x$.

Normal Population Assumption: the errors follow a Normal model at each value of $x$.


Assumptions and Conditions (cont.)

Summary of Assumptions and Conditions:


The Standard Error of the Slope

\[SE(b_1) = \frac{s_e}{s_x\sqrt{n-1}}\]

where $s_e$ is the standard deviation of the residuals (the spread around the line), $s_x$ is the standard deviation of the $x$-values, and $n$ is the sample size.
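
A small Python sketch of this formula (the helper name `slope_standard_error` is mine, and it reuses the `least_squares` estimates sketched earlier):

```python
import numpy as np

def slope_standard_error(x, y, b0, b1):
    """SE(b1) = s_e / (s_x * sqrt(n - 1)), as on this slide."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    resid = y - (b0 + b1 * x)
    s_e = np.sqrt(np.sum(resid ** 2) / (n - 2))   # spread around the line
    s_x = np.std(x, ddof=1)                        # spread of the x-values
    return s_e / (s_x * np.sqrt(n - 1))
```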


Example 1

Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population?

[Figure: a pair of scatterplots to compare.]


Example 2

Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population?

[Figure: a pair of scatterplots to compare.]


Example 3

Which of these scatterplots would give the more consistent regression slope estimate if we were to sample repeatedly from the underlying population?

[Figure: a pair of scatterplots to compare.]


A Test for the Regression Slope

When the conditions are met, the standardized estimated regression slope,

\[t = \frac{b_1 - \beta_1}{SE(b_1)}\]

follows a Student's $t$-model with $n - 2$ degrees of freedom. We estimate the standard error as $SE(b_1) = \frac{s_e}{s_x\sqrt{n-1}}$, where $s_e = \sqrt{\frac{\sum(y-\hat{y})^2}{n-2}}$ and $s_x$ is the standard deviation of the $x$-values.
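
Putting the pieces together, here is one possible Python sketch of the test (it relies on SciPy for the $t$-distribution and on the `least_squares` and `slope_standard_error` helpers sketched earlier; the default null value $\beta_1 = 0$ is the usual choice, not something specific to this lecture):

```python
from scipy import stats

def slope_t_test(x, y, beta1_null=0.0):
    """Two-sided t-test of H0: beta1 = beta1_null, with n - 2 degrees of freedom."""
    b0, b1 = least_squares(x, y)
    se_b1 = slope_standard_error(x, y, b0, b1)
    n = len(x)
    t = (b1 - beta1_null) / se_b1
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    return t, p
```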


A Confidence Interval for the Regression Slope

When the assumptions and conditions are met, we can find a confidence interval for $\beta_1$ as $b_1 \pm t^*_{n-2} \times SE(b_1)$,

where the critical value $t^*$ depends on the confidence level and has $n - 2$ degrees of freedom.
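
A corresponding Python sketch for the interval (again using SciPy's $t$ quantiles and the helpers sketched earlier; the 95% default level is just an example):

```python
from scipy import stats

def slope_confidence_interval(x, y, level=0.95):
    """Interval b1 +/- t*_{n-2} * SE(b1)."""
    b0, b1 = least_squares(x, y)
    se_b1 = slope_standard_error(x, y, b0, b1)
    t_star = stats.t.ppf(0.5 + level / 2, df=len(x) - 2)
    return b1 - t_star * se_b1, b1 + t_star * se_b1
```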


A Hypothesis Test for Correlation

What if we want to test whether the correlation between $x$ and $y$ is 0? When the conditions are met, we can use the test statistic

\[t = r \sqrt \frac {n-2}{1-r^2}\]

which follows a Student's $t$-model with $n - 2$ degrees of freedom.
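
For illustration, this statistic is easy to compute directly (a Python sketch assuming NumPy and SciPy; the function name is mine):

```python
import numpy as np
from scipy import stats

def correlation_t_test(x, y):
    """t = r * sqrt((n - 2) / (1 - r^2)), compared to a t-model with n - 2 df."""
    r = np.corrcoef(x, y)[0, 1]
    n = len(x)
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    p = 2 * stats.t.sf(abs(t), df=n - 2)
    return r, t, p
```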


The Confidence Interval for the Mean Response

When the conditions are met, we find the confidence interval for the mean response value $\mu_v$ at a value $x_v$ as

\[\hat{y}_v \pm t^*_{n-2} \times SE(\hat{\mu}_v)\]

where the standard error is

\[SE(\hat{\mu}_v) = \sqrt{SE^2(b_1) \times (x_v - \bar{x})^2 + s^2_e/n }\]
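
A Python sketch of this interval (using the helpers from the earlier sketches; the variable names and the 95% default are illustrative):

```python
import numpy as np
from scipy import stats

def mean_response_ci(x, y, x_new, level=0.95):
    """Confidence interval for the mean response at x_new."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    b0, b1 = least_squares(x, y)
    se_b1 = slope_standard_error(x, y, b0, b1)
    s_e2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
    se_mu = np.sqrt(se_b1 ** 2 * (x_new - x.mean()) ** 2 + s_e2 / n)
    t_star = stats.t.ppf(0.5 + level / 2, df=n - 2)
    y_hat = b0 + b1 * x_new
    return y_hat - t_star * se_mu, y_hat + t_star * se_mu
```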

The Prediction Interval for an Individual Response

When the conditions are met, we can find the prediction interval for an individual value of $y$ at a value $x_v$ as

\[\hat{y}_v \pm t^*_{n-2} \times SE(\hat{y}_v)\]

where the standard error is

\[SE(\hat{y}_v) = \sqrt{SE^2(b_1) \times (x_v - \bar{x})^2 + s^2_e/n + s^2_e}\]
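
The same sketch, extended with the additional $s^2_e$ term for an individual response:

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x_new, level=0.95):
    """Prediction interval for an individual y at x_new (extra s_e^2 term)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = x.size
    b0, b1 = least_squares(x, y)
    se_b1 = slope_standard_error(x, y, b0, b1)
    s_e2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
    se_y = np.sqrt(se_b1 ** 2 * (x_new - x.mean()) ** 2 + s_e2 / n + s_e2)
    t_star = stats.t.ppf(0.5 + level / 2, df=n - 2)
    y_hat = b0 + b1 * x_new
    return y_hat - t_star * se_y, y_hat + t_star * se_y
```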