Lecture 8


Outline


Comparing Two Means


Comparing Two Means (cont.)

As long as the two groups are independent, we find the standard deviation of the difference between the two sample means by adding their variances and then taking the square root:

\[SD(\bar{y}_1 - \bar{y}_2) = \sqrt{Var(\bar{y}_1) + Var(\bar{y}_2)}\]
\[=\sqrt{\left(\frac{\sigma_1}{\sqrt{n_1}}\right)^2 + \left(\frac{\sigma_2}{\sqrt{n_2}}\right)^2}\]
\[=\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\]

Comparing Two Means (cont.)

Usually we don't know the true standard deviations of the two groups, $\sigma_1$ and $\sigma_2$, so we substitute the estimates, $s_1$ and $s_2$, and find a standard error:

\[SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]

We'll use the standard error to see how big the difference really is.
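As a minimal sketch of the standard error formula above, the helper below computes it from summary statistics. The function name `se_diff` is an illustrative choice, and the inputs in the sample call are the values that appear in Example 1 later in this lecture.

```python
import math

def se_diff(s1, n1, s2, n2):
    """Standard error of (ybar1 - ybar2) for two independent samples."""
    # Add the two estimated variances of the sample means, then take the root.
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Summary statistics from Example 1: s1 = 4.6, n1 = 80, s2 = 4.3, n2 = 95
print(round(se_diff(4.6, 80, 4.3, 95), 4))
```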


A Sampling Distribution for the Difference Between Two Means

When the conditions are met, the standardized sample difference between the means of two independent groups,

\[t = \frac{ (\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)} {SE(\bar{y}_1 - \bar{y}_2)}\]

can be modeled by a Student's $t$-model with a number of degrees of freedom found with a special formula.
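The lecture does not write the formula out. A common choice, assumed here, is the Welch-Satterthwaite approximation, which reproduces the df = 163.59 quoted in Example 1:

\[\text{df} = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1-1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2-1}\left(\frac{s_2^2}{n_2}\right)^2}\]

The result is usually not a whole number, and it always lies between $\min(n_1, n_2) - 1$ and $(n_1 - 1) + (n_2 - 1)$.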


The Two-Sample t-Test

Test hypothesis:

\[H_0 : \mu_1 - \mu_2 = \Delta_0\]

where the hypothesized difference $\Delta_0$ is almost always 0.

\[t = \frac{(\bar{y}_1-\bar{y}_2) - \Delta_0}{SE(\bar{y}_1 - \bar{y}_2)}\]
\[SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }\]

When the null hypothesis is true, the statistic can be closely modeled by a Student's t-model with a number of degrees of freedom given by a special formula.
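The test statistic and its degrees of freedom can be sketched from summary statistics as follows. The function name `welch_t` is hypothetical, and the degrees of freedom use the Welch-Satterthwaite approximation, which is an assumption about what the "special formula" refers to.

```python
import math

def welch_t(ybar1, s1, n1, ybar2, s2, n2, delta0=0.0):
    """Return (t, df) for the two-sample t-test from summary statistics."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    se = math.sqrt(v1 + v2)
    t = ((ybar1 - ybar2) - delta0) / se
    # Welch-Satterthwaite approximation (assumed form of the "special formula")
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Summary statistics from Example 1, with the usual delta0 = 0
t, df = welch_t(7.7, 4.6, 80, 7.3, 4.3, 95)
print(f"t = {t:.3f}, df = {df:.2f}")
```

With unrounded arithmetic the statistic comes out near 0.590. The slides round the standard error to 0.68 first, which gives 0.588.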


Assumptions and Conditions


CI for the Difference Between Two Means

When the conditions are met, we are ready to find a two-sample $t$-interval for the difference between means of two independent groups, $\mu_1 - \mu_2$. The confidence interval is:

\[(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2)\]

Example 1

A market analyst wants to know if a new website is showing increased page views per visit. Given the summary statistics below, estimate the mean difference in page views per visit between the two websites.

\[\begin{array}{ll}
\text{Website 1} & \text{Website 2} \\
n_1 = 80 & n_2 = 95 \\
\bar{y}_1 = 7.7 \text{ pages} & \bar{y}_2 = 7.3 \text{ pages} \\
s_1 = 4.6 \text{ pages} & s_2 = 4.3 \text{ pages}
\end{array}\]

Example 1 (cont.)

\[(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }\]

where df = 163.59

\[= (7.7 - 7.3) \pm (1.9746) \sqrt{\frac{4.6^2}{80} + \frac{4.3^2}{95}}\]
\[= 0.4 \pm 1.338 = (-0.938, 1.738)\]

Fail to reject the null hypothesis. Since 0 is in the interval, it is a plausible value for the true difference in means.
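The interval arithmetic can be checked with a short sketch. The function name is illustrative, and the critical value is passed in rather than computed: `t_star = 1.9746` is an assumed 95% critical value for about 163.59 degrees of freedom, chosen because it reproduces the margin of 1.338 shown above.

```python
import math

def two_sample_t_interval(ybar1, s1, n1, ybar2, s2, n2, t_star):
    """Two-sample t-interval for mu1 - mu2 from summary statistics."""
    diff = ybar1 - ybar2
    margin = t_star * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return diff - margin, diff + margin

# t_star is an assumption (95% confidence, df about 163.59)
low, high = two_sample_t_interval(7.7, 4.6, 80, 7.3, 4.3, 95, t_star=1.9746)
print(f"({low:.3f}, {high:.3f})")
print("0 is inside the interval:", low < 0 < high)
```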


Example 1 (cont.)

\[t = \frac{(\bar{y}_1-\bar{y}_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }}\]

where df = 163.59

\[=\frac{(7.7 - 7.3)}{\sqrt{\frac{4.6^2}{80} + \frac{4.3^2}{95}}} = \frac{0.4}{0.68}=0.588\]
\[P(t>0.588) = 0.2786\]

Fail to reject the null hypothesis. There is insufficient evidence to conclude a statistically significant mean difference in the number of webpage visits.
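To see where a one-sided P-value like this comes from, the sketch below integrates the Student's $t$ density numerically using only the standard library. The helper names `t_pdf` and `t_upper_tail` are hypothetical, and Simpson's rule is used here purely for illustration; in practice a statistics package would normally supply the tail probability.

```python
import math

def t_pdf(x, df):
    """Density of Student's t; df may be fractional."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    if t < 0 or steps % 2:
        raise ValueError("this sketch needs t >= 0 and an even number of steps")
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half of the probability lies above 0; subtract the area between 0 and t.
    return 0.5 - total * h / 3

p = t_upper_tail(0.588, 163.59)
print(f"P(t > 0.588) = {p:.4f}")
```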


The Pooled t-Test

If we assume that the variances of the groups are equal (at least when the null hypothesis is true), then we can save some degrees of freedom.

To do that, we have to pool the two variances that we estimate from the groups into one common, or pooled, estimate of the variance:

\[s^2_{pooled} = \frac{(n_1-1)s^2_1 + (n_2-1)s^2_2}{(n_1-1) + (n_2-1)}\]

The Pooled t-Test (cont.)

Now we substitute the common pooled variance for each of the two variances in the standard error formula, making the pooled standard error formula simpler:

\[SE_{pooled}(\bar{y}_1-\bar{y}_2) = \sqrt{\frac{s^2_{pooled}}{n_1} + \frac{s^2_{pooled}}{n_2}} = s_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\]

The formula for degrees of freedom for the Student's $t$-model is simpler, too.

\[\text{df} = (n_1-1) + (n_2-1)\]
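A minimal sketch of the pooled quantities, treating the pooled estimate as a variance $s^2_{pooled}$. The function name `pooled_se` is illustrative, and the sample call reuses the Example 1 summary statistics only to show the arithmetic; the lecture does not itself apply pooling to that example.

```python
import math

def pooled_se(s1, n1, s2, n2):
    """Return (pooled variance, pooled SE of ybar1 - ybar2, df)."""
    df = (n1 - 1) + (n2 - 1)
    # Weighted average of the two sample variances, weighted by their df
    var_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = math.sqrt(var_pooled) * math.sqrt(1 / n1 + 1 / n2)
    return var_pooled, se, df

var_p, se_p, df_p = pooled_se(4.6, 80, 4.3, 95)
print(f"s2_pooled = {var_p:.3f}, SE_pooled = {se_p:.4f}, df = {df_p}")
```

Because the two sample standard deviations are close, the pooled standard error here is nearly the same as the unpooled one, while the degrees of freedom rise to a whole number, 173.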

The Pooled t-Test

For pooled $t$-methods, the Equal Variance Assumption must be satisfied: the variances of the two populations from which the samples were drawn are equal. That is, $\sigma_1^2 = \sigma_2^2$.

The pooled $t$-test then tests the hypothesis

\[H_0: \mu_1 - \mu_2 = \Delta_0\]

where the hypothesized difference $\Delta_0$ is almost always 0, using the statistic

\[t = \frac{(\bar{y}_1-\bar{y}_2) - \Delta_0}{SE_{pooled}(\bar{y}_1-\bar{y}_2)}\]


The Pooled t-Test Confidence Interval

The corresponding pooled-$t$ confidence interval is

\[(\bar{y}_1-\bar{y}_2) \pm t^*_{\text{df}} \times SE_{pooled}(\bar{y}_1-\bar{y}_2)\]

where the critical value $t^*$ depends on the confidence level and is found with $(n_1-1) + (n_2-1)$ degrees of freedom.
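Putting the pooled statistic and interval together, a sketch under stated assumptions might look like the following. The function name `pooled_t_summary` is hypothetical, the critical value is supplied by the caller, and `t_star=1.974` in the sample call is an assumed approximate 95% critical value for 173 degrees of freedom.

```python
import math

def pooled_t_summary(ybar1, s1, n1, ybar2, s2, n2, t_star, delta0=0.0):
    """Return (t, df, (low, high)) for the pooled t-test and t-interval."""
    df = (n1 - 1) + (n2 - 1)
    s_pooled = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)
    se = s_pooled * math.sqrt(1 / n1 + 1 / n2)
    diff = ybar1 - ybar2
    t = (diff - delta0) / se
    return t, df, (diff - t_star * se, diff + t_star * se)

# Example 1 summary statistics, used only to illustrate the calculation;
# t_star is an assumption, not a value given in the lecture.
t, df, (low, high) = pooled_t_summary(7.7, 4.6, 80, 7.3, 4.3, 95, t_star=1.974)
print(f"t = {t:.3f}, df = {df}, interval = ({low:.3f}, {high:.3f})")
```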