Lecture 8
Outline
Comparing Two Means
Two-Sample t-Test
Pooled t-Test
Comparing Two Means
The statistic of interest is the difference in the observed means of the offer and no offer groups: $\bar{y}_0 - \bar{y}_n$.
What we'd really like to know is the difference of the means in the population at large: $\mu_0 - \mu_n$.
Now the population model parameter of interest is the difference between the means.
Comparing Two Means (cont.)
As long as the two groups are independent, we find the standard deviation of the difference between the two sample means by adding their variances and then taking the square root:
$SD(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
Comparing Two Means (cont.)
Usually we don't know the true standard deviations of the two groups, $\sigma_1$ and $\sigma_2$, so we substitute the estimates, $s_1$ and $s_2$, and find a standard error:
$SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
We'll use the standard error to see how big the difference really is.
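As a minimal sketch of this calculation, the Python function below computes the standard error of the difference from summary statistics. The function name `se_diff` and its argument names are illustrative and not part of the lecture; the usage line borrows the standard deviations and sample sizes from Example 1 later in these notes.

```python
import math

def se_diff(s1, n1, s2, n2):
    """Standard error of (ybar1 - ybar2) for two independent groups."""
    # The variances of the two sample means add; the SE is the square root of the sum.
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

# Standard deviations and sample sizes taken from Example 1
print(se_diff(4.6, 80, 4.3, 95))
```

Because the groups are assumed independent, no covariance term appears; for paired or otherwise dependent groups this formula would not apply.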
A Sampling Distribution for the Difference Between Two Means
When the conditions are met, the standardized sample difference between the means of two independent groups,
$t = \frac{(\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)}{SE(\bar{y}_1 - \bar{y}_2)}$
can be modeled by a Student's $t$-model with a number of degrees of freedom found with a special formula.
Approximation: df $= n_1 + n_2 - 2$
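The slides do not spell out the special formula. The sketch below assumes it is the Welch-Satterthwaite approximation, the usual choice for the unpooled two-sample $t$; with the Example 1 summaries it gives about 163.59, which matches the df quoted there. The function name `welch_df` is illustrative.

```python
def welch_df(s1, n1, s2, n2):
    """Approximate degrees of freedom (assumed: Welch-Satterthwaite formula)."""
    v1 = s1**2 / n1  # estimated variance of the first sample mean
    v2 = s2**2 / n2  # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Standard deviations and sample sizes taken from Example 1
print(round(welch_df(4.6, 80, 4.3, 95), 2))
```

The result is generally not an integer, which is why software reports fractional degrees of freedom for this test.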
The Two-Sample t-Test
Test hypothesis:
$H_0: \mu_1 - \mu_2 = \Delta_0$
where the hypothesized difference $\Delta_0$ is almost always 0.
When the null hypothesis is true, the statistic
$t = \frac{(\bar{y}_1-\bar{y}_2) - \Delta_0}{SE(\bar{y}_1-\bar{y}_2)}$
can be closely modeled by a Student's $t$-model with a number of degrees of freedom given by a special formula.
Assumptions and Conditions
Independence Assumption: The data in each group must be drawn independently and at random.
Randomization Condition: Without randomization of some sort, there are no sampling distribution models and no inference.
10% Condition: We usually don't check this condition for differences of means. We needn’t worry about it at all for randomized experiments.
Normal Population Assumption: underlying populations are each Normally distributed.
Nearly Normal Condition: We must check this for both groups; a violation by either one violates the condition.
Independent Groups Assumption: the two groups we are comparing must be independent of each other.
CI for the Difference Between Two Means
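When the conditions are met, we are ready to find a two-sample $t$-interval for the difference between means of two independent groups, $\mu_1 - \mu_2$. The confidence interval is:
$(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2)$
where the critical value $t^*_{df}$ depends on the confidence level and on the degrees of freedom.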
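A two-sample $t$-interval is the observed difference plus or minus a critical value times the standard error. The sketch below builds it from summary statistics; the function name and the inputs in the usage line are hypothetical, and the critical value `t_star` is assumed to be looked up separately (from a table or software) for the chosen confidence level and degrees of freedom.

```python
import math

def two_sample_t_interval(ybar1, s1, n1, ybar2, s2, n2, t_star):
    """Return (lower, upper) for mu1 - mu2 using the unpooled standard error."""
    diff = ybar1 - ybar2
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    margin = t_star * se  # margin of error
    return diff - margin, diff + margin

# Hypothetical inputs, chosen only to show the call
low, high = two_sample_t_interval(10.0, 3.0, 9, 8.0, 4.0, 16, t_star=2.0)
print(low, high)
```

If the interval contains 0, then 0 remains a plausible value for the true difference at that confidence level.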
Example 1
A market analyst wants to know if a new website is showing increased page views per visit. Given the summary statistics below, estimate the difference in mean page views per visit between the two websites.
Website 1 | Website 2 |
---|---|
$n_1 = 80$ | $n_2 = 95$ |
$\bar{y}_1 = 7.7$ pages | $\bar{y}_2 = 7.3$ pages |
$s_1 = 4.6$ pages | $s_2 = 4.3$ pages |
Example 1 (cont.)
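$(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2) = (7.7 - 7.3) \pm t^*_{df} \times \sqrt{\frac{4.6^2}{80} + \frac{4.3^2}{95}} \approx 0.4 \pm t^*_{df} \times 0.678$
where df = 163.59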
Fail to reject the null hypothesis. Since 0 is in the interval, it is a plausible value for the true difference in means.
Example 1 (cont.)
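$t = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{SE(\bar{y}_1 - \bar{y}_2)} = \frac{7.7 - 7.3}{\sqrt{\frac{4.6^2}{80} + \frac{4.3^2}{95}}} \approx \frac{0.4}{0.678} \approx 0.59$
where df = 163.59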
Fail to reject the null hypothesis. There is insufficient evidence to conclude that the mean number of page views per visit differs between the two websites.
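The test statistic for Example 1 can be checked from the summary statistics alone. The sketch below assumes $\Delta_0 = 0$ and the unpooled standard error; the function name `two_sample_t` is illustrative. It stops at the statistic and does not compute a P-value, which would require a $t$-distribution routine.

```python
import math

def two_sample_t(ybar1, s1, n1, ybar2, s2, n2, delta0=0.0):
    """Two-sample t statistic with the unpooled standard error."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((ybar1 - ybar2) - delta0) / se

# Example 1: Website 1 vs. Website 2
t_stat = two_sample_t(7.7, 4.6, 80, 7.3, 4.3, 95)
print(t_stat)
```

The observed difference is well under one standard error from 0, which is consistent with failing to reject the null hypothesis.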
The Pooled t-Test
If we assume that the variances of the groups are equal (at least when the null hypothesis is true), then we can save some degrees of freedom.
To do that, we have to pool the two variances that we estimate from the groups into one common, or pooled, estimate of the variance:
$s^2_{pooled} = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{(n_1-1) + (n_2-1)}$
The Pooled t-Test (cont.)
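Now we substitute the common pooled variance for each of the two variances in the standard error formula, making the pooled standard error formula simpler:
$SE_{pooled}(\bar{y}_1-\bar{y}_2) = \sqrt{\frac{s^2_{pooled}}{n_1} + \frac{s^2_{pooled}}{n_2}} = s_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
The formula for degrees of freedom for the Student's $t$-model is simpler, too: df $= n_1 + n_2 - 2$.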
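The pooled quantities can be sketched in a few lines of Python. The pooled variance is a weighted average of the two sample variances with weights $n_1 - 1$ and $n_2 - 1$. The function names are illustrative, and the usage line reuses the Example 1 summaries purely as sample inputs; the lecture does not say that equal variances hold for that example.

```python
import math

def pooled_variance(s1, n1, s2, n2):
    """Weighted average of the two sample variances, weights n_i - 1."""
    return ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))

def pooled_se(s1, n1, s2, n2):
    """Pooled standard error of (ybar1 - ybar2)."""
    return math.sqrt(pooled_variance(s1, n1, s2, n2) * (1 / n1 + 1 / n2))

def pooled_df(n1, n2):
    """Degrees of freedom for the pooled t-model."""
    return n1 + n2 - 2

# Example 1 summaries, used here only as sample inputs
print(pooled_variance(4.6, 80, 4.3, 95), pooled_se(4.6, 80, 4.3, 95), pooled_df(80, 95))
```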
The Pooled t-Test
For pooled $t$-methods, the Equal Variance Assumption must be satisfied: the variances of the two populations from which the samples have been drawn are equal. That is, $\sigma_1^2 = \sigma_2^2$.
We test the hypothesis
$H_0: \mu_1 - \mu_2 = \Delta_0$
where the hypothesized difference $\Delta_0$ is almost always 0, using the statistic
$t = \frac{(\bar{y}_1-\bar{y}_2) - \Delta_0}{SE_{pooled}(\bar{y}_1-\bar{y}_2)}$
The Pooled t-Test Confidence Interval
The corresponding pooled-$t$ confidence interval is
$(\bar{y}_1-\bar{y}_2) \pm t^* \times SE_{pooled}(\bar{y}_1-\bar{y}_2)$
where the critical value $t^*$ depends on the confidence level and is found with $(n_1-1) + (n_2-1)$ degrees of freedom.
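Putting the pooled pieces together, the sketch below returns the pooled $t$ statistic, its degrees of freedom, and the pooled-$t$ interval from summary statistics. The function name and the inputs in the usage line are hypothetical, and `t_star` is assumed to be supplied by the caller for the chosen confidence level and $n_1 + n_2 - 2$ degrees of freedom.

```python
import math

def pooled_t_summary(ybar1, s1, n1, ybar2, s2, n2, t_star, delta0=0.0):
    """Return (t statistic, df, (lower, upper)) using the pooled standard error."""
    df = (n1 - 1) + (n2 - 1)
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))           # pooled standard error
    diff = ybar1 - ybar2
    t_stat = (diff - delta0) / se
    interval = (diff - t_star * se, diff + t_star * se)
    return t_stat, df, interval

# Hypothetical inputs, chosen only to show the call
print(pooled_t_summary(10.0, 2.0, 8, 8.0, 2.0, 8, t_star=2.0))
```

These results are only trustworthy when the Equal Variance Assumption is reasonable; otherwise the unpooled two-sample methods are the safer choice.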