Hypothesis Testing


Example 1

Is the Dow Jones just as likely to move higher as it is to move lower on any given day?


Out of the 1112 trading days in the period sampled, the Dow average increased on 573 days (sample proportion = 0.5153, or 51.53%).

Is that far enough from 50% to cast doubt on the assumption of equally likely up or down movement?


Hypotheses

The null hypothesis, $H_0$, specifies a population model parameter and proposes a value for that parameter.

The alternative hypothesis, $H_A$, contains the values of the parameter that we consider plausible if we reject the null hypothesis.


Alternative Hypotheses

In a two-sided alternative we are equally interested in deviations on either side of the null hypothesis value

\[H_0: p = p_0, \; H_A: p \neq p_0\]

An alternative hypothesis that focuses on deviations from the null hypothesis value in only one direction is called a one-sided alternative.

\[H_0: p = p_0, \; H_A: p < p_0 \;\text{or} \;p > p_0\]

P-Values

The P-value is the probability of seeing the observed data (or something even less likely) given the null hypothesis.


Example 2

Which of the following are true?


Example 2 (cont.)

Which of the following are true?


Alpha Levels and Significance

We can define a "rare event" by setting a threshold for our P-value, called the alpha level, $\alpha$.

When the P-value falls below $\alpha$, we call such results statistically significant.


Example 1 (cont.)

Find the standard deviation of the sample proportion of days on which the DJIA increased.


Example 1 (cont.)

If we assume that the DJIA increases or decreases with equal likelihood, we'll need to center our Normal sampling model at a mean of 0.5.

Then the standard deviation of the sampling distribution is

\[SD(\hat{p}) = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.5)(1-0.5)}{1112}} = 0.015\]

Example 1 (cont.)


How likely is it that the observed value would be 0.5153 - 0.5 = 0.0153 units away from the mean?

This is the probability of observing more than 51.53% up days (or more than 51.53% down days) if the null model were true.
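As a quick check of this reasoning, here is a minimal sketch in Python (standard library only; the variable names are my own) that computes the z statistic and two-sided P-value for the DJIA example:

```python
# One-proportion z-test for the DJIA example: H0: p = 0.5 vs HA: p != 0.5.
from math import sqrt
from statistics import NormalDist

n, up_days, p0 = 1112, 573, 0.5
p_hat = up_days / n                           # 0.5153 (51.53% up days)
sd = sqrt(p0 * (1 - p0) / n)                  # SD(p_hat) under H0, about 0.015
z = (p_hat - p0) / sd                         # about 1.02
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided P-value, about 0.31
print(z, p_value)
```

The P-value (roughly 0.31) is large, so 51.53% up days is quite consistent with the equally-likely null model.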


The Reasoning of Hypothesis Testing

Four steps: hypotheses, model, mechanics, and conclusion.

Hypotheses

Testing a hypothesis is based on a test statistic $T$, a quantity computed from the data that has a known sampling distribution when the null hypothesis is true.


The Reasoning of Hypothesis Testing (cont.)

Model


The Reasoning of Hypothesis Testing (cont.)

Your model step should end with a statement such as: "The conditions are satisfied, so I can model the sampling distribution of the proportion with a Normal model."

Each test has a name that you should include in your report. The test about proportions is called a one-proportion z-test.


One-proportion Z-test

The conditions for the one-proportion z-test are the same as for the one-proportion z-interval. We test the hypothesis $H_0: p = p_0$ using the statistic $z = \frac{\hat{p} - p_0}{SD(\hat{p})}$, where $SD(\hat{p}) = \sqrt{p_0 q_0 / n}$.


The Reasoning of Hypothesis Testing (cont.)

Mechanics


The Reasoning of Hypothesis Testing (cont.)

Conclusions and Decisions


Example 3

A survey of 100 CEOs finds that 60 think the economy will improve next year. Is there evidence that the rate is higher among all CEOs than the 55% reported by the public at large?


Example 3 (cont.)

A survey of 100 CEOs finds that 60 think the economy will improve next year. Is there evidence that the rate is higher among all CEOs than the 55% reported by the public at large?
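One possible mechanics step for Example 3, sketched in Python (standard library only): a one-proportion z-test of $H_0: p = 0.55$ against $H_A: p > 0.55$ with $\hat{p} = 0.60$ and $n = 100$.

```python
# One-proportion z-test for the CEO survey example.
from math import sqrt
from statistics import NormalDist

n, p_hat, p0 = 100, 0.60, 0.55
sd = sqrt(p0 * (1 - p0) / n)       # SD(p_hat) assuming H0, about 0.0497
z = (p_hat - p0) / sd              # about 1.0
p_value = 1 - NormalDist().cdf(z)  # one-sided P-value, about 0.157
print(z, p_value)
```

A P-value near 0.16 exceeds the usual alpha levels, so this survey alone would not be statistically significant evidence that the rate is higher among CEOs.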


Two Types of Errors

We can make mistakes in two ways:

  1. (False Positive) The null hypothesis is true, but we mistakenly reject it.

  2. (False Negative) The null hypothesis is false, but we fail to reject it.

These two types of errors are known as Type I and Type II errors respectively.


Two Types of Errors (cont.)


When you choose level $\alpha$, you're setting the probability of a Type I error to $\alpha$.


Significance Level & Power

The probability of a Type I error is the significance level of a test,

\[\alpha = P\{ \text{reject } H_0 \;| \; H_0 \text{ is true} \}\]

The probability of a Type II error is

\[\beta = P\{ \text{fail to reject } H_0 \;| \; H_A \text{ is true} \}\]

A test's ability to detect a false null hypothesis is called the power of the test,

\[\text{power} = 1 - \beta = P\{ \text{reject } H_0 \;| \; H_A \text{ is true} \}\]
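Both error rates can be estimated by simulation. The sketch below (my own illustration, not from the slides) repeatedly runs a two-sided one-proportion z-test with $H_0: p = 0.5$, $n = 100$, $\alpha = 0.05$, estimating the Type I error rate under $H_0$ and the power when the true proportion is 0.6:

```python
# Monte Carlo estimate of the Type I error rate and power of a two-sided
# one-proportion z-test. Illustrative only; the exact rates differ slightly
# from the nominal 0.05 because counts are discrete.
import random
from math import sqrt

random.seed(1)
n, p0, z_star = 100, 0.5, 1.96
sd = sqrt(p0 * (1 - p0) / n)

def rejects(p_true):
    """Draw one sample of size n and report whether the test rejects H0."""
    p_hat = sum(random.random() < p_true for _ in range(n)) / n
    return abs(p_hat - p0) / sd > z_star

sims = 4000
alpha_hat = sum(rejects(0.5) for _ in range(sims)) / sims  # near 0.05
power_hat = sum(rejects(0.6) for _ in range(sims)) / sims  # power at p = 0.6
print(alpha_hat, power_hat)
```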

Critical Values

A critical value, $z^\ast$, corresponds to a selected confidence level.

Here are the traditional $z^\ast$ critical values from the Normal model:

$\alpha$     1-sided    2-sided
0.05         1.645      1.96
0.01         2.33       2.576
0.001        3.09       3.29
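These critical values can be reproduced from the Normal model with the standard library's inverse CDF (a sketch; small differences from the table come from rounding):

```python
# Reproducing the traditional z* critical values from the Normal model.
from statistics import NormalDist

std_normal = NormalDist()
for alpha in (0.05, 0.01, 0.001):
    one_sided = std_normal.inv_cdf(1 - alpha)      # all of alpha in one tail
    two_sided = std_normal.inv_cdf(1 - alpha / 2)  # alpha split between tails
    print(alpha, round(one_sided, 3), round(two_sided, 3))
```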



P-values for Z-tests



One-Sample T-Test

For testing a hypothesis about a mean, the test is based on Student's $t$-distribution.

Asking whether there is evidence from a sample that the mean is really different from some hypothesized value calls for a one-sample t-test for the mean.


One-sample T-test for the mean

The conditions for the one-sample t-test for the mean are the same as for the one-sample t-interval. We test the hypothesis $H_0: \mu = \mu_0$ using the statistic

\[t_{n-1} = \frac{\bar{y}-\mu_0}{SE(\bar{y})}\]

where the standard error of $\bar{y}$ is $SE(\bar{y}) = s/\sqrt{n}$


Example 4

A new manager of a small convenience store randomly samples 20 purchases from yesterday's sales. If the mean was 45.26 and the standard deviation was 20.67, is there evidence that the mean purchase amount is greater than 40?


Example 4 (cont.)

\[H_0: \mu = 40, \; H_A: \mu > 40\]
\[t = \frac{45.26 - 40}{20.67/\sqrt{20}} = 1.138\]
\[\text{P-value} = 0.1346\]
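The arithmetic above can be checked directly (a sketch; the P-value itself requires a t table or software for $n - 1 = 19$ degrees of freedom):

```python
# Example 4's t statistic, computed directly.
from math import sqrt

n, y_bar, s, mu0 = 20, 45.26, 20.67, 40
se = s / sqrt(n)        # SE(y_bar), about 4.62
t = (y_bar - mu0) / se  # about 1.138
print(se, t)
```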

Confidence Intervals and Hypothesis Tests


Example 5

Recall the new manager of a small convenience store who randomly sampled 20 purchases from yesterday's sales.


Example 5 (cont.)

Given a 95% confidence interval (35.586, 54.934), is there evidence that the mean purchase amount is different from 40?

Is the confidence interval conclusion consistent with the (two-sided) P-value = 0.2692?


Comparing Two Means


Comparing Two Means (cont.)

As long as the two groups are independent, we find the standard deviation of the difference between the two sample means by adding their variances and then taking the square root:

\[SD(\bar{y}_1 - \bar{y}_2) = \sqrt{Var(\bar{y}_1) + Var(\bar{y}_2)}\]
\[=\sqrt{\left(\frac{\sigma_1}{\sqrt{n_1}}\right)^2 + \left(\frac{\sigma_2}{\sqrt{n_2}}\right)^2}\]
\[=\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\]

Comparing Two Means (cont.)

Usually we don't know the true standard deviations of the two groups, $\sigma_1$ and $\sigma_2$, so we substitute the estimates, $s_1$ and $s_2$, and find a standard error:

\[SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\]

We'll use the standard error to see how big the difference really is.


A Sampling Distribution for the Difference Between Two Means

When the conditions are met, the standardized sample difference between the means of two independent groups,

\[t = \frac{ (\bar{y}_1 - \bar{y}_2) - (\mu_1 - \mu_2)} {SE(\bar{y}_1 - \bar{y}_2)}\]

can be modeled by a Student's $t$-model with a number of degrees of freedom found with a special formula.

\[\text{df} = \frac{SE(\bar{y}_1 - \bar{y}_2)^4}{ \frac{s_1^4}{n^2_1(n_1-1)} + \frac{s_2^4}{n^2_2(n_2-1)} }\]

The Two-Sample T-Test

We test the hypothesis

\[H_0 : \mu_1 - \mu_2 = \Delta_0\]

where the hypothesized difference $\Delta_0$ is almost always 0.

\[t = \frac{(\bar{y}_1-\bar{y}_2) - \Delta_0}{SE(\bar{y}_1 - \bar{y}_2)}\]
\[SE(\bar{y}_1 - \bar{y}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }\]

When the null hypothesis is true, the statistic can be closely modeled by a Student's t-model with a number of degrees of freedom given by a special formula.


Assumptions and Conditions


CI for the Difference Between Two Means

When the conditions are met, we are ready to find a two-sample $t$-interval for the difference between means of two independent groups, $\mu_1 - \mu_2$. The confidence interval is:

\[(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \times SE(\bar{y}_1 - \bar{y}_2)\]

Example 6

A market analyst wants to know if a new website is showing increased page views per visit. Given the statistics below, find the estimated mean difference in page visits between the two websites.

Website 1                   Website 2
$n_1 = 80$                  $n_2 = 95$
$\bar{y}_1 = 7.7$ pages     $\bar{y}_2 = 7.3$ pages
$s_1 = 4.6$ pages           $s_2 = 4.3$ pages

Example 6 (cont.)

\[(\bar{y}_1 - \bar{y}_2) \pm t^*_{df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }\]

where df = 163.59

\[= (7.7 - 7.3) \pm (1.9746) \sqrt{\frac{4.6^2}{80} + \frac{4.3^2}{95}}\]
\[= 0.4 \pm 1.338 = (-0.938, 1.738)\]

Since 0 is in the interval, it is a plausible value for the true difference in means, so we fail to reject the null hypothesis.


Example 6 (cont.)

\[t = \frac{(\bar{y}_1-\bar{y}_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} }}\]

where df = 163.59

\[=\frac{(7.7 - 7.3)}{\sqrt{\frac{4.6^2}{80} + \frac{4.3^2}{95}}} = \frac{0.4}{0.68}=0.588\]
\[P(t>0.588) = 0.2786\]

Fail to reject the null hypothesis. There is insufficient evidence to conclude a statistically significant mean difference in the number of webpage visits.
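Both forms of Example 6's mechanics can be reproduced in a few lines (a sketch; variable names are my own). The slides round the SE to 0.68, which gives $t \approx 0.588$ rather than 0.590:

```python
# Example 6 mechanics: SE, t statistic, and Welch degrees of freedom.
from math import sqrt

n1, y1, s1 = 80, 7.7, 4.6
n2, y2, s2 = 95, 7.3, 4.3

v1, v2 = s1**2 / n1, s2**2 / n2
se = sqrt(v1 + v2)                                         # about 0.6776
t = (y1 - y2) / se                                         # about 0.59
df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))  # about 163.59
print(se, t, df)
```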


The Pooled T-Test

If we assume that the variances of the groups are equal (at least when the null hypothesis is true), then we can save some degrees of freedom.

To do that, we have to pool the two variances that we estimate from the groups into one common, or pooled, estimate of the variance:

\[s^2_{pooled} = \frac{(n_1-1)s^2_1 + (n_2-1)s^2_2}{(n_1-1) + (n_2-1)}\]

The Pooled T-Test (cont.)

Now we substitute the common pooled variance for each of the two variances in the standard error formula, making the pooled standard error formula simpler:

\[SE_{pooled}(\bar{y}_1-\bar{y}_2) = \sqrt{\frac{s^2_{pooled}}{n_1} + \frac{s^2_{pooled}}{n_2}} = s_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\]

The formula for degrees of freedom for the Student's $t$-model is simpler, too.

\[\text{df} = (n_1-1) + (n_2-1)\]
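As an illustration (my own, reusing Example 6's sample sizes and standard deviations), the pooled estimate, pooled SE, and degrees of freedom work out as follows:

```python
# Pooled variance, pooled SE, and degrees of freedom for two groups.
from math import sqrt

n1, s1 = 80, 4.6
n2, s2 = 95, 4.3

s2_pooled = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
se_pooled = sqrt(s2_pooled) * sqrt(1 / n1 + 1 / n2)  # about 0.674
df = (n1 - 1) + (n2 - 1)                             # 173
print(s2_pooled, se_pooled, df)
```

Compare with the Welch SE of about 0.678 and df of about 163.6: when the two sample variances are close, the pooled and unpooled results barely differ.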

The Pooled T-Test

For pooled $t$-methods, the Equal Variance Assumption must be satisfied: the variances of the two populations from which the samples have been drawn are equal; that is, $\sigma_1 = \sigma_2$. We test the hypothesis

\[H_0: \mu_1 - \mu_2 = \Delta_0\]

where the hypothesized difference $\Delta_0$ is almost always 0, using the statistic

\[t = \frac{(\bar{y}_1-\bar{y}_2) - \Delta_0}{SE_{pooled}(\bar{y}_1-\bar{y}_2)}\]

The Pooled T-Test Confidence Interval

The corresponding pooled-$t$ confidence interval is

\[(\bar{y}_1-\bar{y}_2) \pm t^*_{\text{df}} \times SE_{pooled}(\bar{y}_1-\bar{y}_2)\]

where the critical value $t^*$ depends on the confidence level and is found with $(n_1-1) + (n_2-1)$ degrees of freedom.


Summary of T-tests



Inference about Variances

We need special treatment for confidence intervals and tests for the variance because the sampling distribution of the sample variance is not Normal: it is right-skewed and, for Normal data, follows a scaled Chi-square distribution.


Chi-square Distribution

When observations $X_1, \ldots, X_n$ are independent and Normal with $Var(X_i) = \sigma^2$, the distribution of

\[\frac{(n-1)s^2}{\sigma^2} = \sum^n_{i=1} \left( \frac{X_i - \bar{X}}{\sigma} \right)^2\]

is Chi-square with $(n-1)$ degrees of freedom.

Chi-square distribution, or $\chi^2$, is a continuous distribution with density

\[f(x) = \frac{1}{2^{\nu/2}\Gamma(\nu/2)} x^{\nu/2-1}e^{-x/2}, \quad x > 0\]

or Chi-square($\nu$) = Gamma($\nu$/2, 1/2) with E(X)=$\nu$ and Var(X)=$2\nu$.
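Since a Chi-square($\nu$) variable is the sum of $\nu$ squared standard Normals, the stated mean and variance can be sanity-checked by simulation (an illustrative sketch, not from the slides):

```python
# Simulation check of the Chi-square moments: sample mean and variance
# of simulated Chi-square(nu) draws should land near nu and 2*nu.
import random
from statistics import fmean, variance

random.seed(2)
nu, draws = 5, 20000
samples = [sum(random.gauss(0, 1)**2 for _ in range(nu)) for _ in range(draws)]
print(fmean(samples), variance(samples))  # near 5 and 10
```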


Chi-square Densities


Chi-square densities with $\nu$ = 1, 5, 10, and 30 degrees of freedom. Each distribution is right-skewed. For large $\nu$, it is approximately Normal.


CI for the Population Variance

Let us construct a $(1-\alpha)100\%$ confidence interval for the population variance $\sigma^2$, based on a sample of size $n$.

\[P \left\{ \chi^2_{1-\alpha/2} \leq \frac{(n-1)s^2}{\sigma^2} \leq \chi^2_{\alpha/2} \right\} = 1 - \alpha\]

where $\chi^2_{\alpha}$ denotes the value exceeded with probability $\alpha$. Then the confidence interval for the variance is

\[\left[ \frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \right]\]

Example 7

A sample of 6 measurements from a measurement device:

2.5, 7.4, 8.0, 4.5, 7.4, 9.2

with the claimed standard deviation $\sigma$ = 2.2. Using the data only, construct a 90% confidence interval for the standard deviation.


Example 7 (cont.)

\[\bar{X} = 6.5\]
\[s^2 = 6.232\]
\[(n-1)s^2 = 31.16\]
\[\chi^2_{1-\alpha/2}=1.15, \quad \chi^2_{\alpha/2}=11.1\]
\[\text{CI for } \sigma^2 = [2.82, 27.14]\]
\[\text{CI for } \sigma = [\sqrt{2.82}, \sqrt{27.14}] = [1.68, 5.21]\]
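Example 7 can be reproduced as follows (a sketch; I use the slightly more precise table values $\chi^2_{0.05,5} \approx 11.07$ and $\chi^2_{0.95,5} \approx 1.145$, so the endpoints may differ from the slides' rounding in the second decimal):

```python
# Example 7 recomputed; the chi-square critical values for 5 degrees of
# freedom are table lookups, since the standard library has no chi-square
# inverse CDF.
from math import sqrt
from statistics import variance

data = [2.5, 7.4, 8.0, 4.5, 7.4, 9.2]
n = len(data)
s2 = variance(data)                     # sample variance, 6.232
chi2_upper, chi2_lower = 11.07, 1.145   # chi^2_{0.05, 5} and chi^2_{0.95, 5}

ci_var = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)  # for sigma^2
ci_sd = tuple(sqrt(v) for v in ci_var)                           # for sigma
print(ci_var, ci_sd)
```

Taking square roots gives an interval for $\sigma$ of roughly (1.68, 5.22), which contains the value $\sigma = 2.2$ examined in Example 8.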

Testing Variance

Testing the population variance with $\chi^2$-tests



Example 8

Refer to Example 7. The 90% confidence interval constructed there contains the suggested value $\sigma$ = 2.2. Then, by duality between confidence intervals and tests, there should be no evidence against this value of $\sigma$. Measure the amount of evidence against it by computing the suitable P-value.


Example 8 (cont.)

The hypothesis $H_0: \sigma = 2.2$ is tested against $H_A: \sigma \neq 2.2$.

Compute the test statistic from the data

\[\chi^2_{obs} = \frac{(5)(6.232)}{2.2^2} = 6.438\]

with $\nu = n - 1 = 5$ degrees of freedom,

\[\chi^2_{0.80} = 2.34 < \chi^2_{obs} < \chi^2_{0.20} = 7.29\]

Therefore,

\[P = 2 \min \{P \{\chi^2 \geq \chi^2_{obs} \}, P \{\chi^2 \leq \chi^2_{obs} \} \}\]
\[= 2 \min \{0.734, 0.266 \} = 0.531 > 0.4\]

The evidence against $\sigma = 2.2$ is very weak; at all typical significance levels, we fail to reject $H_0$.