Confidence Intervals

Statistical Inference


Parameter estimation


Method of Moments

The $k$-th population moment is defined as

\[\mu_k = E(X^k)\]

The $k$-th sample moment is defined as

\[m_k = \frac{1}{n}\sum_{i=1}^n X^k_i\]

It estimates $\mu_k$ from a sample $(X_1, \ldots, X_n)$.

The first sample moment is the sample mean $\bar{X}$.


Central Moments

For $k \geq 2$, the $k$-th population central moment is defined as

\[\mu'_k = E\left[(X - \mu_1)^k\right]\]

The $k$-th sample central moment is defined as

\[m'_k = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^k\]

It estimates $\mu'_k$ from a sample $(X_1, \ldots, X_n)$.
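The two definitions above can be sketched in a few lines of Python. The sample `data` and the function names are hypothetical and chosen only for illustration.

```python
def sample_moment(xs, k):
    """k-th sample moment: (1/n) * sum of x_i ** k."""
    return sum(x ** k for x in xs) / len(xs)


def sample_central_moment(xs, k):
    """k-th sample central moment: (1/n) * sum of (x_i - mean) ** k."""
    mean = sample_moment(xs, 1)
    return sum((x - mean) ** k for x in xs) / len(xs)


data = [2, 4, 4, 5, 7, 8]  # hypothetical sample, not from the lecture

m1 = sample_moment(data, 1)          # first sample moment = sample mean
m2 = sample_moment(data, 2)          # second sample moment
c2 = sample_central_moment(data, 2)  # second sample central moment

print(m1, m2, c2)
```

For this sample the second central moment equals $m_2 - m_1^2$, which is a quick sanity check on both functions.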


Estimation

To estimate $k$ parameters, equate the first $k$ population and sample moments,

\[\begin{cases} \mu_1 = m_1 \\ \cdots \cdots \cdots \\ \mu_k = m_k \end{cases}\]

The method of moments estimator is the solution of this system of equations.


Example 1

Estimate the parameter $\lambda$ of the Poisson($\lambda$) distribution. The first population moment is

\[\mu_1 = E(X) = \lambda\]

There is only one unknown parameter, hence we write one equation,

\[\mu_1 = \lambda = m_1 = \bar{X}\]

Thus,

\[\hat{\lambda} = \bar{X}\]

Method of Maximum Likelihood

The maximum likelihood estimator is the parameter value that maximizes the likelihood of the observed sample.


MoML: Discrete Distribution

\[P\{\mathbf{X} = (X_1, \ldots, X_n) \} = P(\mathbf{X}) = P(X_1, \ldots, X_n) = \prod_{i=1}^n P(X_i)\]

To maximize this likelihood, we consider the critical points by taking derivatives with respect to all unknown parameters and equating them to 0, $\frac{\partial}{\partial \theta}P(\mathbf{X}) = 0.$

Differentiating the sum

\[\ln \prod_{i=1}^n P(X_i) = \sum_{i=1}^n \ln P(X_i)\]

is easier than differentiating the product.


Example 2

The pmf of the Poisson distribution and its logarithm are

\[P(x) = e^{-\lambda}\frac{\lambda^x}{x!}\]
\[\ln P(x) = -\lambda + x\ln \lambda - \ln(x!)\]

Thus, we need to maximize the log-likelihood of the sample, where $C = -\sum_{i=1}^n \ln(X_i!)$ does not depend on $\lambda$,

\[\ln P(\mathbf{X}) = \sum_{i=1}^n (-\lambda + X_i\ln \lambda) + C = -n\lambda + \ln \lambda \sum_{i=1}^n X_i + C\]
\[\frac{\partial}{\partial \lambda} \ln P(\mathbf{X}) = -n + \frac{1}{\lambda} \sum_{i=1}^n X_i = 0\]
\[\hat{\lambda} = \frac{1}{n} \sum_{i=1}^n X_i = \bar{X}\]
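As a numerical cross-check of this result, the sketch below evaluates the Poisson log-likelihood on a grid of $\lambda$ values and compares the maximizer with the sample mean. The counts and the grid resolution are assumptions made for illustration.

```python
import math


def poisson_log_likelihood(lam, xs):
    """Sum of ln P(x_i) for a Poisson(lam) sample; lgamma(x + 1) = ln(x!)."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in xs)


counts = [3, 1, 4, 2, 0, 5, 2, 3]  # hypothetical Poisson counts
x_bar = sum(counts) / len(counts)

# Crude grid search over lambda in (0, 10] with step 0.001 (an arbitrary choice).
grid = [i / 1000 for i in range(1, 10001)]
lam_grid = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))

print(x_bar, lam_grid)
```

The grid maximizer agrees with $\bar{X}$ up to the grid spacing, as the derivation predicts.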

MoML: Continuous Distribution

In the continuous case, the method of maximum likelihood will maximize the probability of observing “almost” the same number.

The probability of observing a value close to $x$ is proportional to the density $f(x)$.


Example 3

The pdf of the Exponential distribution is

\[f(x) = \lambda e^{-\lambda x}\]

So,

\[\ln f(\mathbf{X}) = \sum_{i=1}^n \ln (\lambda e^{-\lambda X_i}) = \sum_{i=1}^n (\ln \lambda -\lambda X_i) = n \ln \lambda - \lambda \sum_{i=1}^n X_i\]
\[\frac{\partial}{\partial \lambda} \ln f(\mathbf{X}) = \frac{n}{\lambda} - \sum_{i=1}^n X_i = 0\]
\[\hat{\lambda} = \frac{n}{\sum_{i=1}^n X_i} = \frac{1}{\bar{X}}\]

This is the only critical point. The estimator $\hat{\lambda}$ is just the reciprocal of the sample mean $\bar{X}$.
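The closed-form estimator can be tried on simulated data. This is a minimal sketch in which the true rate, the seed, and the sample size are all assumed values, not part of the lecture.

```python
import math
import random

random.seed(0)    # fixed seed so the run is repeatable
true_rate = 2.0   # assumed "unknown" parameter used to simulate data
sample = [random.expovariate(true_rate) for _ in range(10_000)]


def exp_log_likelihood(lam, xs):
    """n * ln(lam) - lam * sum(x_i), the log-likelihood derived above."""
    return len(xs) * math.log(lam) - lam * sum(xs)


# Maximum likelihood estimate: reciprocal of the sample mean.
lam_hat = len(sample) / sum(sample)

# The log-likelihood should be lower slightly to either side of lam_hat.
ll_at_hat = exp_log_likelihood(lam_hat, sample)
ll_below = exp_log_likelihood(0.95 * lam_hat, sample)
ll_above = exp_log_likelihood(1.05 * lam_hat, sample)

print(lam_hat, ll_at_hat > ll_below, ll_at_hat > ll_above)
```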


Estimation of Standard Errors

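In Examples 1 and 2, we found that the method of moments and maximum likelihood estimators of the Poisson parameter $\lambda$ coincide, $\hat{\lambda} = \bar{X}$.

Let us now estimate the standard error of $\hat{\lambda}$. Since the standard deviation of a Poisson($\lambda$) variable is $\sigma = \sqrt{\lambda}$,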

\[\sigma(\hat{\lambda}) = \sigma(\bar{X}) = \sigma/\sqrt n = \sqrt{\lambda/n}\]
\[s_1(\hat{\lambda}) = \sqrt{\frac{\bar{X}}{n}} = \frac{\sqrt{\sum X_i}}{n}\]
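A short check that the two forms of the estimated standard error agree, using hypothetical counts:

```python
import math

counts = [3, 1, 4, 2, 0, 5, 2, 3]  # hypothetical Poisson counts
n = len(counts)
x_bar = sum(counts) / n

se_from_mean = math.sqrt(x_bar / n)      # sqrt(X-bar / n)
se_from_sum = math.sqrt(sum(counts)) / n  # sqrt(sum of X_i) / n

print(se_from_mean, se_from_sum)
```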

Confidence Intervals

An interval $[a, b]$ is a $(1-\alpha)100\%$ confidence interval for the parameter $\theta$ if it contains the parameter with probability $(1 -\alpha)$,

\[P \{a \leq \theta \leq b\} = 1 -\alpha\]

Coverage probability $(1 -\alpha)$ is also called a confidence level.


Construction of CI

a) Assume there is an unbiased estimator $\hat{\theta}$ that has a Normal distribution.

b) Standardize it,

\[Z =\frac{\hat{\theta} - E(\hat{\theta})}{\sigma(\hat{\theta})} = \frac{\hat{\theta} - \theta}{\sigma(\hat{\theta})}\]

c) $Z$ falls between the Standard Normal quantiles $q_{\alpha/2}$ and $q_{1-\alpha/2}$ with probability $(1 -\alpha)$. By the symmetry of the Standard Normal distribution, these quantiles are denoted by

\[-z_{\alpha/2} = q_{\alpha/2}\]
\[z_{\alpha/2} = q_{1-\alpha/2}\]

Construction of CI (cont.)

d) Then,

\[P \left\{ -z_{\alpha/2} \leq \frac{\hat{\theta} - \theta}{\sigma(\hat{\theta})} \leq z_{\alpha/2} \right\} = 1 -\alpha\]
\[P \left\{\hat{\theta} -z_{\alpha/2} \sigma(\hat{\theta}) \leq \theta \leq \hat{\theta} + z_{\alpha/2} \sigma(\hat{\theta}) \right\} = 1 -\alpha\]

e) Substituting,

\[a = \hat{\theta} -z_{\alpha/2} \sigma(\hat{\theta})\]
\[b = \hat{\theta} +z_{\alpha/2} \sigma(\hat{\theta})\]

such that

\[P \{a \leq \theta \leq b\} = 1 -\alpha\]
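Steps (a) through (e) can be sketched as a small function. It assumes an unbiased, Normally distributed estimator with known standard error; the function name and the numbers in the usage line are hypothetical.

```python
from statistics import NormalDist


def z_interval(estimate, std_error, alpha=0.05):
    """Return (a, b) = estimate -/+ z_{alpha/2} * std_error."""
    # z_{alpha/2} is the (1 - alpha/2) quantile of the Standard Normal.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_error, estimate + z * std_error


# Hypothetical estimate 10.0 with standard error 2.0, 95% confidence.
a, b = z_interval(10.0, 2.0, alpha=0.05)
print(round(a, 2), round(b, 2))
```

Lowering `alpha` widens the interval, which is the trade-off between certainty and precision.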

Margin of Error: Certainty vs. Precision

The extent of the interval on either side of the estimate is called the margin of error (ME). The general confidence interval can now be expressed in terms of the ME.

\[\text{estimate} \pm ME\]

Example 4

In March 2010, a Gallup Poll found that 1012 out of 2976 respondents thought economic conditions were getting better, a sample proportion of

\[\hat{p} = 1012/2976 = 34.0\%\]

We’d like to use this sample proportion to say something about what proportion, $p$, of the entire population thinks the economic conditions are getting better.


Confidence Interval for Proportions

We know that our sampling distribution model is centered at the true proportion, $E(\hat{p}) = p$, with variance

\[Var(\hat{p}) = \frac{pq}{n}, \quad q = 1-p\]

So, following the CLT, we can approximate the sampling distribution with a Normal model and use $\hat{p}$ to calculate the standard error, SE.

\[SE(\hat{p}) = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{(0.34)(1-0.34)}{2976}} = 0.009\]

Example 4 (cont.)

Because the sampling distribution is approximately Normal, we expect about 95% of all samples of 2976 U.S. adults to have sample proportions within two SEs of $p$, that is, within $2(0.009) = 0.018$.
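The arithmetic of Example 4 can be reproduced directly from the poll counts. This sketch keeps full precision, so the half-width comes out slightly below the value obtained from the rounded SE of 0.009.

```python
import math

successes, n = 1012, 2976  # counts quoted in Example 4

p_hat = successes / n
se = math.sqrt(p_hat * (1 - p_hat) / n)

# "Within two SEs": the rule-of-thumb 95% interval used in the text.
lower, upper = p_hat - 2 * se, p_hat + 2 * se

print(round(p_hat, 3), round(se, 4), round(lower, 3), round(upper, 3))
```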


Example 4 (cont.)

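An appropriate interpretation of our confidence interval would be: we are 95% confident that between roughly 32.2% and 35.8% of U.S. adults thought economic conditions were getting better.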

The confidence interval calculated and interpreted here is an example of a one-proportion z-interval.


Critical Values

For any confidence level the number of SEs we must stretch out on either side of $\hat{\theta}$ is called the critical value.

Because a critical value is based on the Normal model, we denote it $z^*$.

\[\begin{array}{c|c} \text{Confidence level} & z^* \\ \hline 90\% & 1.645 \\ 95\% & 1.960 \\ 99\% & 2.576 \end{array}\]
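The tabulated critical values can be recovered from the Standard Normal quantile function. The helper name `z_star` is hypothetical; `NormalDist` is part of the Python standard library.

```python
from statistics import NormalDist


def z_star(confidence):
    """Critical value z* leaving (1 - confidence) / 2 in each tail."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  {z_star(level):.3f}")
```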

Example 5

In the spring of 2009, workers at Sony France protesting layoffs took the boss hostage, a practice dubbed "bossnapping". What did other French adults think of this practice? Were they sympathetic? Understanding? Approving?

A poll taken in April 2009 found that 63% of respondents were sympathetic.

The poll was based on a random representative sample of 1010 adults.


Example 5 (cont.)

Conditions:

\[n\hat{p} = (1010)(0.63) \approx 636 \geq 10\]
\[n\hat{q} = (1010)(0.37) \approx 374 \geq 10\]

The conditions are satisfied so a one-proportion z-interval using the Normal model is appropriate.


Example 5 (cont.)

What can we conclude about the proportion of all French adults who sympathize?

\[n = 1010, \hat{p} = 0.63\]
\[\sigma(\hat{p}) = \sqrt{\frac{(0.63)(0.37)}{1010}} = 0.015\]

For a 95% CI, $z^* = 1.96$, so

\[ME = z^* \sigma(\hat{p}) = 1.96(0.015) = 0.029\]

or

\[0.63 \pm 0.029\]

Based on the survey we can be 95% confident that between 60.1% and 65.9% of all French adults were sympathetic.
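Example 5 can be sketched end to end, including the success/failure condition check. The function name is hypothetical. Note that the code does not round the standard error to 0.015 first, so its margin of error is nearer 0.030 than the 0.029 obtained above.

```python
import math


def one_proportion_z_interval(p_hat, n, z_crit=1.96):
    """Return (margin of error, lower, upper) for a one-proportion z-interval."""
    # Success/failure condition: at least 10 expected in each category.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("Normal approximation conditions are not met")
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    me = z_crit * se
    return me, p_hat - me, p_hat + me


me, lower, upper = one_proportion_z_interval(0.63, 1010)
print(round(me, 3), round(lower, 3), round(upper, 3))
```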


Choosing the Sample Size

To get a narrower confidence interval without giving up confidence, we must choose a larger sample.

\[ME = z^* \sqrt{\frac{\hat{p}\hat{q}}{n}}\]

Thus,

\[n \geq (z^*)^2 \frac{\hat{p}(1-\hat{p})}{ME^2}\]

or, in the worst case $\hat{p} = 0.5$, where $\hat{p}(1-\hat{p})$ reaches its maximum of 0.25,

\[n \geq 0.25\left(\frac{z^*}{ME}\right)^2\]

Example 6

Suppose a company wants to offer a new service and wants to estimate, to within 3%, the proportion of customers who are likely to purchase this new service with 95% confidence. How large a sample do they need?

We proceed by guessing the worst case scenario for $\hat{p}$. We guess $\hat{p}$ is 0.50 because this makes the SD (and therefore n) the largest.

\[n = (1.96)^2 \frac{(0.5)(0.5)}{0.03^2} = 1067.1\]

We can conclude that the company will need at least 1068 respondents to keep the margin of error as small as 3% with confidence level 95%.
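The sample-size calculation of Example 6 as a short sketch. The function name and default arguments are hypothetical; the result is rounded up because a fractional respondent is not possible.

```python
import math


def required_sample_size(margin, z_crit=1.96, p_guess=0.5):
    """Smallest n with z_crit * sqrt(p_guess * (1 - p_guess) / n) <= margin."""
    return math.ceil(z_crit ** 2 * p_guess * (1 - p_guess) / margin ** 2)


n_needed = required_sample_size(0.03)  # 3% margin, 95% confidence, worst case
print(n_needed)
```

Using `p_guess=0.5` is the conservative choice; any other guess for $\hat{p}$ gives a smaller required $n$.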


Confidence Intervals for Means


CI for the Population Mean

Let us construct a confidence interval for the population mean $\theta = E(X) = \mu$.

Its estimator is $\hat{\theta} = \bar{X}$.

For a known population standard deviation $\sigma$, the standard error is $\sigma(\hat{\theta}) = \sigma/\sqrt{n}$, so

\[CI = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt n}\]


CI for the Difference of Two Means

To construct a confidence interval for the difference between population means, $\theta = \mu_X - \mu_Y$, proceed as follows.

  1. Propose an estimator of $\theta$, $\hat{\theta} = \bar{X} - \bar{Y}$

  2. Check that $\hat{\theta}$ is unbiased.

  3. Check that $\hat{\theta}$ has a Normal or approximately Normal distribution.

  4. Find the standard error of $\hat{\theta}$ (using independence)

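\[\sigma(\hat{\theta}) = \sqrt{Var(\bar{X} - \bar{Y})} = \sqrt{Var(\bar{X}) + Var(\bar{Y})} = \sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}\]
\[CI = \bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}\]

Here $n$ and $m$ are the sizes of the $X$ and $Y$ samples, and $\sigma_X^2$ and $\sigma_Y^2$ are the population variances.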
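The four steps can be sketched for two independent samples with known population standard deviations. All numbers in the usage line are hypothetical, and the function name is an assumption made for illustration.

```python
import math
from statistics import NormalDist


def two_mean_z_interval(x_bar, y_bar, sigma_x, sigma_y, n, m, alpha=0.05):
    """z-interval for mu_X - mu_Y from independent samples of sizes n and m."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Variances (not standard deviations) add under independence.
    se = math.sqrt(sigma_x ** 2 / n + sigma_y ** 2 / m)
    diff = x_bar - y_bar
    return diff - z * se, diff + z * se


# Hypothetical inputs: means 52 and 50, sigmas 4 and 3, sample sizes 64 and 36.
low, high = two_mean_z_interval(52.0, 50.0, 4.0, 3.0, 64, 36)
print(round(low, 3), round(high, 3))
```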

The Sampling Distribution for the Mean

\[\sigma(\bar{\theta}) = \frac{s}{\sqrt n}\]
\[\bar{\theta} \pm ME\]

where the $ME$ equals a critical value, $z^*$, times the standard error $\sigma(\bar{\theta})$.


Gosset's t

William S. Gosset discovered that when he used the standard error $\frac{s}{\sqrt n}$, the shape of the sampling distribution was no longer Normal.

The new model was called Student's t. It is always bell-shaped, but the details change with the sample size.

The Student's t-models form a family of related distributions depending on a parameter known as degrees of freedom.

[Figure: Student's t-distributions]


Example 7

Data from a survey of 25 randomly selected customers found a mean age of 31.84 years and the standard deviation was 9.84 years.

What is the standard error of the mean?

How would the standard error change if the sample size had been 100 instead of 25? (Assume that $s$ = 9.84 years.)
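Both questions in Example 7 reduce to evaluating $s/\sqrt{n}$. A minimal sketch:

```python
import math

s = 9.84  # sample standard deviation from Example 7

se_25 = s / math.sqrt(25)    # standard error with n = 25
se_100 = s / math.sqrt(100)  # standard error with n = 100

print(round(se_25, 3), round(se_100, 3))
```

Quadrupling the sample size halves the standard error, because $n$ enters through its square root.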


Practical sampling distribution model for means

When the unknown standard error of $\bar{\theta}$ is replaced by its estimator

\[s(\bar{\theta}) = \frac{s}{\sqrt n}\]

the standardized sample mean,

\[t = \frac{\bar{\theta} - \theta}{s(\bar{\theta})}\]

no longer has a Normal distribution!

We need to use a Student's $t$-distribution with $n-1$ degrees of freedom.


One-sample t-interval

When the assumptions and conditions are met, the confidence interval for the population mean $\mu$ is

\[\bar{\theta} \pm t^*_{n-1} \frac{s}{\sqrt n}\]

The critical value $t^*_{n-1}$ depends on the confidence level, $1-\alpha$, that you specify and on the number of degrees of freedom, $n-1$, which we get from the sample size.


Finding t-values

For example, suppose we’ve performed a one-sample t-test with 19 df and obtained a t-value of 1.639, and we want the upper tail P-value.

[Figure: t-distribution table]

From the table, we see that 1.639 falls between the critical values 1.328 and 1.729. All we can say is that the P-value lies between the P-values of these two critical values, so 0.05 < P < 0.10.
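The table bracket can be checked numerically. The sketch below integrates the Student's t density with Simpson's rule using only the standard library; the function names and the number of integration steps are assumptions, and a statistics package would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus Simpson's-rule integral over [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


p_value = t_upper_tail(1.639, 19)
print(0.05 < p_value < 0.10)
```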


Example 8

Data from a survey of 25 randomly selected customers found a mean age of 31.84 years and the standard deviation was 9.84 years.

Construct a 95% confidence interval for the mean. Interpret the interval.


Example 8 (cont.)

Construct a 95% confidence interval for the mean.

\[\bar{\theta} \pm t^*_{n-1} \times s(\bar{\theta}) = 31.84 \pm (2.064)(1.968) = 31.84 \pm 4.062 = (27.78, 35.90)\]
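The interval in Example 8 can be reproduced as follows. The critical value 2.064 is taken from the calculation above rather than computed, since the Python standard library has no t quantile function.

```python
import math

n, x_bar, s = 25, 31.84, 9.84  # summary statistics from Example 8
t_star = 2.064                 # t* for 24 df at 95%, as quoted in the text

se = s / math.sqrt(n)
me = t_star * se
lower, upper = x_bar - me, x_bar + me

print(round(se, 3), round(me, 3), round(lower, 2), round(upper, 2))
```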

Interpret the interval.


Assumptions and Conditions


Normal Population Assumption


Example 9

A survey of 25 randomly selected customers found a mean age of 31.84 years and a standard deviation of 9.84 years. A 95% confidence interval for the mean is (27.78, 35.90).
