Statistical Inference

Parameter estimation

Method of moments
Method of maximum likelihood

Method of Moments

The $k$-th population moment is defined as

\[\mu_k = E(X^k)\]

The $k$-th sample moment is defined as

\[m_k = \frac{1}{n}\sum_{i=1}^n X^k_i\]

The sample moment $m_k$ estimates $\mu_k$ from a sample $(X_1, \ldots, X_n)$.

The first sample moment is the sample mean $\bar{X}$.

Central Moments

For $k \geq 2$, the $k$-th population central moment is defined as

\[\mu'_k = E(X - \mu_1)^k\]

The $k$-th sample central moment is defined as

\[m'_k = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^k\]

The sample central moment $m'_k$ estimates $\mu'_k$ from a sample $(X_1, \ldots, X_n)$.
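
Both kinds of sample moment are plain averages, so they are easy to compute directly. The following is a minimal Python sketch; the helper names `sample_moment` and `sample_central_moment` and the data values are illustrative assumptions, not part of these notes.

```python
def sample_moment(xs, k):
    """k-th sample moment: the average of X_i**k."""
    return sum(x ** k for x in xs) / len(xs)


def sample_central_moment(xs, k):
    """k-th sample central moment: the average of (X_i - X_bar)**k."""
    mean = sample_moment(xs, 1)
    return sum((x - mean) ** k for x in xs) / len(xs)


data = [2, 4, 4, 5, 7, 8]  # made-up values for illustration
print(sample_moment(data, 1))          # first sample moment = sample mean
print(sample_central_moment(data, 2))  # second central moment (divisor n, not n - 1)
```

Note that the second sample central moment divides by $n$, so it differs from the usual sample variance, which divides by $n-1$.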

Estimation

To estimate $k$ parameters, equate the first $k$ population and sample moments,

\[\begin{cases} \mu_1 = m_1 \\ \cdots \cdots \cdots \\ \mu_k = m_k \end{cases}\]

The method of moments estimator is the solution of this system of equations.

Example 1

We estimate the parameter $\lambda$ of the Poisson($\lambda$) distribution. Its first population moment is

\[\mu_1 = E(X) = \lambda\]

There is only one unknown parameter, hence we write one equation,

\[\mu_1 = \lambda = m_1 = \bar{X}\]

Thus,

\[\hat{\lambda} = \bar{X}\]

Method of Maximum Likelihood

The maximum likelihood estimator is the parameter value that maximizes the likelihood of the observed sample.

For a discrete distribution, we maximize the joint pmf of the data $P(X_1, \ldots, X_n)$.
For a continuous distribution, we maximize the joint density $f(X_1, \ldots, X_n)$.

MoML: Discrete Distribution

\[P\{\mathbf{X} = (X_1, \ldots, X_n) \} = P(\mathbf{X}) = P(X_1, \ldots, X_n) = \prod_{i=1}^n P(X_i)\]

To maximize this likelihood, we consider the critical points by taking derivatives with respect to all unknown parameters and equating them to 0, $\frac{\partial}{\partial \theta}P(\mathbf{X}) = 0.$

Differentiating the sum

\[\ln \prod_{i=1}^n P(X_i) = \sum_{i=1}^n \ln P(X_i)\]

is easier than differentiating the product.

Moreover, the logarithm is an increasing function, so the likelihood $P(\mathbf{X})$ and the log-likelihood $\ln P(\mathbf{X})$ are maximized by exactly the same parameter values.

Example 2

The pmf of the Poisson distribution is

\[P(x) = e^{-\lambda}\frac{\lambda^x}{x!}\]

\[\ln P(x) = -\lambda + x\ln \lambda - \ln(x!)\]

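Thus, we need to maximize

\[\ln P(\mathbf{X}) = \sum_{i=1}^n (-\lambda + X_i\ln \lambda) + C = -n\lambda + \ln \lambda \sum_{i=1}^n X_i + C\]

where $C = -\sum_{i=1}^n \ln(X_i!)$ does not depend on $\lambda$. Setting the derivative to zero,

\[\frac{\partial}{\partial \lambda} \ln P(\mathbf{X}) = -n + \frac{1}{\lambda} \sum_{i=1}^n X_i = 0\]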

\[\hat{\lambda} = \frac{1}{n} \sum_{i=1}^n X_i = \bar{X}\]
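
As a numerical cross-check of this derivation, one can evaluate the Poisson log-likelihood on a grid of $\lambda$ values and confirm that the maximizer agrees with the sample mean. This is only a sketch: the counts, the grid range, and the function name `poisson_log_likelihood` are assumptions made for illustration.

```python
import math


def poisson_log_likelihood(lam, xs):
    """Sum over the sample of -lam + x*ln(lam) - ln(x!)."""
    return sum(-lam + x * math.log(lam) - math.lgamma(x + 1) for x in xs)


counts = [3, 0, 2, 4, 1, 2]        # made-up counts for illustration
x_bar = sum(counts) / len(counts)  # closed-form estimator derived above

# Coarse grid search over lambda in (0, 10] with step 0.001.
grid = [i / 1000 for i in range(1, 10001)]
lam_grid = max(grid, key=lambda lam: poisson_log_likelihood(lam, counts))

print(x_bar, lam_grid)
```

The grid search is deliberately naive; it is there to illustrate that the closed-form answer sits at the peak of the log-likelihood, not to suggest a practical optimizer.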

MoML: Continuous Distribution

In the continuous case, the method of maximum likelihood maximizes the probability of observing "almost" the same number.

The probability to observe exactly $X = x$ is 0.

The probability of observing a value close to $x$ is proportional to the density $f(x)$.

Example 3

The pdf of the Exponential distribution is

\[f(x) = \lambda e^{-\lambda x}\]

So,

\[\ln f(\mathbf{X}) = \sum_{i=1}^n \ln (\lambda e^{-\lambda X_i}) = \sum_{i=1}^n (\ln \lambda -\lambda X_i) = n \ln \lambda - \lambda \sum_{i=1}^n X_i\]

\[\frac{\partial}{\partial \lambda} \ln f(\mathbf{X}) = \frac{n}{\lambda} - \sum_{i=1}^n X_i = 0\]

\[\hat{\lambda} = \frac{n}{\sum_{i=1}^n X_i} = \frac{1}{\bar{X}}\]

This is the only critical point. The estimator $\hat{\lambda}$ is simply the reciprocal of the sample mean $\bar{X}$.
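
The same result can be checked numerically. The sketch below computes $\hat{\lambda} = n / \sum X_i$ and evaluates the log-likelihood $n \ln \lambda - \lambda \sum X_i$ slightly to either side of it; the waiting times and function names are hypothetical.

```python
import math


def exponential_mle(xs):
    """MLE of the rate: n / sum(X_i), the reciprocal of the sample mean."""
    return len(xs) / sum(xs)


def exponential_log_likelihood(lam, xs):
    """n*ln(lam) - lam*sum(X_i)."""
    return len(xs) * math.log(lam) - lam * sum(xs)


waits = [0.5, 1.2, 0.3, 2.0, 1.0]  # made-up waiting times
lam_hat = exponential_mle(waits)

# The log-likelihood should be lower on either side of lam_hat.
for lam in (0.9 * lam_hat, lam_hat, 1.1 * lam_hat):
    print(round(lam, 4), round(exponential_log_likelihood(lam, waits), 4))
```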

Estimation of Standard Errors

In Examples 1 and 2, we found that the method of moments and maximum likelihood estimators of the Poisson parameter $\lambda$ coincide: $\hat{\lambda} = \bar{X}$.

Let us now estimate the standard error of $\hat{\lambda}$.

For Poisson: $\sigma = \sqrt \lambda$, so

\[\sigma(\hat{\lambda}) = \sigma(\bar{X}) = \sigma/\sqrt n = \sqrt{\lambda/n}\]

Estimating $\lambda$ by $\bar{X}$, we obtain

\[s(\hat{\lambda}) = \sqrt{\frac{\bar{X}}{n}} = \frac{\sqrt{\sum X_i}}{n}\]
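
A short sketch of this estimate in Python, using made-up counts and a hypothetical helper `poisson_se`. It also confirms that the two forms of the formula give the same number.

```python
import math


def poisson_se(xs):
    """Estimated standard error of lambda-hat = X_bar: sqrt(X_bar / n)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(x_bar / n)


counts = [3, 0, 2, 4, 1, 2]  # made-up counts for illustration
se = poisson_se(counts)
se_alt = math.sqrt(sum(counts)) / len(counts)  # equivalent form: sqrt(sum X_i) / n
print(se, se_alt)
```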

Confidence Intervals

An interval $[a, b]$ is a $(1-\alpha)100\%$ confidence interval for the parameter $\theta$ if it contains the parameter with probability $(1 -\alpha)$,

\[P \{a \leq \theta \leq b\} = 1 -\alpha\]

The coverage probability $(1 -\alpha)$ is also called the confidence level.

Construction of CI

a) Assume there is an unbiased estimator $\hat{\theta}$ that has a Normal distribution.

b) Standardize it,

\[Z =\frac{\hat{\theta} - E(\hat{\theta})}{\sigma(\hat{\theta})} = \frac{\hat{\theta} - \theta}{\sigma(\hat{\theta})}\]

c) $Z$ falls between the Standard Normal quantiles $q_{\alpha/2}$ and $q_{1-\alpha/2}$ with probability $(1 -\alpha)$. By the symmetry of the Standard Normal distribution, these quantiles are denoted by

\[-z_{\alpha/2} = q_{\alpha/2}\]

\[z_{\alpha/2} = q_{1-\alpha/2}\]

Construction of CI (cont.)

d) Then,

\[P \left\{ -z_{\alpha/2} \leq \frac{\hat{\theta} - \theta}{\sigma(\hat{\theta})} \leq z_{\alpha/2} \right\} = 1 -\alpha\]

\[P \left\{\hat{\theta} -z_{\alpha/2} \sigma(\hat{\theta}) \leq \theta \leq \hat{\theta} + z_{\alpha/2} \sigma(\hat{\theta}) \right\} = 1 -\alpha\]

e) Substituting,

\[a = \hat{\theta} -z_{\alpha/2} \sigma(\hat{\theta})\]

\[b = \hat{\theta} +z_{\alpha/2} \sigma(\hat{\theta})\]

such that

\[P \{a \leq \theta \leq b\} = 1 -\alpha\]
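
Steps (a) to (e) can be sketched in a few lines of Python. The function name `z_interval` and the example numbers are assumptions for illustration; `statistics.NormalDist().inv_cdf` from the standard library supplies the Standard Normal quantile.

```python
from statistics import NormalDist


def z_interval(estimate, std_error, alpha=0.05):
    """(1 - alpha) confidence interval for an unbiased, Normal estimator."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2} = q_{1 - alpha/2}
    return estimate - z * std_error, estimate + z * std_error


# Made-up inputs: an estimate of 10.0 with standard error 0.5.
a, b = z_interval(10.0, 0.5, alpha=0.05)
print(a, b)
```

The sketch assumes the standard error $\sigma(\hat{\theta})$ is known; when it has to be estimated from the data, the resulting interval is only approximate.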

Margin of Error: Certainty vs. Precision

The extent of the interval on either side of the estimate is called the margin of error (ME). The general confidence interval can now be expressed in terms of the ME,

\[\text{estimate} \pm ME\]

The more confident we want to be, the larger the margin of error must be.
Every confidence interval is a balance between certainty and precision.

Example 4

In March 2010, a Gallup Poll found that 1012 out of 2976 respondents thought economic conditions were getting better, a sample proportion of

\[\hat{p} = 1012/2976 = 34.0\%\]

We’d like to use this sample proportion to say something about the proportion, $p$, of the entire population that thinks economic conditions are getting better.

Confidence Interval for Proportions

We know that our sampling distribution model is centered at the true proportion, $E(\hat{p}) = p$, with variance

\[Var(\hat{p}) = \frac{pq}{n}, \quad q = 1-p\]

So, by the CLT, we can approximate the sampling distribution with a Normal model and use $\hat{p}$ to calculate the standard error, SE.

\[SE(\hat{p}) = \sigma(\hat{p}) = \sqrt \frac{\hat{p}\hat{q}}{n} = \sqrt \frac{(0.34)(1-0.34)}{2976} = 0.009\]

Example 4 (cont.)

Because the sampling distribution is approximately Normal, we expect that about 95% of all samples of 2976 U.S. adults would have had sample proportions within two SEs of $p$, that is, within 0.018.

"It is probably true that 34.0% of all U.S. adults thought the economy was improving."
- We can be pretty certain that whatever the true proportion is, it’s probably not exactly 34.0%.
"We don't know the exact proportion of U.S. adults who thought the economy was improving but the interval from 32.2% to 35.8% probably contains the true proportion."
- This is close to correct, but what is meant by probably?

Example 4 (cont.)

An appropriate interpretation of our confidence interval would be,

"We are 95% confident that between 32.2% and 35.8% of U.S. adults thought the economy was improving."

The confidence interval calculated and interpreted here is an example of a one-proportion z-interval.

Critical Values

For any confidence level the number of SEs we must stretch out on either side of $\hat{\theta}$ is called the critical value.

Because a critical value is based on the Normal model, we denote it $z^*$.

Confidence level	$z^*$
90%	1.645
95%	1.960
99%	2.576
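
The tabulated critical values can be reproduced from the Standard Normal quantile function. The following Python sketch uses the standard library; the helper name `z_star` is an assumption for illustration.

```python
from statistics import NormalDist


def z_star(confidence):
    """Critical value z* for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}  {z_star(level):.3f}")
```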

Example 5

In the spring of 2009, workers at Sony France who were protesting layoffs took their boss hostage, a practice known as "bossnapping". What did other French adults think of this practice? Were they sympathetic? Understanding? Approving?

A poll taken in April 2009 found:

30% “approving”,
63% were “understanding” or “sympathetic” of the action,
Only 7% condemned the practice of "bossnapping"

The poll was based on a random representative sample of 1010 adults.

Example 5 (cont.)

Conditions:

Randomization Condition: The sample was selected randomly.
10% Condition: The sample is certainly less than 10% of the population.
Success/Failure Condition:

\[n\hat{p} = (1010)(0.63) \approx 636 \geq 10\]

\[n\hat{q} = (1010)(0.37) \approx 374 \geq 10\]

The conditions are satisfied so a one-proportion z-interval using the Normal model is appropriate.

Example 5 (cont.)

What can we conclude about the proportion of all French adults who sympathize?

\[n = 1010, \hat{p} = 0.63\]

\[\sigma(\hat{p}) = \sqrt \frac{(0.63)(0.37)}{1010} = 0.015\]

For a 95% CI, $z^* = 1.96$, so

\[ME = z^* \sigma(\hat{p}) = 1.96(0.015) = 0.029\]

\[0.63 \pm 0.029\]

Based on the survey we can be 95% confident that between 60.1% and 65.9% of all French adults were sympathetic.
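
The one-proportion z-interval is mechanical enough to sketch in Python. The function name `one_proportion_z_interval` is hypothetical. Because the code carries full precision instead of rounding the standard error to 0.015 first, it yields a margin of error close to 0.030 and an interval marginally wider than the hand calculation.

```python
import math
from statistics import NormalDist


def one_proportion_z_interval(p_hat, n, confidence=0.95):
    """Return (low, high, ME) for p_hat +/- z* sqrt(p_hat * q_hat / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    me = z * se
    return p_hat - me, p_hat + me, me


low, high, me = one_proportion_z_interval(0.63, 1010)
print(f"ME = {me:.4f}, interval = ({low:.3f}, {high:.3f})")
```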

Choosing the Sample Size

To get a narrower confidence interval without giving up confidence, we must choose a larger sample.

\[ME = z^* \sqrt \frac{\hat{p}\hat{q}}{n}\]

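Solving for $n$, a margin of error no larger than $ME$ requires

\[n \geq (z^*)^2 \frac{\hat{p}(1-\hat{p})}{ME^2}\]

Since $\hat{p}(1-\hat{p}) \leq 0.25$, with equality at $\hat{p} = 0.5$, a conservative choice that works for any $\hat{p}$ is

\[n \geq 0.25\left(\frac{z^*}{ME}\right)^2\]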

Example 6

Suppose a company wants to offer a new service and wants to estimate, to within 3%, the proportion of customers who are likely to purchase this new service with 95% confidence. How large a sample do they need?

We proceed by guessing the worst case scenario for $\hat{p}$. We guess $\hat{p}$ is 0.50 because this makes the SD (and therefore n) the largest.

\[n = (1.96)^2 \frac{(0.5)(0.5)}{0.03^2} = 1067.1\]

We can conclude that the company will need at least 1068 respondents to keep the margin of error as small as 3% with confidence level 95%.
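
The same calculation as a small Python sketch. The helper name `sample_size_for_proportion` and its default arguments are assumptions for illustration; the result is rounded up because a fractional respondent is not possible.

```python
import math


def sample_size_for_proportion(me, z_star=1.96, p_guess=0.5):
    """Smallest n with z* sqrt(p q / n) <= me, for a guessed proportion p."""
    n_exact = z_star ** 2 * p_guess * (1 - p_guess) / me ** 2
    return math.ceil(n_exact)


print(sample_size_for_proportion(0.03))               # worst case, p = 0.5
print(sample_size_for_proportion(0.03, p_guess=0.3))  # smaller n if p is far from 0.5
```

Using `p_guess=0.5` is the conservative choice: any other guess gives a smaller required sample, but risks a margin of error above the target if the guess is wrong.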

Confidence Intervals for Means

CI for the Population Mean

Let us construct a confidence interval for the population mean $\theta = E(X) = \mu$.

Its estimator is $\hat{\theta} = \bar{X}$.

If the sample comes from a Normal distribution, then $\bar{X}$ is also Normal and we proceed with the construction of the CI.
If the sample comes from any distribution but the sample size $n$ is large, then $\bar{X}$ has an approximately Normal distribution by the CLT, and we again proceed with the construction of the CI.

So for the population mean and known $\sigma$,

\[\sigma(\hat{\theta}) = \frac{\sigma}{\sqrt n}\]

\[CI = \bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt n}\]

CI for the Difference of Two Means

To construct a confidence interval for the difference between population means, $\theta = \mu_X - \mu_Y$:

Propose an estimator of $\theta$, $\hat{\theta} = \bar{X} - \bar{Y}$.
Check that $\hat{\theta}$ is unbiased.
Check that $\hat{\theta}$ has a Normal or approximately Normal distribution.
Find the standard error of $\hat{\theta}$ (using independence of the two samples, of sizes $n$ and $m$),

\[\sigma(\hat{\theta}) = \sqrt{Var(\bar{X} - \bar{Y})} = \sqrt{Var(\bar{X}) + Var(\bar{Y})} = \sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}\]

\[CI = \bar{X} - \bar{Y} \pm z_{\alpha/2}\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}\]
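
A minimal Python sketch of this interval, assuming independent samples and known population standard deviations. The standard error is the square root of the sum of the variances of the two sample means, $\sigma_X^2/n$ and $\sigma_Y^2/m$. The function name `two_mean_z_interval` and all input values are hypothetical.

```python
import math
from statistics import NormalDist


def two_mean_z_interval(x_bar, y_bar, sigma_x, sigma_y, n, m, alpha=0.05):
    """CI for mu_X - mu_Y with known sigmas and independent samples."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(sigma_x ** 2 / n + sigma_y ** 2 / m)  # variances add
    diff = x_bar - y_bar
    return diff - z * se, diff + z * se


# Made-up inputs for illustration.
low, high = two_mean_z_interval(x_bar=52.0, y_bar=50.0,
                                sigma_x=4.0, sigma_y=3.0, n=64, m=36)
print(low, high)
```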

The Sampling Distribution for the Mean

The true value of the population standard deviation $\sigma$ is usually unknown.
Instead of $\sigma$, we use $s$, the sample standard deviation computed from the data. So,

\[\sigma(\bar{\theta}) = \frac{s}{\sqrt n}\]

Confidence intervals for means will be

\[\bar{\theta} \pm ME\]

where the $ME$ equals a critical value, $z^*$, times the standard error $\sigma(\bar{\theta})$.

Gosset's t

William S. Gosset discovered that when he used the standard error $\frac{s}{\sqrt n}$, the shape of the sampling distribution was no longer Normal.

The new model is called Student's t. It is always bell-shaped, but the details change with the sample size.

The Student's t-models form a family of related distributions depending on a parameter known as degrees of freedom.

[Figure: t-distribution]

Example 7

A survey of 25 randomly selected customers found a mean age of 31.84 years and a standard deviation of 9.84 years.

What is the standard error of the mean?

How would the standard error change if the sample size had been 100 instead of 25? (Assume that $s$ = 9.84 years.)
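
Both questions reduce to evaluating $s/\sqrt{n}$. A short Python sketch, with `standard_error` as a hypothetical helper name:

```python
import math


def standard_error(s, n):
    """Estimated standard error of the sample mean: s / sqrt(n)."""
    return s / math.sqrt(n)


print(standard_error(9.84, 25))   # n = 25
print(standard_error(9.84, 100))  # n = 100: four times the data, half the SE
```

Quadrupling the sample size halves the standard error, because $n$ enters through its square root.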

Practical sampling distribution model for means

When the unknown standard error of $\bar{\theta}$ is replaced by its estimator

\[s(\bar{\theta}) = \frac{s}{\sqrt n}\]

the standardized sample mean,

\[t = \frac{\bar{\theta} - \theta}{s(\bar{\theta})}\]

no longer has a Normal distribution!

We need to use a Student's $t$-distribution with $n-1$ degrees of freedom.

One-sample t-interval

When the assumptions and conditions are met, the confidence interval for the population mean, $\mu$ is:

\[\bar{\theta} \pm t^*_{n-1} \frac{s}{\sqrt n}\]

The critical value $t^*_{n-1}$ depends on the confidence level, $1-\alpha$, that you specify and on the number of degrees of freedom, $n-1$, which we get from the sample size.

Finding t-values

For example, suppose we’ve performed a one-sample t-test with 19 df and obtained a t-statistic of 1.639, and we want the upper-tail P-value.

[Figure: t-distribution table]

From the table, we see that 1.639 falls between 1.328 and 1.729. All we can say is that the P-value lies between P-values of these two critical values, so 0.05 < P < 0.10.
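
A table only brackets the P-value, but the tail area can also be approximated numerically. The sketch below integrates the Student's t density with Simpson's rule using only the Python standard library. The function names `t_pdf` and `t_upper_tail` and the number of integration steps are assumptions for illustration; statistical software would normally be used instead.

```python
import math


def t_pdf(t, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + t * t / df))


def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus the Simpson's-rule integral over [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


print(t_upper_tail(1.639, 19))  # should fall between 0.05 and 0.10
```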

Example 8

A survey of 25 randomly selected customers found a mean age of 31.84 years and a standard deviation of 9.84 years.

Construct a 95% confidence interval for the mean. Interpret the interval.

Example 8 (cont.)

Construct a 95% confidence interval for the mean.

\[\bar{\theta} \pm t^*_{n-1} \times \sigma(\bar{\theta}) = 31.84 \pm (2.064)(1.968) =\]

\[31.84 \pm 4.062 = (27.78, 35.90)\]

Interpret the interval.

We're 95% confident the true mean age of all customers is between 27.78 and 35.90 years.
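
The arithmetic of this interval can be sketched in Python. The helper `t_interval` is hypothetical, and the critical value $t^*_{24} = 2.064$ is passed in as read from a t-table, since the standard library has no t quantile function.

```python
import math


def t_interval(mean, s, n, t_star):
    """mean +/- t* s / sqrt(n); t_star comes from a t-table with n - 1 df."""
    me = t_star * s / math.sqrt(n)
    return mean - me, mean + me


# t* = 2.064 is the 95% critical value for 24 df used in the example.
low, high = t_interval(31.84, 9.84, 25, t_star=2.064)
print(f"({low:.2f}, {high:.2f})")
```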

Assumptions and Conditions

Independence Assumption: There is no way to check independence of the data, but we should think about whether the assumption is reasonable.
Randomization Condition: The data arise from a random sample or suitably randomized experiment.
10% Condition: The sample size should be no more than 10% of the population. For means our samples generally are, so this condition will only be a problem if our population is small.
Nearly Normal Condition: The data come from a distribution that is unimodal and symmetric. This can be checked by making a histogram.

Normal Population Assumption

For very small samples (n < 15), the data should follow a Normal model very closely. If there are outliers or strong skewness, t-methods shouldn’t be used.
For moderate sample sizes (n between 15 and 40), t-methods will work well as long as the data are unimodal and reasonably symmetric.
For sample sizes larger than 40 or 50, t-methods are safe to use unless the data are extremely skewed. If outliers are present, analyses can be performed twice, with the outliers and without.

Example 9

A survey of 25 randomly selected customers found a mean age of 31.84 years and a standard deviation of 9.84 years. A 95% confidence interval for the mean is (27.78, 35.90).

Independence: Data were gathered from a random sample and should be independent.
10% Condition: These customers are fewer than 10% of the customer population.
Nearly Normal: The histogram is unimodal and approximately symmetric.