Lecture 5



Outline


Normal Distribution


The 68-95-99.7 Rule

A z-score reports how many standard deviations a value lies from the mean: $z = \frac{y - \mu}{\sigma}$.


In bell-shaped distributions, about 68% of the values fall within one standard deviation of the mean, about 95% of the values fall within two standard deviations of the mean, and about 99.7% of the values fall within three standard deviations of the mean.
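The rule can be checked numerically against the Normal model. This is a minimal sketch assuming Python 3.8+ for the standard-library `statistics.NormalDist` class; `z_score` is a hypothetical helper name, not something defined in the lecture.

```python
from statistics import NormalDist

def z_score(y, mu, sigma):
    """Number of standard deviations the value y lies from the mean mu."""
    return (y - mu) / sigma

std_normal = NormalDist(mu=0, sigma=1)

# Area under the standard Normal curve within k SDs of the mean
for k in (1, 2, 3):
    area = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} SD: {area:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The exact areas round to the 68%, 95% and 99.7% quoted in the rule.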


The Normal Distribution


Example 1

Each Scholastic Aptitude Test (SAT) has a distribution that is roughly unimodal and symmetric and is designed to have an overall mean of 500 and a standard deviation of 100.


Example 1 (cont.)

Because we’re told that the distribution is unimodal and symmetric, with a mean of 500 and an SD of 100, we’ll use a N(500,100) model.
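Applying the 68-95-99.7 Rule to the N(500,100) model shows which score ranges the model implies. A small sketch, assuming Python 3.8+ and its `statistics.NormalDist` class:

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)  # the N(500, 100) model for SAT scores

# Intervals 1, 2 and 3 SDs around the mean, with the model's area for each
for k in (1, 2, 3):
    low, high = sat.mean - k * sat.stdev, sat.mean + k * sat.stdev
    share = sat.cdf(high) - sat.cdf(low)
    print(f"{low:.0f} to {high:.0f}: {share:.1%}")
# 400 to 600: 68.3%
# 300 to 700: 95.4%
# 200 to 800: 99.7%
```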



Example 2

Assuming the SAT scores are nearly normal with N(500,100), what proportion of SAT scores falls between 450 and 600?
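One way to get this proportion is to standardize both cutoffs and take the difference of the two left-tail areas. A sketch assuming Python 3.8+ (`statistics.NormalDist`); calling `cdf` on the N(500,100) model gives the same areas as looking up the z-scores in a table.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

# Standardize both cutoffs: z = (y - mu) / sigma
z_low = (450 - sat.mean) / sat.stdev   # -0.5
z_high = (600 - sat.mean) / sat.stdev  # 1.0

# Proportion between the cutoffs = area left of 600 minus area left of 450
proportion = sat.cdf(600) - sat.cdf(450)
print(round(proportion, 4))  # 0.5328
```

So the model puts about 53% of SAT scores between 450 and 600.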



Example 3

A college says it admits only people with SAT scores among the top 10%. How high an SAT score does it take to be eligible?
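This is the reverse of Example 2: start from an area and solve for the score. The top 10% begins at the 90th percentile, so the sketch below (assuming Python 3.8+ and `statistics.NormalDist.inv_cdf`) inverts the model's CDF at 0.90.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

# The top 10% starts at the 90th percentile of the model
cutoff = sat.inv_cdf(0.90)
z = NormalDist().inv_cdf(0.90)  # the same cutoff in standard units
print(round(z, 2), round(cutoff))  # 1.28 628
```

With z ≈ 1.28, the cutoff is 500 + 1.28 × 100 ≈ 628, so a score of roughly 630 or above is needed.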



Example 4

A tire manufacturer believes that the tread life of its snow tires can be described by a Normal model with a mean of 32,000 miles and a standard deviation of 2500 miles.


Normal Probability Plots

The Normal probability plot is a specialized graph that can help decide whether the Normal model is appropriate.



Normal Probability Plots (cont.)

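If the Normal probability plot bends away from a straight line, the Normal model is not appropriate. A consistent curve reveals skewness, which a histogram of the same data would also show.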
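A Normal probability plot pairs each sorted data value with the z-score it would have if the data were exactly Normal; Normal data then fall on a straight line. The sketch below assumes Python 3.8+, uses the plotting positions (i − 0.5)/n (one common convention, chosen here as an assumption), and summarizes straightness with a correlation, which is an added illustration rather than part of the lecture. `normal_scores`, `straightness` and the sample data are hypothetical.

```python
from statistics import NormalDist, fmean

def normal_scores(data):
    """Pair each sorted data value with its expected Normal score (z).

    Uses plotting positions (i - 0.5) / n, one common convention.
    """
    n = len(data)
    std = NormalDist()
    zs = [std.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(zs, sorted(data)))

def straightness(pairs):
    """Correlation between Normal scores and data; near 1 means a nearly straight plot."""
    zs = [z for z, _ in pairs]
    ys = [y for _, y in pairs]
    mz, my = fmean(zs), fmean(ys)
    sxy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    sxx = sum((z - mz) ** 2 for z in zs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical data: perfectly Normal-shaped values vs. strongly right-skewed values
symmetric = [NormalDist(50, 10).inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [2 ** k for k in range(10)]
print(round(straightness(normal_scores(symmetric)), 3))  # 1.0
print(round(straightness(normal_scores(skewed)), 3))
```

Plotting the `(z, value)` pairs for the skewed data would show the curve described above; the correlation drops well below 1.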



Sum of Normal Models


Example 5

A company that manufactures small stereo systems uses a two-step packaging process.


Example 5 (cont.)

Given:

Thus:


Example 5 (cont.)

What is the probability that packing an order of two systems takes more than 20 minutes?

\[z = \frac{20-15}{1.8} \approx 2.78\]
\[P(X>20) = P(Z>2.78) = 1 - P(Z \leq 2.78) = 1 - 0.9973 = 0.0027\]

Using past history to build a model, we find slightly less than a 0.3% chance that it will take more than 20 minutes to pack an order of two stereo systems.
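For independent random variables, means add and variances (not standard deviations) add. Python's `statistics.NormalDist` (3.8+) implements this rule with `+`, under the same independence assumption. The 3-and-4 component SDs below are purely illustrative; the step times behind Example 5 are not reproduced here, so the second part starts from the N(15, 1.8) total-time model implied by the z calculation above.

```python
from statistics import NormalDist

# Variances add for independent variables, so SDs combine like the sides
# of a right triangle: sqrt(3**2 + 4**2) = 5 (illustrative values)
combined = NormalDist(0, 3) + NormalDist(0, 4)
print(combined.stdev)  # 5.0

# Example 5: total packing time modeled as N(15, 1.8) minutes
total = NormalDist(mu=15, sigma=1.8)
p_over_20 = 1 - total.cdf(20)
print(round(p_over_20, 4))
```

Using the unrounded z (5/1.8 ≈ 2.778) gives a tail area of about 0.0027, in line with the table look-up.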


The Normal Approximation for the Binomial

A discrete Binomial model is approximately Normal if we expect at least 10 successes and 10 failures:

\[np \geq 10, nq \geq 10\]

Example 6

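Suppose the probability of finding a prize in a cereal box is 20%. If we open 50 boxes, the number of prizes found has a Binomial distribution with mean $np = 50(0.2) = 10$ and standard deviation $\sqrt{npq} = \sqrt{50(0.2)(0.8)} \approx 2.83$. With a continuity correction, the probability of finding exactly 10 prizes is approximately: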

\[P(9.5 \leq X \leq 10.5) \approx P\left(\frac{9.5-10}{2.83} \leq z \leq \frac{10.5-10}{2.83}\right) \approx 0.1403\]
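The steps of Example 6 (check the success/failure condition, build N(np, √(npq)), apply the continuity correction) can be sketched in Python 3.8+ with `statistics.NormalDist` and `math.comb`. The comparison with the exact Binomial probability is an added illustration, not part of the slide.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 50, 0.20
q = 1 - p

# Success/failure condition: expect at least 10 successes and 10 failures
assert n * p >= 10 and n * q >= 10

model = NormalDist(mu=n * p, sigma=sqrt(n * p * q))  # about N(10, 2.83)

# Continuity correction: "exactly 10" becomes the interval 9.5 to 10.5
approx = model.cdf(10.5) - model.cdf(9.5)

# Exact Binomial probability of 10 successes, for comparison
exact = comb(n, 10) * p**10 * q**40
print(round(approx, 4), round(exact, 4))
```

The Normal approximation (about 0.140) is close to the exact Binomial value (also about 0.140).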

Sampling Distribution


The Distribution of Sample Proportions


Sampling Distribution for Proportions

\[SD(\hat{p}) = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{pq}{n}}\]

Sampling Distribution Model

The particular Normal model, $N\left(p, \sqrt{\frac{pq}{n}}\right)$, is a sampling distribution model for the sample proportion.
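A simulation shows where this model comes from: draw many samples, record each sample proportion, and compare the spread of those proportions with √(pq/n). A sketch assuming Python 3.8+; the values p = 0.92 and n = 160 are borrowed from the seed example, and `simulate_phat`, the seed and the 5,000 repetitions are arbitrary choices.

```python
import random
from statistics import fmean, pstdev

def simulate_phat(p, n, reps, seed=1):
    """Draw `reps` samples of size n with success probability p; return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p, n = 0.92, 160
phats = simulate_phat(p, n, reps=5000)
formula_sd = (p * (1 - p) / n) ** 0.5  # sqrt(pq/n), about 0.0214

# The simulated proportions center near p with spread close to sqrt(pq/n)
print(round(fmean(phats), 3), round(pstdev(phats), 4), round(formula_sd, 4))
```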


Sampling Distribution Model (cont.)

Assumptions

Conditions


Example 7

Information on a packet of seeds claims that the germination rate is 92%. Are the conditions met to answer the question, "What is the probability that more than 95% of the 160 seeds in the packet will germinate?"


Example 7 (cont.)

Independence: It is reasonable to assume the seeds will germinate independently from each other.

Randomization: The sample of seeds can be considered a random sample from all seeds from this producer.

10% Condition: The packet is less than 10% of all seeds manufactured.

Success/Failure Condition: $np = 0.92 \times 160 = 147.2 \geq 10$ and $nq = 0.08 \times 160 = 12.8 \geq 10$.


Example 7 (cont.)

Information on a packet of seeds claims that the germination rate is 92%. What is the probability that more than 95% of the 160 seeds in the packet will germinate?

\[N\left(0.92, \sqrt{\frac{0.92 \times 0.08}{160}}\right) = N(0.92, 0.0214)\]
\[z = \frac{\hat{p} - p}{SD(\hat{p})} = \frac{0.95-0.92}{0.0214} \approx 1.40\]
\[P(\hat{p} > 0.95) = P(z > 1.40) \approx 1 - 0.9192 = 0.0808\]
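The whole calculation for the seed packet fits in a few lines. A sketch assuming Python 3.8+ (`statistics.NormalDist`); keeping SD(p̂) unrounded gives z ≈ 1.399 and a probability of about 0.081.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.92, 160
q = 1 - p

# Success/failure condition: about 147.2 expected successes, 12.8 failures
print(round(n * p, 1), round(n * q, 1))

sd = sqrt(p * q / n)          # SD(p-hat), about 0.0214
model = NormalDist(mu=p, sigma=sd)
z = (0.95 - p) / sd           # about 1.40
p_more_than_95 = 1 - model.cdf(0.95)
print(round(z, 2), round(p_more_than_95, 3))
```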

Central Limit Theorem


Simulating the Sampling Distribution of a Mean

The results of 10,000 simulated tosses of fair dice.


The Central Limit Theorem

Central Limit Theorem (CLT): The mean of a random sample has a sampling distribution whose shape can be approximated by a Normal model. The larger the sample, the better the approximation will be.


The Central Limit Theorem (cont.)

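The Central Limit Theorem says nothing about the distribution of the data in the sample; it describes only the sampling distribution of the sample mean. A histogram of a large sample will look like the population, not necessarily Normal.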

The Normal model for the sampling distribution of the mean has a standard deviation

\[SD(\bar{y}) = \frac{\sigma}{\sqrt{n}}\]

where $\sigma$ is the standard deviation of the population.
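A dice simulation in the spirit of the one above shows the standard deviation of the mean shrinking like σ/√n. This is a sketch assuming Python 3.8+; `simulate_die_means`, the seed and the chosen sample sizes are hypothetical, while the 10,000 repetitions match the slide.

```python
import random
from statistics import fmean, pstdev

def simulate_die_means(n_dice, reps=10_000, seed=2):
    """Average of n_dice fair-die tosses, repeated `reps` times."""
    rng = random.Random(seed)
    return [fmean(rng.randint(1, 6) for _ in range(n_dice)) for _ in range(reps)]

sigma = pstdev([1, 2, 3, 4, 5, 6])  # population SD of one fair die, about 1.708

# Simulated SD of the mean vs. the CLT formula sigma / sqrt(n)
for n in (1, 2, 5, 20):
    means = simulate_die_means(n)
    print(n, round(pstdev(means), 3), round(sigma / n ** 0.5, 3))
```

The two columns agree closely, and a histogram of `means` for larger `n` looks increasingly Normal even though a single die is uniform.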


Assumptions and Conditions for the Sampling Distribution of the Mean


Example 8

According to recent studies, cholesterol levels in healthy U.S. adults average about 215 mg/dL with a standard deviation of about 30 mg/dL and are roughly symmetric and unimodal. If we measure the cholesterol levels of a random sample of 42 healthy U.S. adults, are the conditions met to use the Normal model for the sample mean?


Example 8 (cont.)

Randomization: The sample is random.

10% Condition: These 42 healthy U.S. adults are less than 10% of the population of healthy U.S. adults.

Large Enough Sample Condition: Cholesterol levels are roughly symmetric and unimodal so a sample size of 42 is sufficient.




Example 8 (cont.)

\[\mu(\bar{y}) = \mu = 215\]
\[SD(\bar{y}) = \frac{\sigma}{\sqrt{n}} = \frac{30}{\sqrt{42}} = 4.629\]
\[z = \frac{\bar{y} - \mu}{SD(\bar{y})} = \frac{220-215}{4.629} = 1.08\]
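The arithmetic for Example 8 can be reproduced directly. A minimal sketch in Python using only `math.sqrt`:

```python
from math import sqrt

mu, sigma, n = 215, 30, 42  # population mean and SD (mg/dL), sample size

sd_ybar = sigma / sqrt(n)     # SD of the sample mean
z = (220 - mu) / sd_ybar      # how many SD(y-bar) a mean of 220 lies above 215
print(round(sd_ybar, 3), round(z, 2))  # 4.629 1.08
```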

Law of Diminishing Returns

The standard deviation of the sampling distribution declines only with the square root of the sample size. The square root limits how much we can make a sample tell about the population.
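Because SD(ȳ) = σ/√n, halving the standard deviation takes four times as many observations. A short sketch reusing σ = 30 mg/dL from Example 8; `sd_of_mean` is a hypothetical helper name.

```python
from math import sqrt

def sd_of_mean(sigma, n):
    """Standard deviation of the sampling distribution of the mean."""
    return sigma / sqrt(n)

sigma = 30  # cholesterol SD from Example 8

# Each fourfold increase in n only halves SD(y-bar)
for n in (42, 168, 672):
    print(n, round(sd_of_mean(sigma, n), 3))
```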