Computer Simulations and Monte Carlo Methods
Simulation of Random Variables
See notebook examples.
Arbitrary Discrete Distribution
Divide the interval $[0, 1]$ into subintervals $A_i$ of length $p_i = P\{X = x_i\}$.
Obtain a Standard Uniform random variable from a random number generator.
If $U$ belongs to $A_i$, let $X = x_i$.
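A minimal Python sketch of this algorithm (the values, probabilities, and function name are illustrative):

```python
import random

def sample_discrete(values, probs):
    """Sample from an arbitrary discrete distribution by partitioning
    [0, 1] into subintervals A_i of length p_i = P{X = x_i}."""
    u = random.random()          # Standard Uniform random variable
    cumulative = 0.0
    for x, p in zip(values, probs):
        cumulative += p          # right endpoint of subinterval A_i
        if u < cumulative:       # U falls in A_i, so X = x_i
            return x
    return values[-1]            # guard against floating-point round-off

# Example: P{X=0} = 0.5, P{X=1} = 0.3, P{X=2} = 0.2
x = sample_discrete([0, 1, 2], [0.5, 0.3, 0.2])
```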
Inverse Transform Method
Theorem: Let $X$ be a continuous random variable with cdf $F_X(x)$. Define a random variable $U = F_X(X)$. The distribution of $U$ is $Uniform(0,1)$.
Proof: First, $0 \leq F_X(x) \leq 1$ for all $x$; therefore, the values of $U$ lie in $[0,1]$. Second, for any $u \in [0,1]$, find the cdf of $U$ as follows: $F_U(u) = P \{U \leq u\} = P \{F_X(X) \leq u\} = P \{X \leq F_X^{-1}(u)\} = F_X(F_X^{-1}(u)) = u$.
If $F_X$ is not invertible, let $F_X^{-1}(u)$ denote the smallest $x$ such that $F_X(x) = u$.
So, $U$ has cdf $F_U(u) = u$ and density $f_U(u) = F'_U(u) = 1$ for $0 \leq u \leq 1$. Thus, $U$ is $Uniform(0,1)$.
Arbitrary Continuous Distribution
Obtain a Standard Uniform random variable from a random number generator.
Compute $X = F^{-1}(U)$. In other words, solve the equation $F(X) = U$ for $X$.
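For instance, for the $Exponential(\lambda)$ distribution, $F(x) = 1 - e^{-\lambda x}$, so solving $F(X) = U$ gives $X = -\frac{1}{\lambda}\ln(1-U)$. A minimal Python sketch (the function name is illustrative):

```python
import math
import random

def exponential_inverse_transform(lam):
    """Generate Exponential(lam) via the inverse transform method:
    solve F(X) = U, i.e., 1 - exp(-lam * X) = U."""
    u = random.random()              # Standard Uniform random variable
    return -math.log(1.0 - u) / lam  # X = F^{-1}(U)

x = exponential_inverse_transform(2.0)
```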
Rejection Method
Theorem: Let a pair $(X, Y)$ have Uniform distribution over the region $A = \{(x, y) \mid 0 \leq y \leq f(x)\}$ for some density function $f$. Then $f$ is the density of $X$.
Algorithm
1. Find numbers $a$, $b$, and $c$ such that $0 \leq f(x) \leq c$ for $a \leq x \leq b$.
2. Obtain Standard Uniform random variables $U$ and $V$.
3. Define $X = a + (b - a)U$ and $Y = cV$.
4. If $Y > f(X)$, reject the point and return to step 2. If $Y \leq f(X)$, then $X$ is the desired random variable having the density $f(x)$.
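A Python sketch of this algorithm, using the $Beta(2,2)$ density $f(x) = 6x(1-x)$ on $[0,1]$ with bound $c = 1.5$ as an illustrative example:

```python
import random

def rejection_sample(f, a, b, c):
    """Rejection method: sample (X, Y) uniformly over [a, b] x [0, c]
    and accept X whenever Y <= f(X)."""
    while True:
        u, v = random.random(), random.random()  # two Standard Uniforms
        x = a + (b - a) * u                      # X ~ Uniform(a, b)
        y = c * v                                # Y ~ Uniform(0, c)
        if y <= f(x):                            # accept; otherwise retry
            return x

# Example: Beta(2,2) density f(x) = 6x(1-x), bounded by c = 1.5 on [0, 1]
x = rejection_sample(lambda x: 6 * x * (1 - x), 0.0, 1.0, 1.5)
```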
Change-of-units Transformation
Let $X$ be a continuous random variable with distribution function $F_X$ and probability density function $f_X$. If we change units to $Y = rX +s$ for real numbers $r > 0$ and $s$, then $F_Y(y) = F_X(\frac{y-s}{r})$ and $f_Y(y) = \frac{1}{r}f_X(\frac{y-s}{r})$
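To see why: for $r > 0$, $F_Y(y) = P\{rX + s \leq y\} = P\{X \leq \frac{y-s}{r}\} = F_X\left(\frac{y-s}{r}\right)$, and differentiating with respect to $y$ gives $f_Y(y) = \frac{1}{r}f_X\left(\frac{y-s}{r}\right)$.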
Example 1
Let $X$ be a random variable with an $N(\mu, \sigma^2)$ distribution, and let $Y = rX + s$. Then, for $r > 0$, $f_Y(y) = \frac{1}{r}f_X\left(\frac{y-s}{r}\right) = \frac{1}{r\sigma\sqrt{2\pi}}\, e^{-\left(\frac{y-s}{r}-\mu\right)^2/2\sigma^2} = \frac{1}{r\sigma\sqrt{2\pi}}\, e^{-(y-(r\mu+s))^2/(2r^2\sigma^2)}$, which is the density of the $N(r\mu+s, r^2\sigma^2)$ distribution; for $r < 0$, the same computation goes through with $|r|$ in place of $r$.
So for any $r \neq 0$ and $s$, the random variable $rX+s$ has an $N(r\mu+s, r^2\sigma^2)$ distribution.
Jensen's inequality
Without actually computing the distribution of $g(X)$ we can often tell how $E[g(X)]$ relates to $g(E[X])$.
For $g(x) = rx + s$ we know that $E[g(X)] = g(E[X])$. This equality, however, rarely holds for nonlinear $g$.
Jensen's inequality: Let $g$ be a convex function, and let $X$ be a random variable. Then $E[g(X)] \geq g(E[X])$, with strict inequality when $g$ is strictly convex and $X$ is not constant.
Example 2
Let $X$ be a random variable with $Var(X) > 0$. Which is true:
- $E[e^{-X}] < e^{-E[X]}$, or
- $E[e^{-X}] > e^{-E[X]}$?
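Since $g(x) = e^{-x}$ is strictly convex and $Var(X) > 0$, Jensen's inequality gives the second option: $E[e^{-X}] > e^{-E[X]}$. A quick Monte Carlo check (a sketch; the choice $X \sim N(0,1)$ is illustrative):

```python
import math
import random

# Monte Carlo check of Jensen's inequality for g(x) = e^{-x},
# using X ~ Normal(0, 1) as an illustrative choice.
N = 100_000
xs = [random.gauss(0.0, 1.0) for _ in range(N)]

mean_of_g = sum(math.exp(-x) for x in xs) / N   # estimates E[e^{-X}]
g_of_mean = math.exp(-sum(xs) / N)              # estimates e^{-E[X]}

# For N(0, 1): E[e^{-X}] = e^{1/2} ~ 1.65, while e^{-E[X]} = 1.
print(mean_of_g, g_of_mean)  # mean_of_g should exceed g_of_mean
```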
Monte Carlo Methods
Monte Carlo methods vary, but tend to follow a particular pattern:
Define a domain of possible inputs.
Generate inputs randomly from a probability distribution over the domain.
Perform a deterministic computation on the inputs.
Aggregate the results.
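For illustration (a standard sketch, not taken from the notes), here is the four-step pattern applied to estimating $\pi$: the domain is the unit square, inputs are Uniform points in it, the deterministic computation tests whether a point falls inside the quarter disk, and the aggregate is the proportion of hits.

```python
import random

# Monte Carlo estimate of pi, following the four-step pattern.
N = 1_000_000
hits = 0
for _ in range(N):
    x, y = random.random(), random.random()   # random input from the domain
    if x * x + y * y <= 1.0:                  # deterministic computation
        hits += 1
pi_hat = 4.0 * hits / N                       # aggregate the results
print(pi_hat)
```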
Estimating Probabilities
For a random variable $X$, the probability $p = P\{X \in A\}$ is estimated by $\hat{p} = \hat{P} \{X \in A\} = \frac{\text{number of } X_1, \ldots,X_N \in A}{N}$
where $N$ is the size of a Monte Carlo experiment, $X_1, \ldots,X_N$ are generated random variables with the same distribution as $X$.
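As a sketch (the distribution and the set $A$ are illustrative choices), estimating $p = P\{X > 1\}$ for $X \sim Exponential(1)$, whose exact value is $e^{-1} \approx 0.368$:

```python
import math
import random

# Estimate p = P{X > 1} for X ~ Exponential(1); the exact value is e^{-1}.
N = 100_000
count = sum(1 for _ in range(N)
            if -math.log(1.0 - random.random()) > 1.0)  # X by inverse transform
p_hat = count / N
print(p_hat, math.exp(-1.0))
```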
Accuracy of a Monte Carlo Study
Since the number of $X_1, \ldots,X_N$ that fall within set $A$ has $Binomial(N, p)$ distribution with expectation $Np$ and variance $Np(1-p)$, we obtain $E(\hat{p}) = \frac{Np}{N} = p$ and $Std(\hat{p}) = \frac{\sqrt{Np(1-p)}}{N} = \sqrt{\frac{p(1-p)}{N}}$.
Because $E(\hat{p}) = p$, the Monte Carlo estimator of $p$ is unbiased, so that over a long run it will, on average, return the desired quantity $p$. Also, $Std(\hat{p}) = \sqrt{p(1-p)/N}$ decreases with $N$ at the rate of $1/\sqrt N$.
Accuracy of a Monte Carlo Study (cont.)
We can assess the accuracy of our results.
For large $N$, we can use the Normal approximation of the Binomial distribution of $N\hat{p}$: $\frac{N\hat{p}-Np}{\sqrt{Np(1-p)}} = \frac{\hat{p}-p}{\sqrt{p(1-p)/N}} \approx Normal(0,1)$
Accuracy of a Monte Carlo Study (cont.)
In order to guarantee that $P\{|\hat{p} - p| > \varepsilon\} \leq \alpha$, one needs to simulate $N \geq p^*(1-p^*)\left(\frac{z_{\alpha/2}}{\varepsilon}\right)^2$ random variables, where $p^*$ is a preliminary estimator of $p$, or $N \geq 0.25\left(\frac{z_{\alpha/2}}{\varepsilon}\right)^2$ random variables, if no such estimator is available (using the bound $p(1-p) \leq 1/4$ for all $p$).
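For instance, with margin of error $\varepsilon = 0.01$ and confidence level $1-\alpha = 0.99$ (as in Example 3 below), $z_{\alpha/2} = z_{0.005} \approx 2.576$, so with no preliminary estimator one needs $N \geq 0.25\left(\frac{2.576}{0.01}\right)^2 \approx 16{,}590$ simulation runs.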
Estimating Means and Standard Deviations
We generate a Monte Carlo sequence of random variables $X_1, \ldots, X_N$ and compute the necessary long-run averages.
The mean $E(X)$ is estimated by the average $\bar{X} = \frac{1}{N}(X_1+ \ldots + X_N)$
If the distribution of $X_1, \ldots, X_N$ has mean $\mu$ and standard deviation $\sigma$, then $E(\bar{X}) = \mu$ and $Std(\bar{X}) = \frac{\sigma}{\sqrt N}$.
Estimating Means and Standard Deviations (cont.)
The estimator $\bar{X}$ is unbiased for estimating $\mu$.
The unbiased variance estimator, denoted by $s^2$, is $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2$.
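A minimal sketch of both estimators (the Gamma parameters are an illustrative choice, reading Gamma(10, 3) as shape $10$ and rate $3$, as in Example 3 below):

```python
import math
import random

# Estimate E(X) and Std(X) for X ~ Gamma(shape 10, rate 3) by Monte Carlo.
# Python's gammavariate takes (shape, scale), so scale = 1/rate = 1/3.
N = 100_000
xs = [random.gammavariate(10, 1 / 3) for _ in range(N)]

x_bar = sum(xs) / N                               # estimates mu
s2 = sum((x - x_bar) ** 2 for x in xs) / (N - 1)  # unbiased variance estimator
print(x_bar, math.sqrt(s2))  # exact values: mu = 10/3, sigma = sqrt(10)/3
```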
Example 3
A supercomputer is shared by 250 independent subscribers. Each day, each subscriber uses the facility with probability 0.3. The number of tasks sent by each active user has Geometric distribution with parameter 0.15, and each task takes a Gamma(10, 3) distributed computer time (in minutes). Tasks are processed consecutively.
What is the probability that all the tasks will be processed, that is, the total requested computer time is less than 24 hours?
Estimate this probability, attaining the margin of error $\pm 0.01$ with probability 0.99.
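A possible simulation sketch for this example, assuming Gamma(10, 3) means shape $10$ and rate $3$ (so scale $1/3$) and using the sample size $N \approx 16{,}590$ derived above; the function name is illustrative:

```python
import random

def total_time_minutes():
    """Simulate one day: total computer time requested by all subscribers."""
    total = 0.0
    for _ in range(250):
        if random.random() < 0.3:          # subscriber is active today
            # Number of tasks ~ Geometric(0.15): trials until first success
            tasks = 1
            while random.random() > 0.15:
                tasks += 1
            # Each task ~ Gamma(10, 3): shape 10, rate 3 => scale 1/3
            for _ in range(tasks):
                total += random.gammavariate(10, 1 / 3)
    return total

# Sample size from the bound N >= 0.25 * (z_{0.005} / 0.01)^2
N = 16_590
count = sum(1 for _ in range(N) if total_time_minutes() < 24 * 60)
p_hat = count / N   # estimates P{all tasks processed within 24 hours}
print(p_hat)
```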