Computer Simulations and Monte Carlo Methods


Simulation of Random Variables

See notebook examples.


Arbitrary Discrete Distribution

  1. Divide the interval $[0, 1]$ into subintervals,

\[A_0 = [0, p_0)\]
\[A_1 = [p_0, p_0+p_1)\]
\[A_2 = [p_0+p_1, p_0+p_1+p_2)\]

     and so on, where $p_i = P \{X = x_i \}$.

  2. Obtain a Standard Uniform random variable $U$ from a random number generator.

  3. If $U$ belongs to $A_i$, let $X = x_i$.

\[P \{X = x_i \} = P \{U \in A_i \} = p_i\]
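
A minimal sketch of this algorithm in Python (NumPy assumed; the values `xs` and probabilities `ps` below are made up for illustration):

```python
import numpy as np

def sample_discrete(xs, ps, size, rng=None):
    """Sample from P{X = xs[i]} = ps[i] via the subinterval method."""
    rng = np.random.default_rng() if rng is None else rng
    edges = np.cumsum(ps)                  # right endpoints of A_0, A_1, ...
    u = rng.uniform(size=size)             # Standard Uniform variables
    idx = np.searchsorted(edges, u, side="right")  # U in A_i  =>  index i
    return np.asarray(xs)[idx]

# Hypothetical example: P{X=0}=0.5, P{X=1}=0.3, P{X=2}=0.2
sample = sample_discrete([0, 1, 2], [0.5, 0.3, 0.2], size=10_000)
print(np.bincount(sample) / len(sample))   # should be close to the ps
```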

Inverse Transform Method

Theorem: Let $X$ be a continuous random variable with cdf $F_X(x)$. Define a random variable $U = F_X(X)$. The distribution of $U$ is $Uniform(0,1)$.

Proof: First, $0 \leq F_X(x) \leq 1$ for all $x$; therefore, values of $U$ lie in $[0, 1]$. Second, for any $u \in [0, 1]$, find the cdf of $U$:

\[F_U(u) = P \{U \leq u\} = P \{F_X(X) \leq u\} = P \{X \leq F_X^{-1}(u)\} = F_X(F_X^{-1}(u)) = u\]

If $F_X$ is not invertible, let $F_X^{-1}(u)$ denote the smallest $x$ such that $F_X(x) = u$.

So, $U$ has cdf $F_U(u) = u$ and density $f_U(u) = F'_U(u) = 1$ for $0 \leq u \leq 1$. Thus, $U$ is $Uniform(0,1)$.


Arbitrary Continuous Distribution

  1. Obtain a Standard Uniform random variable from a random number generator.

  2. Compute $X = F^{-1}(U)$. In other words, solve the equation $F(X) = U$ for $X$.
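
For instance, a sketch applying the inverse transform to the $Exponential(\lambda)$ distribution, whose cdf $F(x) = 1 - e^{-\lambda x}$ inverts in closed form; $\lambda = 2$ is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng()
lam = 2.0                      # assumed rate parameter
U = rng.uniform(size=10_000)   # Standard Uniform variables

# Solve F(X) = U:  1 - exp(-lam * X) = U  =>  X = -ln(1 - U) / lam
X = -np.log(1 - U) / lam

print(X.mean())                # should be close to 1/lam = 0.5
```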


Rejection Method

Theorem: Let a pair $(X, Y)$ have a Uniform distribution over the region $A = \{(x, y) \mid 0 \leq y \leq f(x) \}$ for some density function $f$. Then $f$ is the density of $X$.

Algorithm

  1. Find numbers $a$, $b$, and $c$ such that $0 \leq f(x) \leq c$ for $a \leq x \leq b$.

  2. Obtain Standard Uniform random variables $U$ and $V$.

  3. Define $X = a + (b - a)U$ and $Y = cV$.

  4. If $Y > f(X)$, reject the point and return to step 2. If $Y \leq f(X)$, then $X$ is the desired random variable having the density $f(x)$.
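
A sketch of the rejection method for the density $f(x) = \frac{3}{2}x^2$ on $[-1, 1]$ (an arbitrary example, with $a = -1$, $b = 1$, $c = 3/2$):

```python
import numpy as np

rng = np.random.default_rng()

def f(x):                        # density f(x) = 1.5 x^2 on [-1, 1]
    return 1.5 * x**2

a, b, c = -1.0, 1.0, 1.5         # bounding box: 0 <= f(x) <= c on [a, b]

def rejection_sample(n):
    out = []
    while len(out) < n:
        U, V = rng.uniform(), rng.uniform()
        X = a + (b - a) * U      # step 3: point uniform over the box
        Y = c * V
        if Y <= f(X):            # step 4: accept if the point lies under f
            out.append(X)
    return np.array(out)

sample = rejection_sample(10_000)
print(sample.mean(), (sample**2).mean())  # ~0 and ~E[X^2] = 0.6
```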


Change-of-Units Transformation

Let $X$ be a continuous random variable with distribution function $F_X$ and probability density function $f_X$. If we change units to $Y = rX + s$ for real numbers $r > 0$ and $s$, then $F_Y(y) = F_X\left(\frac{y-s}{r}\right)$ and $f_Y(y) = \frac{1}{r}f_X\left(\frac{y-s}{r}\right)$.


Example 1

Let $X$ be a random variable with an $N(\mu, \sigma^2)$ distribution, and let $Y = rX + s$. Then

\[f_Y(y) = \frac{1}{r}f_X(\frac{y-s}{r}) = \frac{1}{r \sigma \sqrt{2\pi}} e^{-\frac{1}{2}((y-r\mu-s)/r\sigma)^2}\]

So for any $r \neq 0$ and $s$, the random variable $rX+s$ has an $N(r\mu+s, r^2\sigma^2)$ distribution (for $r < 0$, the same derivation goes through with $\frac{1}{|r|}$ in place of $\frac{1}{r}$).
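
A quick numerical check of Example 1; the values $\mu = 0$, $\sigma = 1$, $r = 2$, $s = 3$ are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng()
mu, sigma, r, s = 0.0, 1.0, 2.0, 3.0      # assumed values for illustration

X = rng.normal(mu, sigma, size=100_000)
Y = r * X + s                              # change of units

print(Y.mean(), Y.var())                   # ~ r*mu + s = 3 and r^2*sigma^2 = 4
```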


Jensen's inequality

Without actually computing the distribution of $g(X)$, we can often tell how $E[g(X)]$ relates to $g(E[X])$. For a linear function $g(x) = rx + s$, the two coincide:

\[E[g(X)] = E[rX + s] = rE[X] + s = g(E[X])\]

Jensen's inequality: Let $g$ be a convex function, and let $X$ be a random variable. Then

\[g(E[X]) \leq E[g(X)]\]

Example 2

Let $X$ be a random variable with $Var(X) > 0$. Which is larger: $E[X^2]$ or $(E[X])^2$? Since $g(x) = x^2$ is convex, Jensen's inequality gives $(E[X])^2 \leq E[X^2]$; in fact, $E[X^2] - (E[X])^2 = Var(X) > 0$, so the inequality is strict.
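
A numerical illustration of the strict inequality; $Exponential(1)$ is an arbitrary choice of a distribution with positive variance:

```python
import numpy as np

rng = np.random.default_rng()
X = rng.exponential(scale=1.0, size=100_000)   # any X with Var(X) > 0 works

print((X.mean())**2)       # g(E[X]): ~1 for Exponential(1)
print((X**2).mean())       # E[g(X)]: ~2, strictly larger
```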


Monte Carlo Methods

Monte Carlo methods vary, but tend to follow a particular pattern:

  1. Define a domain of possible inputs.

  2. Generate inputs randomly from a probability distribution over the domain.

  3. Perform a deterministic computation on the inputs.

  4. Aggregate the results.
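
A classic sketch following these four steps: estimate $\pi$ by sampling points in the unit square and checking whether they fall inside the quarter disk.

```python
import numpy as np

rng = np.random.default_rng()
N = 1_000_000

# 1. Domain: the unit square [0,1] x [0,1].
# 2. Generate inputs uniformly over the domain.
x, y = rng.uniform(size=N), rng.uniform(size=N)

# 3. Deterministic computation: is the point inside the quarter disk?
inside = x**2 + y**2 <= 1.0

# 4. Aggregate: the fraction inside estimates pi/4.
print(4 * inside.mean())      # ~ 3.1416
```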


Estimating Probabilities

For a random variable $X$, the probability $p = P\{X \in A\}$ is estimated by $\hat{p} = \hat{P} \{X \in A\} = \frac{\text{number of } X_1, \ldots,X_N \in A}{N}$

where $N$ is the size of the Monte Carlo experiment and $X_1, \ldots,X_N$ are generated random variables with the same distribution as $X$.
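
For instance, a sketch estimating $p = P\{Z > 2\}$ for a Standard Normal $Z$, with SciPy used only to check against the exact value:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng()
N = 1_000_000
Z = rng.normal(size=N)            # X_1, ..., X_N with the target distribution

p_hat = np.mean(Z > 2)            # fraction of the sample in A = (2, infinity)
print(p_hat, 1 - norm.cdf(2))     # both ~ 0.0228
```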


Accuracy of a Monte Carlo Study

Since the number of $X_1, \ldots,X_N$ that fall within set $A$ has $Binomial(N, p)$ distribution with expectation $Np$ and variance $Np(1-p)$, we obtain

\[E(\hat{p}) = \frac{1}{N}(Np) = p\]
\[Std(\hat{p}) = \frac{1}{N} \sqrt{Np(1-p)} = \sqrt{\frac{p(1-p)}{N}}\]

Accuracy of a Monte Carlo Study (cont.)

We can assess the accuracy of our results.

For large $N$, we can use the Normal approximation to the Binomial distribution of $N\hat{p}$: $\frac{N\hat{p}-Np}{\sqrt{Np(1-p)}} = \frac{\hat{p}-p}{\sqrt{p(1-p)/N}} \approx Normal(0,1)$

\[P\{|\hat{p}-p| > \varepsilon\} = P \left \{ \frac{|\hat{p}-p|}{\sqrt{p(1-p)/N}} > \frac{\varepsilon}{\sqrt{p(1-p)/N}} \right \} \approx 2\Phi \left( -\frac{\varepsilon \sqrt N}{\sqrt{p(1-p)}} \right)\]

Accuracy of a Monte Carlo Study (cont.)

In order to guarantee that

\[P\{|\hat{p}-p| > \varepsilon\} \leq \alpha\]

one needs to simulate

\[N \geq p^*(1-p^*) \left (\frac{z_{\alpha/2}}{\varepsilon} \right )^2\]

random variables, where $p^*$ is a preliminary estimator of $p$ and $z_{\alpha/2} = \Phi^{-1}(1-\alpha/2)$ is the upper $\alpha/2$ quantile of the Standard Normal distribution, or

\[N \geq 0.25 \left (\frac{z_{\alpha/2}}{\varepsilon} \right )^2\]

random variables, if no such estimator is available.
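
A sketch computing the required $N$ from these formulas; the values $\varepsilon = 0.01$ and $\alpha = 0.05$ are arbitrary, and `norm.ppf` supplies the quantile $z_{\alpha/2}$:

```python
import numpy as np
from scipy.stats import norm

def required_N(eps, alpha, p_star=None):
    """Smallest N guaranteeing P{|p_hat - p| > eps} <= alpha."""
    z = norm.ppf(1 - alpha / 2)          # z_{alpha/2}, here ~1.96
    q = p_star * (1 - p_star) if p_star is not None else 0.25
    return int(np.ceil(q * (z / eps) ** 2))

print(required_N(eps=0.01, alpha=0.05))               # no estimator: 9604
print(required_N(eps=0.01, alpha=0.05, p_star=0.1))   # with p* = 0.1: 3458
```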


Estimating Means and Standard Deviations

We generate a Monte Carlo sequence of random variables $X_1, \ldots, X_N$ and compute the necessary long-run averages.

The mean $E(X)$ is estimated by the average $\bar{X} = \frac{1}{N}(X_1+ \ldots + X_N)$

If $X_1, \ldots, X_N$ are independent, each with mean $\mu$ and standard deviation $\sigma$, then

\[E(\bar{X}) = \frac{1}{N}(EX_1+ \ldots + EX_N) = \mu\]
\[Var(\bar{X}) = \frac{1}{N^2}(VarX_1+ \ldots + VarX_N) = \frac{\sigma^2}{N}\]

Estimating Means and Standard Deviations (cont.)

The standard deviation $Std(X)$ is estimated by the sample standard deviation $s$, where

\[s^2 = \frac{1}{N-1}\sum^N_{i=1}(X_i - \bar{X})^2\]
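
A sketch computing both estimators for a $Gamma(10, 3)$ variable (chosen to match Example 3; shape $10$ and rate $3$ is an assumed parametrization, so NumPy's scale is $1/3$):

```python
import numpy as np

rng = np.random.default_rng()
N = 100_000
X = rng.gamma(shape=10, scale=1/3, size=N)   # Gamma(10, 3): rate 3 => scale 1/3

x_bar = X.mean()                  # estimates E(X) = 10/3
s = X.std(ddof=1)                 # sample sd with the 1/(N-1) convention

print(x_bar, s)                   # ~3.333 and ~sqrt(10)/3 = 1.054
```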

Example 3

A supercomputer is shared by 250 independent subscribers. Each day, each subscriber uses the facility with probability 0.3. The number of tasks sent by each active user has a Geometric distribution with parameter 0.15, and each task takes a Gamma(10, 3) distributed amount of computer time (in minutes). Tasks are processed consecutively.
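
The statement above ends without its question; a natural Monte Carlo exercise (an assumption, not stated in the source) is to estimate the probability that the total requested computer time exceeds 24 hours. A sketch, assuming $Gamma(10, 3)$ means shape $10$ and rate $3$, and that $Geometric(0.15)$ counts trials starting at $1$ (NumPy's convention):

```python
import numpy as np

rng = np.random.default_rng()
N = 10_000                         # Monte Carlo runs (one run = one day)
total_minutes = np.empty(N)

for k in range(N):
    active = rng.binomial(250, 0.3)                   # subscribers using the facility
    tasks = rng.geometric(0.15, size=active).sum()    # tasks from each active user
    # each task's time is Gamma(10, 3) minutes: shape 10, rate 3 => scale 1/3
    total_minutes[k] = rng.gamma(10, 1/3, size=tasks).sum()

# assumed target: probability the day's workload exceeds 24 hours
print(np.mean(total_minutes > 24 * 60))
```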