#
G14CST编程辅导、辅导Python，Java程序

G14CST/MATH4007 —Computational

Statistics

Chapter II - Simulation

2 Simulation

By simulation, we mean the use of a model to produce artificial ‘results’

rather than obtaining them from a physical experiment. When should we

use stochastic models as opposed to deterministic ones? A lack of knowledge

about a deterministic system is often accounted for by adding some random-

ness to the models. Whenever we have a model, we can simulate from the

model to learn about it. Note however, that we learn about the model, not

the physical system!

In order to simulate from a stochastic model, we need a source of random-

ness. Most simulation methods require a sequence of uniformly distributed

random variables as the starting point from which more complicated simula-

tions can be derived. How could we generate U [0, 1] random variables? The

approach is to generate a sequence of pseudo-random numbers, which are

generated deterministically, but appear indistinguishable from random when

statistical tests are applied.

Definition: A sequence of pseudo-random numbers {ui} is a determin-

istic sequence of numbers in [0, 1] having the same statistical properties as a

similar sequence of random numbers.

Note that the sequence {ui} is reproducible provided the first term in the

sequence is known (which is known as the seed). A good sequence would be

“unpredictable to the uninitiated”. That is, if we were only shown the se-

quence produced, and didn’t know the deterministic formula used to produce

it, we would not be able to predict its future behaviour, and the sequence

would appear to be completely random.

1

2.1 Congruential generators

The following sequence

0.155109197, 0.701655716, 0.813951522, 0.568807691, 0.087282449,

was generated by starting with the number N0 = 78953536 then calculating

N1 = (aN0) mod M , N2 = (aN1) mod M , N3 = (aN2) mod M , and then

setting U1 = N1/M , U2 = N2/M , U3 = N3/M , etc.

The two constants are

a = 216 + 3 = 65539, M = 231 = 2147483648.

N1 = remainder when 65539 × 78953536 is divided by 2147483648. Then

U1 = N1/2147483648.

The general form of a congruential generator is

Ni = (aNi?1 + c) mod M,

Ui = Ni/M, where integers a, c ∈ [0,M ? 1]

If c = 0, it is called amultiplicative congruential generator (otherwise, mixed).

N0 is called the seed.

The numbers produced are restricted to the M possible values.

Clearly, they are rational numbers, but if M is large they will practically

cover the reals in [0, 1]. As soon as some Ni repeats, say, Ni = Ni+T , then

the whole subsequence repeats, i.e. Ni+t = Ni+T+t, t = 1, 2, . . .. The least

such T is called the period.

A good generator will have a long period. The period cannot be longer

than M and also depends on a and c.

2

2.2 Generation from non-U(0, 1)

Now suppose we have a sequence U1, U2, U3, . . . of independent uniform ran-

dom numbers in [0, 1], and we want to generate X1, X2, . . . distributed inde-

pendently and identically from some other specified distribution. That is, we

wish to transform the U1, U2, . . . sequence into a X1, X2, . . . sequence. There

are often many ways of doing this. A good, efficient algorithm should be fast

because millions of random numbers from the desired distribution may be

required.

2.2.1 The inversion method

Let X be any continuous random variable and define Y = FX(X), where FX

is the distribution function of X : FX(x) = P (X ≤ x).

Claim: Y ～ U [0, 1].

Proof: Y ∈ [0, 1] and the distribution function of Y is

FY (y) = P (Y ≤ y) = P

(

FX(X) ≤ y

)

= P

(

X ≤ F?1X (y)

)

= FX

(

F?1X (y)

)

= y

which is the distribution function of a uniform random variable on [0, 1].

So, whatever the distribution of X , Y = FX(X) is uniformly distributed

on [0, 1].

The inversion method turns this backwards. Let U = FX(X), so that

X = F?1X (U). To generate from FX(·), take a single uniform variable U ,

calculate F?1X (U), and then X will have the required distribution:

P (X ≤ x) = P (F?1X (U) ≤ x)

= P

(

U ≤ FX(x)

)

= FX(x)

3

space

Example: exponential distribution

Let X have the exponential distribution with parameter λ (mean 1λ). I.e.

f(x) = λe?λx (x > 0)

F (x) =

∫ x

0

λe?λz dz = [?e?λz]x0 = 1? e?λx.

Set U = 1? e?λX and solve for X

X = ?1

λ

ln(1? U).

Note that 1 ? U is uniformly distributed on [0, 1], so we might just as well

use

X = ?1

λ

lnU.

2.2.2 Discrete distributions

The inversion method works for discrete random variables in the following

sense. Let X be discretely distributed with k distinct possible values xi

having probabilities pi, i = 1, . . . , k. So

P (X = xi) = pi,

k∑

i=1

pi = 1.

Then FX(x) =

∑

xi≤x

pi is a step function.

Inversion gives X = xi if

∑

xjpj < U ≤

∑

xj≤xi

pj, which selects each

possible value of X with the correct probability.

Think of this as splitting [0, 1] into intervals of length pi. The interval in

which U falls is the value of X , which occurs with probability pi as required.

4

Example

Let X ～ Bi(4, 0.3). The probabilities are

P (X = 0) = 0.2401, P (X = 1) = 0.4116, P (X = 2) = 0.2646

P (X = 3) = 0.0756, P (X = 4) = 0.0081.

The algorithm says X = 0 if 0 ≤ U ≤ 0.2401,

X = 1 if 0.2401 < U ≤ 0.6517,

X = 2 if 0.6517 < U ≤ 0.9163,

X = 3 if 0.9163 < U ≤ 0.9919,

X = 4 if 0.9919 < U ≤ 1.

One way to perform the algorithm is this: let U ～ U(0, 1).

1. Test U ≤ 0.2401. If true, return X = 0.

2. If false, test U ≤ 0.6517. If true, return X = 1.

3. If false, test U ≤ 0.9163. If true, return X = 2.

4. If false, test U ≤ 0.9919. If true, return X = 3.

5. If false, return X = 4.

Consider the speed of this. The expected number of steps (which roughly

equates to speed) is

1× 0.2401 + 2× 0.4116 + 3× 0.2646 + 4× 0.0756 + 4× 0.0081 = 2.1919.

5

To speed things up we can rearrange the order so that the later steps are

less likely.

1. Test U ≤ 0.4116. If true return X = 1.

2. If false, test U ≤ 0.6762. If true return X = 2.

3. If false, test U ≤ 0.9163. If true return X = 0.

4. and 5. as before.

The expected number of steps is 1× 0.4116 + 2× 0.2646 + 3× 0.2401 + 4×

(0.0756 + 0.0081) = 1.9959: Roughly a 10% speed increase.

2.2.3 Transformations

Here are a few useful relationships which can be used to transform simulated

random variables between related distributions:

(a) If U ～ U(0, 1) set V = (b? a)U + a then V ～ U(a, b) where a < b.

(b) If Yi are iid exponential with parameter λ then

pi = 1 and each fi is a density, then we can sample from f

by first sampling an index j from the set {1, . . . , r} with correspond-

ing probability distribution p = {p1, . . . , pr}, and then taking a sample

from fj .

f(x) is a mixture density (we are “mixing” (averaging) the component

densities with weights given by the probability distribution of the in-

dex).

2.2.4 The Box-Mu¨ller algorithm for the normal distribution

We cannot generate a normal random variable by inversion, because FX is

not known in closed form (nor is its inverse). The Box–Mu¨ller method (1958)

can be used instead. Take U1, U2 ～ U(0, 1) independently. Calculate