MAST20006/MAST90057 – Module 2. Discrete Distributions
Module 2. Discrete Distributions
Chapter 2 in the textbook
Sophie Hautphenne and Feng Liu
The University of Melbourne
2023
Overview
1 Discrete random variables
2 Mathematical expectation
3 Mean, variance and standard deviation
4 Bernoulli trials and the binomial distribution
5 The moment-generating function
6 The Poisson distribution
1. Discrete random variables
Recall that a fundamental objective of probability theory is to find
the probability of a given event B in the sample (outcome) space S.
It can be difficult to describe and analyse S, and accordingly B, if
the elements of S are not numerical.
However, one often deals with situations where one can associate
with each sample point (outcome) s in S a numerical measurement
x ; that makes life easier.
The numeric measurement x, when regarded as a function of sample
point s, is called a random variable, and is denoted as X or X(s).
Definition 1
Given a random experiment with an outcome space S, a function X that
assigns to each element s in S a real number X(s) = x is called a
random variable (abbr. r.v.).
The range (or space) of X is the set of real numbers
{x : X(s) = x, s ∈ S}, where ‘s ∈ S’ means the element s belongs to the
set S.
Remarks : The range of X is often denoted as X(S) or SX .
Now each event (subset) B in S can be described by the subset
A := X(B) of real numbers assumed by some function (r.v.) X on
B.
Note that A is a subset of SX but not of S, and that X(B) does
not specify B for a general X.
So X : S → SX ⊆ ℝ, such that s ↦ X(s) = x, and for every
A ⊆ SX, there exists B ⊆ S such that
A = X(B) = {x : x = X(s), s ∈ B}
and therefore B = {s ∈ S : X(s) ∈ A}.
Namely, for A ⊆ SX,
PX(A) = P(X ∈ A) = P({s ∈ S : X(s) ∈ A}) = P(B).
In particular,
PX(SX) = P(X ∈ SX) = P({s ∈ S : X(s) ∈ SX}) = P(S) = 1,
i.e. the probability of the range of X equals 1.
Assigning probability to A = X(B) ⊆ SX can be easier than
assigning probability to B ⊆ S, as A is of a numerical nature, while B
is not necessarily numerical.
Two difficulties still remain :
1 How to assign a probability to a subset A = X(B) ⊆ SX ?
2 How to define a r.v. X as a function of s ∈ S ?
The response to 2) is determined by the problem under
consideration, and is not unique.
To answer 1) we will focus on the discrete sample space at this
stage.
If S is discrete, SX is also discrete. So we can calculate
PX(A) for any subset A of SX once we have assigned a probability
to each element of SX.
(Remember there exists a B ⊆ S such that A = X(B).)
Specifically,
PX(A) = PX(X(B)) = P(B) = ∑_{s∈B} P({s}).
Also note
PX(A) = ∑_{x∈A} PX(x) = ∑_{x∈A} P(X = x).
Example 1. A marble is selected at random from a box containing 3 red,
4 yellow and 5 white marbles. The colour of the selected marble is
recorded.
The sample space is S = {R, Y, W},
and P({R}) = 3/12, P({Y}) = 4/12, P({W}) = 5/12.
Define a random variable
X = X(s) = { 1 if s = R,
             2 if s = Y,
             3 if s = W.
Then the space of X is SX = {1, 2, 3}.
For A = {1, 2} which is an event in SX , there exists an event B in
S where B = {R, Y } such that
X(B) = X({R, Y }) = {X(R), X(Y )} = {1, 2} = A.
Note that both A and B represent the event that the selected
marble is not white.
Now,
PX(A) = PX({1, 2}) = PX(1) + PX(2) = P(X = 1) + P(X = 2)
      = P(s = R) + P(s = Y) = P({R, Y}) = P(B)
      = 3/12 + 4/12 = 7/12.
Carefully read the above equation to make sure you understand
every step there.
The preceding discussions tell us that the probabilities
{P(X = x), x ∈ SX} are fundamental in that they determine the
probability of any event in SX.
We often write
f(x) := PX({x}) = P (X = x) for any x ∈ SX ;
we call f(x) the probability mass function (pmf) of X.
Definition 2
The pmf f(x) of a discrete random variable X is a function that satisfies
the following properties :
1 f(x) > 0 for any x ∈ SX ;
2 ∑_{x∈SX} f(x) = 1 ;
3 PX(A) = P(X ∈ A) = ∑_{x∈A} f(x), for any A ⊆ SX.
Remarks :
1 Provided that no confusion will be created, SX can simply be
rewritten as S (the sample space for X), and PX as P (or even just
Pr).
2 Note that P(X = x) = 0 if x ∉ SX. Therefore we define f(x) = 0
for any x ∉ SX.
3 If f(x) is constant on SX , we say X has a uniform distribution, or
f(x) is a uniform pmf. For example, “f(x) = 1/6, x = 1, 2, . . . , 6”
is a uniform pmf.
4 The pmf f(x) can be expressed in different ways. It can be
expressed as either a mathematics formula, table, bar graph or
probability histogram. You can use any one (usually the simplest
one, for the given situation) of these four forms to express the pmf.
Example 2. Roll a four-sided die twice.
Let the random variable X equal the larger of the two face numbers
if they are different, and the common value if they are the same.
Thus the sample space is
S = {(d1, d2) : d1 = 1, 2, 3, 4; d2 = 1, 2, 3, 4}.
We have X = X(d1, d2) = max(d1, d2), and the space of X is
SX = {1, 2, 3, 4}.
It is not difficult to see that
P(X = 1) = P({(1, 1)}) = 1/16
P(X = 2) = P({(1, 2), (2, 1), (2, 2)}) = 3/16
P(X = 3) = P({(1, 3), (2, 3), (3, 3), (3, 1), (3, 2)}) = 5/16
P(X = 4) = P({(1, 4), (2, 4), (3, 4), (4, 4), (4, 1), (4, 2), (4, 3)}) = 7/16
Therefore, the pmf of X can either be given by the following table
x                 1      2      3      4
f(x) = P(X = x)   1/16   3/16   5/16   7/16
or by the following mathematical formula
f(x) = P(X = x) = (2x − 1)/16, x = 1, 2, 3, 4,
or by the following bar graph or probability histogram :
R commands used for creating the above graphs :
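The original commands did not survive extraction; a minimal R sketch that produces such plots for the pmf of Example 2 (plot styling is my own choice, not the slide's) might look like :

```r
# pmf of X = max of two rolls of a four-sided die
x <- 1:4
fx <- (2 * x - 1) / 16

# bar graph of the pmf
barplot(fx, names.arg = x, xlab = "x", ylab = "f(x)", main = "pmf of X")

# probability histogram: a vertical line of height f(x) at each x
plot(x, fx, type = "h", lwd = 2, xlab = "x", ylab = "f(x)")
```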
Example 3 : The hypergeometric distribution.
Let X be the number of “defective items” (“D”) in a sample of n items
randomly drawn without replacement from a population consisting of
N1 D’s and N2 G’s (“good items”). The population has in total
N1 + N2 = N items.
Assume that each item in the population has the same chance of being
drawn.
Then the possible values that the discrete r.v. X can take, i.e. the
space of X, are SX = {x : x ≥ 0, x ≤ n, x ≤ N1 and n − x ≤ N2}.
We say X has a hypergeometric distribution Hyper(N1, N2, n), with
the pmf being
f(x) = P(X = x) = C(N1, x) C(N2, n − x) / C(N, n), x ∈ SX,
where C(a, b) denotes the binomial coefficient “a choose b”.
Example 4 : Capture–recapture experiment. Ten animals of a certain
species have been captured, tagged, and released to mix into their
population. Suppose the population consists of 80 such animals. A new
sample of 15 animals is to be selected.
What is the probability that 3 in the new sample will be tagged ones ?
Let X be the number of tagged animals in the new sample.
Then X has a hypergeometric distribution
Hyper(N1 = 10, N2 = 70, n = 15).
Therefore f(3) = P(X = 3) = C(10, 3) C(70, 12) / C(80, 15).
In R, use dhyper(x,N1, N2, n) to compute the hypergeometric pmf :
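The slide's output was lost; a one-line check for Example 4 (note R's own argument names are m, n, k, matching N1, N2, n here) would be :

```r
# P(X = 3) for X ~ Hyper(N1 = 10, N2 = 70, n = 15)
dhyper(3, m = 10, n = 70, k = 15)
```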
2. Mathematical expectation
The pmf f(x), x ∈ SX provides all the information about the
probability distribution of a random variable X.
Here we are interested in some numeric characteristics of X, which
are also numeric characteristics of f(x).
An important numeric characteristic is the mathematical
expectation of X.
Example 5. A young man devises a game. The game is to let the
participant cast a fair die and then receive a payment according to the
outcome :
He pays 1¢ if the event A = {1, 2, 3} occurs ; 5¢ if B = {4, 5}
occurs ; and 35¢ if C = {6} occurs.
It is easy to see that P(A) = 3/6, P(B) = 2/6 and P(C) = 1/6.
The average payment per cast is 1 × 3/6 + 5 × 2/6 + 35 × 1/6 = 8¢.
In the long run, this is how much is paid in one play (use the “long
term relative frequency” interpretation of probability !).
The charge per cast should be more than 8¢ if the young man
wants to make a profit from this game over the long term.
The above discussion can be formulated more formally :
Let X be the outcome of a cast.
The pmf of X is the uniform one given by f(x) = 1/6, x = 1, 2, . . . , 6.
In terms of the observed value x, the payment per cast is given by
the function
u(x) = {  1, x = 1, 2, 3
          5, x = 4, 5
         35, x = 6.
The mathematical expectation of the payment per cast is then
equal to
∑_{x=1}^{6} u(x)f(x) = 1 × 1/6 + 1 × 1/6 + 1 × 1/6 + 5 × 1/6 + 5 × 1/6 + 35 × 1/6
                     = 1 × 3/6 + 5 × 2/6 + 35 × 1/6 = 8.
Definition 3
Suppose f(x) is the pmf of a discrete random variable X with range SX ,
and u(X) is a function of X (note that u(X) is also a r.v.).
If the summation ∑_{x∈SX} u(x)f(x), which is sometimes written as
∑_{SX} u(x)f(x), exists,
then the sum is called the mathematical expectation or the expected
value of the function u(X), and it is denoted by E[u(X)].
That is,
E[u(X)] = ∑_{x∈SX} u(x)f(x).
Remarks :
1 It is possible that E[u(X)] is different from u(x) for every x ∈ SX.
2 To be mathematically rigorous, the definition of E[u(X)] requires
that ∑_{x∈SX} |u(x)|f(x) converges and is finite (if SX is infinite, this
is a series).
3 There is another way to calculate E[u(X)] :
(a) Define Y = u(X) ; Y is also a random variable.
(b) Then find the pmf of Y ,
i.e. g(y) := P(Y = y) = P(u(X) = y) = P(X ∈ u⁻¹(y)),
where u⁻¹(y) denotes the preimage {x : u(x) = y}.
(c) Then E[u(X)] = E[Y] = ∑_{y∈SY} y g(y).
(d) So ∑_{x∈SX} u(x)f(x) = ∑_{y∈SY} y g(y).
Example 6. Let a r.v. X have the pmf f(x) = 1/3, x ∈ S = {−1, 0, 1}.
Let u(X) = X².
Then
E[u(X)] = E[X²] = ∑_{x∈S} x²f(x) = (−1)² × 1/3 + 0² × 1/3 + 1² × 1/3 = 2/3.
On the other hand, we can define Y = X².
Then P(Y = 0) = P(X = 0) = 1/3, and
P(Y = 1) = P(X = −1) + P(X = 1) = 2/3.
So the pmf of Y is
g(y) = { 1/3, y = 0
         2/3, y = 1,
and the space of Y is SY = {0, 1}.
Hence E[Y] = ∑_{y∈SY} y g(y) = 0 × 1/3 + 1 × 2/3 = 2/3.
In conclusion, we saw that there are two ways to compute E[u(X)],
and that in Example 6, E[u(X)] = E[Y] = 2/3.
Some useful properties about the mathematical expectation :
Theorem 1
When it exists, the mathematical expectation E satisfies the following
properties :
(a) If c is a constant, E(c) = c.
(b) If c is a constant and u is a function, E[c u(X)] = cE[u(X)].
(c) If c1 and c2 are constants and u1 and u2 are functions, then
E[c1 u1(X) + c2 u2(X)] = c1E[u1(X)] + c2E[u2(X)].
(d) Generalising part (c) above : E[ ∑_{i=1}^{k} ci ui(X) ] = ∑_{i=1}^{k} ci E[ui(X)].
Example 7. Let X have the pmf f(x) = x/10, x = 1, 2, 3, 4. Then
E(X) = ∑_{x=1}^{4} x × x/10 = (1 + 4 + 9 + 16)/10 = 3,
E(X²) = ∑_{x=1}^{4} x² × x/10 = (1 + 8 + 27 + 64)/10 = 10,
E[X(5 − X)] = 5E(X) − E(X²) = 15 − 10 = 5.
Example 8. Let u(x) = (x − b)², where b is an unknown constant.
Suppose E[(X − b)²] exists. Find the value of b for which E[(X − b)²]
is minimal.
First write g(b) = E[(X − b)²] = E(X²) − 2bE(X) + b².
Then g′(b) = −2E(X) + 2b.
Set g′(b) = 0 and solve for b. It follows that b = E(X).
Since g″(b) = 2 > 0, E[X] is the value of b that minimises
E[(X − b)²].
That is, E[(X − E(X))²] ≤ E[(X − b)²] for any b.
Example 9 : The expectation of a hypergeometric random variable.
Let X have a hypergeometric distribution Hyper(N1, N2, n), with the
pmf given by
f(x) = P(X = x) = C(N1, x) C(N2, n − x) / C(N, n),
where x ≥ 0, x ≤ n, x ≤ N1, n − x ≤ N2.
Then we can show that
E(X) = ∑_{x∈S} x × C(N1, x) C(N2, n − x) / C(N, n) = nN1/N.
This agrees with the intuition : the number of ‘defective’ items in
the sample is expected to be equal to the sample size n multiplied
by N1/N, the proportion of ‘defective’ items in the population.
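The identity E(X) = nN1/N can be checked numerically in R by summing the pmf directly (the parameter values below are illustrative choices of mine, not from the slide) :

```r
# Check E(X) = n*N1/N for Hyper(N1 = 20, N2 = 180, n = 30)
N1 <- 20; N2 <- 180; n <- 30
x <- 0:n
sum(x * dhyper(x, N1, N2, n))   # expectation by direct summation: 3
n * N1 / (N1 + N2)              # closed form n*N1/N: also 3
```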
3. Mean, variance and standard deviation
For a discrete r.v. X with pmf f(x) and space
SX = {u1, u2, . . . , uk}, the expectation is
E(X) = ∑_{x∈SX} x f(x) = u1f(u1) + u2f(u2) + . . . + ukf(uk).
The expectation can be regarded as a weighted mean of
u1, u2, . . . , uk, where the weights are f(u1), f(u2), . . . , f(uk).
For this reason, we also call E(X) the mean of the random variable
X, and also denote E(X) by the Greek letter μ.
In summary,
μ := E(X) = ∑_{x∈SX} x f(x) = u1f(u1) + u2f(u2) + . . . + ukf(uk).
A third name for E(X) is the first moment of X, as the expression
for E(X) has the interpretation of a moment in mechanics.
Similarly we call E(X²) the second moment of X.
Generally, for k ≥ 1, we call E(X^k) the k-th moment of X (about
the origin).
E[(X − μ)^k] is called the k-th moment of X about the mean μ
(central moment).
Statisticians find it valuable to compute E[(X − μ)²] (the second
moment about the mean), because
E[(X − μ)²] = ∑_{x∈SX} (x − μ)²f(x)
            = (u1 − μ)²f(u1) + (u2 − μ)²f(u2) + . . . + (uk − μ)²f(uk)
is the weighted mean of the squares of the differences
u1 − μ, u2 − μ, . . . , uk − μ, which measures the variability of X
about its mean.
For this reason, we call E[(X − μ)²] the variance of X (or of the
pmf of X).
We also use σ² or Var(X) to denote the variance, i.e.
σ² := Var(X) = E[(X − μ)²].
We call σ := √(E[(X − μ)²]) the standard deviation of X (or of
the pmf of X).
The following property is useful :
σ² = Var(X) = E[(X − μ)²] = E[X²] − μ² = E[X²] − (E[X])².
Example 10. Let the pmf of X be defined as f(x) = x/6, x = 1, 2, 3.
Then
The mean of X is μ = E(X) = 1 × 1/6 + 2 × 2/6 + 3 × 3/6 = 7/3.
The second moment of X is
E(X²) = 1² × 1/6 + 2² × 2/6 + 3² × 3/6 = 6.
The variance of X is
σ² = Var(X) = E(X²) − μ² = 6 − (7/3)² = 5/9.
The standard deviation of X is σ = √Var(X) = √(5/9) ≈ 0.745.
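The arithmetic of Example 10 can be reproduced in a few lines of R :

```r
# Example 10: pmf f(x) = x/6 on x = 1, 2, 3
x <- 1:3
fx <- x / 6
mu <- sum(x * fx)       # 7/3
m2 <- sum(x^2 * fx)     # 6
m2 - mu^2               # variance 5/9
sqrt(m2 - mu^2)         # standard deviation, about 0.745
```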
Example 11. Suppose the pmf of X is given by
x        -1    0    1
fX(x)   1/3  1/3  1/3
It is easy to find that the mean of X is μX = 0, and the variance of
X is σ²X = 2/3.
Suppose the pmf of Y is given by
y        -2    0    2
fY(y)   1/3  1/3  1/3
It is easy to find that the mean of Y is μY = 0, and the variance of
Y is σ²Y = 8/3.
We see that Y = 2X, μY = 2μX, σ²Y = 2²σ²X and σY = 2σX.
In general, if Y = aX + b where a and b are constants, and Y and X are
two random variables, then we have the following :
a) μY = aμX + b
b) σ²Y = a²σ²X and σY = |a|σX.
Example 12. If X has a discrete uniform distribution on the first m
positive integers, find the mean and variance of X.
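The worked solution on this slide was lost in extraction; for reference, the standard results are E(X) = (m + 1)/2 and Var(X) = (m² − 1)/12, which a quick R check confirms for, say, m = 6 (the value 6 is my choice, not the slide's) :

```r
# Discrete uniform on {1, ..., m}: check E(X) = (m+1)/2, Var(X) = (m^2-1)/12
m <- 6
x <- 1:m
fx <- rep(1 / m, m)
sum(x * fx)                       # 3.5, equals (m + 1)/2
sum(x^2 * fx) - sum(x * fx)^2     # 35/12, equals (m^2 - 1)/12
```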
Example 13 : Empirical distribution, sample mean and sample variance.
Consider performing a random experiment n times which gives n
observations of a r.v. X : x1, x2, . . . , xn ; this is referred to as a sample
from the distribution of X.
It is possible that some values in the sample are the same, but we do
not worry about it at this time.
Often we don’t know the probability distribution of X. But we can
(artificially) assign a probability 1/n to each of x1, x2, . . . , xn. The
distribution determined by these equal probabilities is called the
empirical distribution since it is determined by a particular sample
x1, x2, . . . , xn acquired in an experiment.
That is, the pmf for the empirical distribution is
femp(x) = 1/n, x = x1, x2, . . . , xn.
The mean of femp(x) is
∑_{i=1}^{n} xi femp(xi) = (1/n) ∑_{i=1}^{n} xi = x̄,
which is just the sample mean of the data x1, x2, . . . , xn.
Likewise, the variance of the empirical distribution is (n − 1)/n
times the sample variance of the data defined as
s² := (1/(n − 1)) ∑_{i=1}^{n} (xi − x̄)².
This example shows us the relationship between the mean and
variance of the empirical distribution and the sample mean and
sample variance of the data.
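The relation between the two variances can be verified in R on any small sample (the data vector below is an arbitrary illustration of mine) :

```r
# Empirical-distribution variance equals (n-1)/n times the sample variance
x <- c(2, 5, 7, 7, 9)                  # an arbitrary illustrative sample
n <- length(x)
sum((x - mean(x))^2) / n               # variance under femp
var(x) * (n - 1) / n                   # (n-1)/n * sample variance: same value
```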
Example 14 : The mean and variance of a hypergeometric distribution.
Let X have a hypergeometric distribution Hyper(N1, N2, n), with the pmf
f(x) = P(X = x) = C(N1, x) C(N2, n − x) / C(N, n),
where x ≥ 0, x ≤ n, x ≤ N1, n − x ≤ N2.
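The derivation on this slide was lost in extraction; the standard results are E(X) = nN1/N and Var(X) = n(N1/N)(N2/N)(N − n)/(N − 1), which can be checked numerically in R (parameter values below are my illustrative choices) :

```r
# Check mean and variance formulas for Hyper(N1 = 20, N2 = 180, n = 30)
N1 <- 20; N2 <- 180; n <- 30; N <- N1 + N2
x <- 0:n
fx <- dhyper(x, N1, N2, n)
mu <- sum(x * fx)
v  <- sum((x - mu)^2 * fx)
c(mu, n * N1 / N)                               # both equal 3
c(v, n * (N1/N) * (N2/N) * (N - n) / (N - 1))   # both equal
```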
4. Bernoulli trials and the binomial distribution
A random experiment with the following properties is called a binomial
experiment :
1 Each such experiment consists of n trials, with n being fixed in advance.
2 Each of the n trials has only two possible outcomes which are denoted by
‘success’ (S) and ‘failure’ (F). A trial of this type is called a Bernoulli
trial.
3 The n trials are independent of each other. That is, the outcome of one
trial does not affect the probability of occurrence of the outcome of other
trials.
4 The probability of ‘success’ (denoted by p) is the same for all the n trials.
Let Xi be a random variable associated with the i-th Bernoulli trial,
which is defined as Xi(success) = 1 and Xi(failure) = 0.
Xi is called a Bernoulli random variable.
The pmf of Xi is given by
f(xi) = p^{xi} (1 − p)^{1−xi}, xi = 0, 1,
and
μi = E(Xi) = p,
σ²i = Var(Xi) = p(1 − p).
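The Bernoulli mean and variance follow directly from the two-point pmf; a quick R check (p = 0.3 is an arbitrary illustrative value) :

```r
# Bernoulli pmf f(x) = p^x (1-p)^(1-x), x = 0, 1; mean p, variance p(1-p)
p <- 0.3
x <- c(0, 1)
fx <- p^x * (1 - p)^(1 - x)
sum(x * fx)                       # 0.3, equals p
sum(x^2 * fx) - sum(x * fx)^2     # 0.21, equals p(1-p)
```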
Example 15. A coin is flipped independently five (n = 5) times. Call
outcome H (heads) as a “success” and T (tails) as a “failure”. Then this
is a binomial experiment.
Example 16. Suppose among 20 goblets in a box 2 have cosmetic flaws.
Now randomly take 10 goblets from the box without replacement. For
a selected goblet we are interested in whether it has any cosmetic flaws.
Then this is not a binomial experiment, because the outcomes of
the 10 trials are not independent of each other (it is a
hypergeometric experiment).
If the 10 goblets are taken with replacement, then the experiment
is a binomial one.
Example 17. Suppose 10% of a stock of 10,000 goblets have defects,
and randomly take 10 goblets without replacement for inspection. Then
the outcomes of the 10 trials are not independent of each other, but the
dependence is so weak that it can be ignored.
Therefore, Properties 1–4 of a binomial experiment are
approximately satisfied, and the experiment can be approximately
modelled by a binomial experiment.
In general, if an experiment involves a ‘without replacement’ sampling but
the sample size (number of trials) is < 5% of the population size, then
the experiment can be analysed as though it was a binomial experiment.
Example 18. A company that produces fine crystal knows from
experience that 10% of its goblets have cosmetic flaws and must be
classified as “seconds”. Now a sample of 10 goblets is randomly taken
from the production line for inspection. Knowing that the objective is just
to see whether any of them has any cosmetic flaws, this experiment is
approximately a binomial one.
In a binomial experiment, we often are interested in the total
number of ‘successes’, denoted by X, in the n Bernoulli trials.
We then call X a binomial random variable, and say that X has a
binomial distribution, denoted as X ~ b(n, p), where n and p are
parameters indicating the number of Bernoulli trials and the
probability of ‘success’ in each trial respectively.
Note that we are not interested in the order of occurrences of the
‘successes’ for a binomial distribution.
The possible values of X are 0, 1, 2, . . . , n.
X = X1 + X2 + . . . + Xn, i.e. the sum of the n Bernoulli r.v.’s.
Each Bernoulli r.v. Xi has a special binomial distribution
Xi ~ b(1, p).
Next we proceed to find the pmf and other characteristics of a binomial
r.v. X.
When n = 3, the probability for each possible outcome of X is given
below :
X   Outcome   Probability
3   SSS       p³
2   SSF       p²(1 − p)
    SFS       p²(1 − p)
    FSS       p²(1 − p)
1   SFF       p(1 − p)²
    FSF       p(1 − p)²
    FFS       p(1 − p)²
0   FFF       (1 − p)³
From the above table we see that the pmf of b(3, p) is
P(X = 0) = (1 − p)³,
P(X = 1) = 3p(1 − p)²,
P(X = 2) = 3p²(1 − p),
P(X = 3) = p³,
which can be equivalently expressed as
P(X = x) = C(3, x) p^x (1 − p)^{3−x}, x = 0, 1, 2, 3,
where the binomial coefficient C(n, x) gives the number of ways of selecting x
positions for the x ‘successes’ in the n trials.
In general, the pmf for a binomial distribution b(n, p) is
f(x) = P(X = x) = C(n, x) p^x (1 − p)^{n−x},
for x = 0, 1, 2, . . . , n.
Sometimes, it is of interest to find P(X ≤ x), the probability that
at most x ‘successes’ are obtained from the n Bernoulli trials in a
binomial experiment.
We call the function defined by F (x) := P (X ≤ x) the cumulative
distribution function (or simply the distribution function) of X,
abbreviated as cdf of X.
For a r.v. X having a binomial distribution b(n, p), the cdf is
F(x) = P(X ≤ x) = ∑_{k≤x} C(n, k) p^k (1 − p)^{n−k},
and the mean and variance are μ = np and σ² = np(1 − p).
Remark : One can use the relation between binomial and Bernoulli
r.v.’s to find that
μX = E(X) = E(X1) + E(X2) + . . . + E(Xn) = p + . . . + p = np.
Example 18. Probability bargraphs for several binomial distributions of
different n and p values are listed below :
R commands for creating the above plots :
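The original commands were lost in extraction; a minimal R sketch that draws such bar graphs (the n and p values below are my illustrative choices, not the slide's) :

```r
# Probability bar graphs for binomial distributions with different p
n <- 10
par(mfrow = c(1, 2))
for (p in c(0.2, 0.5)) {
  barplot(dbinom(0:n, n, p), names.arg = 0:n,
          xlab = "x", ylab = "f(x)",
          main = paste0("b(", n, ", ", p, ")"))
}
```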
Example 19. Suppose the probability of germination of a beet seed is
0.8, and 10 seeds are planted. Let X be the number of seeds to
germinate. Assume independence of germination of one seed from that of
another seed. Then
P(X = 8) = C(10, 8) (0.8)⁸ (0.2)² ≈ 0.302.
That is, the probability of 8 seeds germinating is 0.302.
P(X ≤ 8) = ∑_{k=0}^{8} C(10, k) (0.8)^k (0.2)^{10−k}, or
P(X ≤ 8) = 1 − P(X ≥ 9) = 1 − [C(10, 9)(0.8)⁹(0.2) + (0.8)¹⁰] ≈ 0.624.
That is, the probability of no more than 8 germinations is 0.624.
μ = E(X) = np = 10 × 0.8 = 8.
That is, on average 8 seeds are expected to germinate.
σ² = Var(X) = np(1 − p) = 10 × 0.8 × 0.2 = 1.6.
P(6 ≤ X < 9) = P(X < 9) − P(X ≤ 5)
             = P(X ≤ 8) − P(X ≤ 5) ≈ 0.624 − 0.033 ≈ 0.591.
That is, the probability of at least 6 but fewer than 9 seeds
germinating is 0.591.
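These probabilities can be confirmed with R's binomial functions :

```r
# Example 19 checks for X ~ b(10, 0.8)
dbinom(8, 10, 0.8)                       # P(X = 8), about 0.302
pbinom(8, 10, 0.8)                       # P(X <= 8), about 0.624
pbinom(8, 10, 0.8) - pbinom(5, 10, 0.8)  # P(6 <= X < 9), about 0.591
```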
What is the probability that 3 out of the 10 seeds do not germinate ?
P(3 do not germinate) = P(X = 7) = C(10, 7)(0.8)⁷(0.2)³ ≈ 0.201.
Alternatively, let Y be the number of non-germinations. Then
Y ~ b(10, 0.2). So
P(3 do not germinate) = P(Y = 3) = C(10, 3)(0.2)³(0.8)⁷ ≈ 0.201.
Suppose there are 1000 pots and 10 beet seeds are planted in each
pot, with the probability of germination of each seed still being 0.8.
The number of germinations in each pot is to be recorded.
What will the 1000 records look like ?
These 1000 records would be like 1000 observations of a
b(10, 0.8) random variable.
We can use R to simulate 1000 observations, plot their histogram and
compare the histogram with the pmf of b(10, 0.8).
The heights of the dots give the pmf of X.
R commands for creating the above plot :
R commands for pmf, cdf and random number generating of binomial
distribution :
dbinom(x, size, prob)
pbinom(q, size, prob)
rbinom(n, size, prob)
Type ‘help(dbinom)’ in R for more information.
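A sketch of the simulation described above, using rbinom as the slide suggests (the seed value is my own choice, added only for reproducibility) :

```r
# Simulate 1000 pots of 10 seeds each and compare the histogram of
# germination counts with the b(10, 0.8) pmf
set.seed(1)
obs <- rbinom(1000, size = 10, prob = 0.8)
hist(obs, breaks = seq(-0.5, 10.5, 1), freq = FALSE,
     xlab = "number of germinations", main = "")
points(0:10, dbinom(0:10, 10, 0.8), pch = 16)  # dot heights give the pmf
```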
Example 20 : Comparison of binomial and hypergeometric distributions.
Suppose among 200 goblets in a box 20 have defects.
1 Randomly take 30 goblets from the box with replacement. Let X
be the number of defective goblets selected. It is easy to see that
X ~ b(n = 30, p = 20/200 = 0.1).
2 Randomly take 30 goblets from the box without replacement. Let
Y be the number of defective goblets selected. Then
Y ~ Hyper(N1 = 20, N2 = 180, n = 30).
Example 20. (cont.)
We have learned that E(X) = np = 3 and E(Y) = nN1/N = 3 :
the two distributions have the same mean.
A comparison of the pmf’s of X and Y is given below :
It can be shown that when n and p = N1/N are fixed but N tends to
be very large, the hypergeometric distribution becomes very close to
the corresponding binomial distribution.
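The comparison plot did not survive extraction; a tabular comparison of the two pmf's of Example 20 can be produced directly in R :

```r
# Compare the pmf's of X ~ b(30, 0.1) and Y ~ Hyper(20, 180, 30)
x <- 0:10
round(rbind(binomial       = dbinom(x, 30, 0.1),
            hypergeometric = dhyper(x, 20, 180, 30)), 4)
```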
5. The moment-generating function
Mean, variance and standard deviation are important characteristics
of a distribution.
But it can be difficult to calculate E(X) and Var(X), e.g. when X
is binomial.
Here we introduce a function of t, called the moment-generating
function, which helps to generate the moments, including the mean
and variance, of a distribution.
Definition 4
Let X be a discrete random variable with pmf f(x) and range (or space)
S. If there is a positive number h such that
E(e^{tX}) = ∑_{x∈S} e^{tx} f(x) is finite for t = ±h
(and hence for −h < t < h), then the function of t defined by
M(t) := E(e^{tX}) (or MX(t) := E(e^{tX}))
is called the moment-generating function (mgf) of X (or of the
distribution of X).
Example 21. Consider a random variable X with the following pmf :
x                  b1     b2     b3    . . .
f(x) = P(X = x)  f(b1)  f(b2)  f(b3)  . . .
The mgf of X is M(t) = e^{tb1}f(b1) + e^{tb2}f(b2) + e^{tb3}f(b3) + . . .
When t = 0, M(0) = f(b1) + f(b2) + f(b3) + . . . = 1.
Example 22. If X has the mgf M(t) =
Examples 21 and 22 show that the mgf can be derived from the
pmf, and vice versa.
The pmf uniquely determines the mgf, and it can be proved that the
mgf also uniquely determines the pmf.
That is, the same pmf fX(x) = fY(x) ⟺ the same mgf
MX(t) = MY(t).
We see that the mgf, like the pmf, provides another tool for
describing the distribution of a r.v.
However, note that fX(x) = fY(x) or MX(t) = MY(t) does not
imply X = Y .
Another issue is that the mgf may not exist for some r.v.’s, while the
pmf always exists (for discrete r.v.’s, of course).
Example 23. Suppose the mgf of X is M(t) = (e^t/2) / (1 − e^t/2), t < ln(2).
We show how Taylor’s expansion can help to find the pmf of X.
This mgf does not have the form given in Examples 21 and 22
which allowed us to find the pmf easily.
Note the Maclaurin’s (or Taylor’s) series expansion of (1 − z)⁻¹ is
(1 − z)⁻¹ = 1 + z + z² + z³ + . . . , −1 < z < 1.
Therefore,
M(t) = (e^t/2) (1 − e^t/2)⁻¹ = e^t/2 + (e^t/2)² + (e^t/2)³ + . . .
     = (1/2)e^t + (1/2)²e^{2t} + (1/2)³e^{3t} + . . . ,
when e^t/2 < 1 and thus t < ln(2).
From the above expansion, P(X = x) = (1/2)^x for x = 1, 2, 3, . . .
So the pmf of X is f(x) = (1/2)^x, x = 1, 2, 3, . . .
Now we proceed to see how the mgf and moments are related.
Differentiating M(t) = ∑_{x∈S} e^{tx} f(x) r times with respect to t
and then setting t = 0 gives
M^(r)(0) = ∑_{x∈S} x^r f(x) = E(X^r).
In particular,
μ = M′(0) and σ² = M″(0) − [M′(0)]².
In order to make use of the above technique to find the moments of
X, the mgf M(t) needs to have a closed form instead of the
expansion form.
Example 24. The pmf of the binomial distribution is known to be
f(x) = P(X = x) = C(n, x) p^x (1 − p)^{n−x} = (n! / (x!(n − x)!)) p^x (1 − p)^{n−x},
for x = 0, 1, 2, . . . , n.
Thus the corresponding mgf is
M(t) = E(e^{tX}) = ∑_{x=0}^{n} C(n, x) (pe^t)^x (1 − p)^{n−x} = [(1 − p) + pe^t]^n,
from the binomial expansion of (a + b)^n with a = 1 − p and b = pe^t.
Example 24 (cont.).
The first two derivatives of M(t) are
M′(t) = n[(1 − p) + pe^t]^{n−1}(pe^t)
M″(t) = n(n − 1)[(1 − p) + pe^t]^{n−2}(pe^t)² + n[(1 − p) + pe^t]^{n−1}(pe^t).
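Setting t = 0 then recovers μ = M′(0) = np and σ² = M″(0) − [M′(0)]² = np(1 − p). A quick numerical check by direct summation (the n and p values below are my illustrative choices) :

```r
# Check the mgf-derived moments of b(n = 10, p = 0.8) by direct summation
n <- 10; p <- 0.8
x <- 0:n
fx <- dbinom(x, n, p)
c(sum(x * fx), n * p)                              # mean: both 8
c(sum(x^2 * fx) - sum(x * fx)^2, n * p * (1 - p))  # variance: both 1.6
```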