辅导ELEC2103、辅导MATLAB编程

Tutorial 5 – Probability, Statistics, nd Data Analysis
This tutorial presents some of the MATLAB functions used in probability calculations, basic statistics,
and data analysis. The discrete Markov chain model is introduced and its stationary points identified using
eigenanalysis, linking linear algebra and this class of stochastic model. Based on this, the PageRank algorithm
is discussed, which underpins Google’s websearch engine. Importantly, reading data from and writing data to
a file i s d emonstrated. Curve fitting is introduced, as is the correlation coefficient.
For a more detailed treatment of statistics and and machine learning topics, consult The Elements of
Statistical Learning: Data Mining, Inference, and Prediction, by Hastie, Tibshirani and Friedman (2009, 2nd
Ed, Springer Series in Statistics):
http://statweb.stanford.edu/~tibs/ElemStatLearn/printings/ESLII_print10.pdf
5.1 Combinatorics and Counting
Probability calculations often involve counting combinations of objects, and the study of combinations of
objects is the realm of combinatorics. Many formulas in combinatorics are derived simply by counting the
number of objects in two different w ays a nd s etting t he r esults e qual t o e ach o ther. O ften t he resulting
proof is obtained by a “proof by words.” Formulas are not necessarily derived from manipulations of factorial
functions as some students might think. Some of MATLAB’s combinatorial functions are illustrated in this
section, which are likely familiar to you.
5.1.1 Permutations and combinations
A permutation is an ordered arrangement of the objects in set S. If |S| = n, then there are:
n(n ? 1)(n ? 2) . . . (2)(1) = n!
different p ermutations o f t hese n o bjects. A n e xample i s t he s ize o f t he s et o f d ifferent or ders th at a group
n of people could appear in a queue, or that labeled balls are drawn from a bucket.
For these types of counting problems, MATLAB provides a factorial function. However, it is generally
preferable (because it is quicker and can be used for much larger numbers), to use the gamma function to
calculate factorials:
Γ (n) =
∫ ∞
0
xn?1e?xdx
and use the fact that Γ (n+ 1) = n! for positive integer values of n. For example,
f a c t o r i a l (5)
gamma(6)
ans =
120
ans =
120
More generally, a k-permutation arises when we choose k objects in order from a set of n distinct objects.
The total number of different permutations of size k of these n objects is:
n(n? 1)(n? 2) . . . (n? k + 1) = n!(n? k)!
1
Tutorial 4 ELEC2103/ELEC9103
where dividing by the number of remaining items (n?k)! truncates the factorial computation at the appropriate
point in the series.
Alternatively, we may not care about the order in which the objects are selected. There are two cases.
First, if we sample with replacement, the number of combinations is simply nk.
The second is referred to as sampling without replacement (i.e. as if we are picking balls from a bucket
and not returning them). In this case, we consider a combination of k objects dawn from n, which is written:(
n
k
)
. We can figure out the formula for
(
n
k
)
just by counting.
First, we know there are n! permutations of all of the objects; set this to the LHS of our expression. Next,
let us construct the RHS by first, counting the number of k-sized permutations of n objects by dividing the
n objects into a group of k selected objects and the remaining (n? k) objects. We know that there is a total
of k! permutations of the k selected objects, and likewise, a total of (n ? k)! permutations of the (n ? k)
remaining objects. Therefore, the total number of permutations of the n objects is:
n! =
(
n
k
)
k!(n? k)!
Rearranging this, we have: (
n
k
)
= n!
k!(n? k)!
different permutations of size k of these n objects. Note that we have derived this formula simply by
counting, not by expanding factorial functions. However, now we know the answer, we can also see that the
same expression can be found from the size of the set of k-permutations of n objects, divided by the number
of permutations of the drawn objects themselves, because we don’t care about their order:(
n
k
)
= n!
k!(n? k)! =
n(n? 1)(n? 2) . . . (n? k + 1)
k(k ? 1)(k ? 2) . . . (2)(1)
MATLAB provides the nchoosek function for calculating combinations, for example, for n = 5, k = 3:
nchoosek (5 ,3)
Exercise 1.
a) Verify that by using MATLAB’s factorial and nchoosek functions for a few example numbers, e.g.
(n=12,k=7) and (n=24,k=6).
b) Test the equation below using MATLAB’s nchoosek function and a few example numbers: (n=10, k=4),
(n=20, k=8). (
n
k ? 1
)
+
(
n
k
)
=
(
n+ 1
k
)
If you want to know how to prove the relationship above, read the following, otherwise just skip it.
Proof: By definition
(
n+ 1
k
)
, is the number of ways to choose k items out of n+1 items. We can break
the items into one group, Group A, with n items an another group, Group B, with only 1 item, and consider
the number of ways to choose items from these two groups. Assume we pick the 1 item from Group B, we
then have to choose k ? 1 items from Group A. The number of ways we can do this is clearly
(
n
k ? 1
)
. We
could also choose not to pick the 1 item in Group B, so that we have to select all items from Group A. The
number of ways we can do this is
(
n
k
)
. We must conclude then that the total number of ways to choose k
items out of n+ 1 items is the sum of these two cases, given by:
(
n
k ? 1
)
+
(
n
k
)
=
(
n+ 1
k
)
.
Page 2
Tutorial 4 ELEC2103/ELEC9103
5.2 Stochastic matrices and Markov chains
A Markov chain is stochastic discrete-time, discrete-state transition system. Markov chains are used extensive
to model systems with predictable, stochastic, state transistions, including activity models, target tracking,
regression models with mode- or regime-switching, for simulating wind and weather patterns, and in finance.
They are also used extensively as a computational routine for implementing Baysian models, which require
efficient resampling methods, an approach called the Monte Carlo Makov chain method.
A three-state Markov chain system is illustrated below:
1 2 3
Here, the three states are illustrated as circles, with chance transitions between them, as indicated by edges.
Each edge has associated with it a probability, such that at each time step, the probability of moving from
state s to state s′ is given by P (s′|s), and edges indicate transitions that have strictly positive probabilities.
Note that, in this example:
? not all states are directly linked with all others, but
? each state can be reached from all others by some path, and
? no two states have a periodic cycle between them.
When the second and third condition above hold, we say that the Markov chain is ergodic. We are going to
mix an application of probability and eigendecomposition to analyse the steady state behaviour of an ergodic
Markov chain.
Let the state-transition probabilities for the diagram above be given by a matrix:
P =
??0.5 0.5 00.2 0.5 0.3
0 0.5 0.5
??
The entries in P are read as the probability that, starting in an initial state corresponding to a given row,
the system moves to the state indicated by the column. Importantly, note that all rows sum to 1; this is
the definition of a stochastic matrix. Indeed, you can think of each row of a stochastic matrix as a valid
probability distribution, specifically a categorical distribution. (In the lectures, categorical distributions over
k possible outcomes were mentioned as the basic probability model underpinning multinomial distributions,
in the same way that a Bernoulli distribution over binary outcomes underpin the binomial distribution. What
we see here is a much more general model that includes a notion of state.)
Exercise 2.
a) Write a script encoding P as a MATLAB matrix, and use a logical test to check that it indeed is a
stochastic matrix.
b) An initial state x0 can be encoded as vector, with a value of 1 indicating the initial state and zeros
everywhere else. Write a vector x encoding an initial state of 1.
c) Compute the distribution of states of the Markov chain 2 time steps after starting in state 2 by multi-
plying again by P . Repeat the multiplication for 4, 10, 20 and 30 time steps. What is happening to
the resulting distribution of states?
Page 3
Tutorial 4 ELEC2103/ELEC9103
5.2.1 Stationary distribution of a Markov chain
A stationary distribution of a Markov chain is a probability distribution that remains unchanged in the Markov
chain as time progresses. Typically, it is represented as a row vector whose entries are probabilities summing
to 1, and given transition matrix P, it satisfies:
xP = x
In other words, x is invariant by the matrix P .
An important result in matrix theory known as the Perron–Frobenius theorem applies to stochastic matrices.
It concludes that a nonzero solution of the equation above exists and is unique to within a scaling factor. If
this scaling factor is chosen so that: ∑
i
xi = 1,
then x is a state vector of the Markov chain, and is itself a stationary distribution over states.
In the exercise above, we saw an iterative approach to computing the stationary distribution of a Markov
chain. This process is called the power method, for a reason that should be obvious.
However, note that the equation for the stationary distribution looks very similar to the column vector
equation Pq = λq for eigenvalues and eigenvectors, with λ = 1. Also you can test that P is positive-definite,
so P has an eigendecomposition — this is a general characteristic of stochastic matrices. In fact, for a
technical reason, we transpose the matrices to get them into an appropriate form for eigenanalysis. This
allows us to find the stationary distribution as a left eigenvector (as opposed to the usual right eigenvectors)
of the transition matrix. The operations are as follows:
(xP )T = xT = PTxT
In other words, the transposed transition matrix PT has eigenvectors with eigenvalue 1 that, when normalized
to sum to 1, are stationary distributions expressed as column vectors. Therefore, if the eigenvectors of PT are
known, then so are the stationary distributions of the Markov chain with transition matrix P. In an ergodic
Markov chain, this stationary distribution is unique.
Exercise 3.
a) Write a script to: (i) compute the left eigenvectors of P , (ii) check that there is at least one eigenvalue
of 1, and (iii) display the associated stationary distribution of the Markov chain P .
b) Compare the answers you got in a) to those from Exercise 2 c).
When there are multiple eigenvectors associated to an eigenvalue of 1, each such eigenvector gives rise to
an associated stationary distribution. However, this can only occur when the Markov chain is reducible (i.e.
can be broken into smaller independent chains).
5.2.2 PageRank algorithm (Google)