# ST 302程序辅导、辅导C/C++，Python编程

ST 302 Stochastic Processes
1 Information and Conditioning
1.1 Sigma-Algebras and Filtrations
Let us recall that a -algebra is a collection F of subsets of the sample space

such that
(i) ;,
2 F
(ii) if A 2 F then also Ac 2 F (here Ac denotes the complementary set to A)
(iii) if we have a sequence A1, A2, ... of sets in F , then [1i=1Ai 2 F .
We say that a collection of sets generates a -algebra F if F is the smallest
-algebra which contains all the sets. The trivial -algebra is the one containing
only ; and
.
Interpretation: We perform a random experiment. The realized outcome is an
element ! of the sample space
. Assume that we are given some partial infor-
mation in form of a -algebra F . If F does not contain all subsets of
, we might
not know the precise !, but we may narrow down the possibilities. We know with
certainty whether a set A 2 F either contains ! or does not contain it ?these sets
are resolved by our given information.
Example 1.1 We toss a coin three times, with the result of each toss being H
(head) or T (tail). Our time set consists of t = 0; 1; 2; 3. The sample space
is
the set of all eight possible outcomes. At time 0 (just before the ?rst coin toss),
1
we only know that the true ! does not belong to ; and does belong to
, hence
we set
F0 = f;;
g :
At time 1, after the ?rst coin toss, in addition the following two sets are resolved:
AH = fHHH;HHT;HTH;HTTg
AT = fTHH; THT; TTH; TTTg :
We can say with certainty whether ! belongs to AH or AT . In contrast, the
information about the ?rst coin toss only is not enough to determine with certainty
whether ! is contained e.g. in fHHH;HHTg ?for this we would have to wait
until the second coin toss. As the complement of AH is AT , and the union of AH
and AT equals
, we set
F1 = f;;
; AH ; ATg :
At time 2, after the second coin toss, in addition to the sets already contained in
F1, the sets
AHH = fHHH;HHTg ; AHT = fHTH;HTTg ;
ATH = fTHH; THTg ; ATT = fTTH; TTTg
get resolved, together with their complements and unions. Altogether, we get
F2 =
;;
; AH ; AT ; AHH ; AHT ; ATH ; ATT ; AcHH ; AcHT ; AcTH ; AcTT ;
AHH [ ATH ; AHH [ ATT ; AHT [ ATH ; AHT [ ATT

:
At time 3, after the third coin toss, we know the true realization of !, and can
therefore tell for each subset of
whether ! is a member or not. Hence
F3 = The set of all subsets of
:
As we have F0 F1 F2 F3 these four -algebras are said to form a ?ltration.
The random variable
X = number of tails among the rst two coin tosses
is said to be F2-measurable, but not F1-measurable (as we cannot determine the
value of X after only one coin toss).
2
Consider now a discrete time stochastic process X = (X0; X1; X2; :::), i.e. a
collection of random variables indexed by time. Hence X is a function of a chance
parameter ! and a time parameter n. We would write Xn(!) for a function
value, but typically suppress the chance parameter, otherwise the notation gets
too heavy. Moreover, the two variables play quite di¤erent roles, since the chance
parameter ! comes from the sample space
(which can be a very large set, with
no natural ordering) whereas the time parameter n is an element of the ordered
set N+.
We denote with Fn a family of sets containing all information about X up
to time n. In more detail, Fn is the -algebra generated by sets of the form (we
assume that the starting value X0 is just a deterministic number)
fX1 = i1; X2 = i2; :::; Xn = ing
if the state space is discrete. If the state space is R then Fn is the -algebra
generated by sets of the form
fX1 2 (a1; b1) ; X2 2 (a2; b2) ; :::; Xn 2 (an;bn)g
with intervals (a1; b1), (a2; b2), ... We have
F0 = f;;
g (at time zero, we know nothing)
F0 F1 F2 ::: Fn (the more time evolves, the more we know)
(Fn) = (F0;F1;F2; :::) is called a ltration. Sometimes we write
FXn to specify
that we are collecting information about the stochastic process X.
We say a random variable H is Fn-measurable if it depends on information
about X up to time n only. For example, if f is some function, then f (Xn) is
Fn-measurable; and so is max1inXi. Or, in other words, H is Fn-measurable if
it is a function of X1,...,Xn.
We say that a stochastic process Y = (Yn)n0 is adapted to (Fn) if each Yn is
Fn-measurable.
FYn is called the ltration generated by Y if it is the smallest
ltration where Y is adapted to. Summing up,
The random variable H is measurable with respect to a -algebra F if the
information in F is su￠ cient to determine H, and F can even contain more
The stochastic process X is adapted to a ?ltration (Fn) if the information in
(Fn) is su￠ cient to determine all the Xns, and (Fn) can even contain more
info.
FXn contains just the information about all the Xn?s. More precisely,
the -algebra FXn contains exactly all the information about X0, X1, ..., Xn.
3
1.2 Conditional Expectation
We consider a -algebra F and want to make a prediction about some random
variable H based on the information contained in F . If H is F-measurable, then
we can precisely determineH from the given information in F . IfH is independent
of F , then the information in F is of no help in predicting the outcome of H. In
the intermediary case, we can try to use the information in F to make an educated
guess?about H, without being able to completely evaluate H.
You cannot avoid this section. Its material is absolutely essential for every-
thing that follows. To quote Thomas Mikosch, "The notion of conditional ex-
pectation is one of the most di￠ cult ones in probability theory, but it is also one
of the most powerful tools... For its complete theoretical understanding, measure-
theoretic probability theory is unavoidable." However, it is perfectly possible, and
not even di￠ cult, to get an operational understanding of conditional expectation.
You have just to grasp the following intuitive properties, they will be given for the
forementioned reason without proof.
De?nition 1.2 If E [H2] < 1, then the conditional expectation of H given F ,
written E [H j F ], is the least-squares-best F-measurable predictor of H: amongst
all F-measurable random variables bH (i.e. amongst all predictors which can be
computed from available information), it minimizes
E
bH H2 :
It turns out that one can de?ne (by the partial averaging property below) the
conditional expectation also in the more general case where we only assume that
E [jHj] < 1 (so E [H2] < 1 need not be satis?ed). We give a list of properties
of E [H j F ]. Here A denotes the indicator function of the set A: A (!) = 1 if
! 2 A, A (!) = 0 if ! =2 A.
(a) (Measurability) E [H j F ] is F-measurable
(the conditional expectation is a predictor based on available information)
(b) (Partial Averaging) For every set A 2 F ,
E [AE [H j F ]] = E [AH]
(on every set A 2 F , the conditional expectation of H given F has the same
expectation asH itself). In particular (takeA =
),
E [E [H j F ]] = E [H]
4
Properties (a) and (b) characterize the random variable E [H j F ] uniquely (mod-
ulo null sets).
(c) (Linearity) E [a1H1 + a2H2j F ] = a1E [H 1j F ] + a2E [H2 j F ]
(d) (Positivity) If H 0, then E [H j F ] 0
(e) (Jensen?s Inequality) If f is a convex function such that
E [jf(H)j] <1, then
f (E [H j F ]) E [f (H) j F ]
(f) (Tower Property) If G is a sub -algebra of F (contains less information
than F) then
E [E [H j F ] j G] = E [H j G] :
In case we deal with a ?ltration, this property has the following form: for two
time points s t (hence Fs Ft) we have
E [E [H j Ft] j Fs] = E [H j Fs] :
(nested conditional expectations are evaluated by taking only one conditional
expectation with respect to the earliest time point)
(g) (Taking out what is known) If G is a random variable which is F-
measurable, and such that E [jGHj] <1, then
E [GH j F ] = GE [H j F ]
In particular (choose H = 1),
E [G j F ] = G
(if G does only depend on available information then we can predict G fully)
(h) (R?le of independence) If H is independent of F , then
E [H j F ] = E [H]
(in that case, the information in F is useless in predicting H)
Let X be any random variable. We will sometimes use the notation E [H jX] if
we predict H by using the information about X only (more formally, we condition
with respect to the -algebra generated by X). Moreover, if A
we write
P (Aj F) for E [Aj F ]. An important special case of (h) is if we consider the
trivial -algebra F0 = f;;
g. In that case, the conditional expectation with
respect to F0 is just the common expectation (so the result is a number):
E [H j F0] = E [H] :
5
2 Martingales in Discrete Time
2.1 De?nition
Let X = (Xn)n0 be a discrete time process with E [jXnj] < 1 for all n, and let
(Fn) be some ?ltration. We assume that X is adapted to (Fn), that is, each Xn
is Fn-measurable.
De?nition 2.1 X is called a martingale (relative to (Fn), P ) if for all n 1
E [Xnj Fn1] = Xn1;
a submartingale if
E [Xnj Fn1] Xn1;
and a supermartingale if
E [Xnj Fn1] Xn1:
Interpretation: let Xn be the wealth at time n of a player engaged in some casino
game. The martingale property, rewritten as
E [ (Xn Xn1)j Fn1] = 0
(here we have used properties (c): Linearity and (g): Taking out what is known
of the conditional expectation), tells us that, based on all the information about
the games progress so far, one can predict that our increase in wealth will be on
average zero: we are playing a fair game. In reality, our wealth will be described in
most games by a supermartingale: there is a tendency to loose on average. There
are some games like Black Jack, however, where you can make a very clever use of
past events (like memorizing the cards which have already been taken out of the
deck) to achieve that your wealth process evolves like a submartingale: you are
engaged in a favourable game.
We will later on introduce the notion of Markov processes: these are stochastic
processes where future predictions depend on the past only via the present state,
they are memoryless. A martingale need not be a Markov process (because future
outcomes may depend on the whole past) and a Markov process need not be a
martingale (since it may be related to an unfair game).
Exercise 2.2 Prove the following statements by applying the properties of
conditional expectation. State in each step which property you use. All random
variables are assumed to be su￠ ciently integrable.
6
Fix some time horizon T 2 N0, and let H be a FT -measurable random
variable. De?ne a stochastic process X for n = 0; 1; 2; :::; T as
Xn = E [Hj Fn] :
Show that X is a martingale: check that X is adapted and that it ful?lls
the martingale property.
Let f be a convex function and X a martingale. Show that f (X) is a
submartingale and f(X) a supermartingale.
A stochastic process X is called predictable if for each n we have that Xn
is Fn1-measurable. Show that every predictable process is (Fn)-adapted,
and that every predictable martingale is constant.
Let Y be a predictable process, and X be a martingale. Show that then the
martingale transform Y X, de?ned as ((Y X)0 = 0 and)
(Y X)n =
nX
i=1
Yi (Xi Xi1)
is a martingale as well.
2.2 Discrete-Time Examples
Example 2.3 Sums of independent zero-mean RVs (random variables). Let
X1, X2,... be a sequence of independent RVs with E [jXkj] <1, 8k, and
E [Xk] = 0; 8k:
Dene (S0 := 0 and)
Sn := X1 +X2 + :::+Xn:
Let (Fn) =
FXn , the ?ltration which is generated by X. Note that Sn is Fn-
measurable (depends only on (X1; X2; :::; Xn)). Then we have for n 1
E [Snj Fn1] = E [Sn1j Fn1] + E [Xnj Fn1] by linearity (c)
= Sn1 + E [Xnj Fn1] taking out what is known (g)
= Sn1 + E [Xn] by independence (h)
= Sn1:
This proves that S = (Sn) is a martingale.
7
Example 2.4 Products of non-negative independent RVs of mean 1. Let
Y1, Y2,... be a sequence of independent non-negative RVs with
E [Yn] = 1; 8n:
De?ne (M0 := 1 and)
Mn := Y1Y2:::Yn:
Let (Fn) =
FYn , the ?ltration which is generated by Y . Then, for n 1, we
have
E [Mnj Fn1] = E [Mn1Ynj Fn1]
= Mn1E [Ynj Fn1] taking out what is known (g)
= Mn1E [Yn] by independence (h)
= Mn1:
This proves that M is a martingale.
Example 2.5 Exponential martingale. Let X1, X2,... be a sequence of inde-
pendent and identically distributed RVs with moment-generating function
' () = E [exp (Xi)] <1:
We set (S0 := 0 and) Sn = X1 + ::: +Xn. Let (Fn) =
FXn , the ?ltration which
is generated by X.
Then
Mn =
exp(Sn)
' ()n
is a martingale. This follows from the previous example, just put
Yk =
exp (Xk)
' ()
:
Example 2.6 Exponential martingale for random walk. In the setting of
the previous example, denote the distribution of the Xns as X where
P (X = +1) = P (X = 1) = 1
2
:
The stochastic process S = (Sn)n0 is called a symmetric random walk. Then
' () = E [exp (X)] =