ST3370_B
BAYESIAN FORECASTING AND INTERVENTION
Summer 2023
Question 1
We are given three observations about the temperature in January in London, namely y1 = 1; y2 = −2; y3 = 5 degrees Celsius. We know that the historic average January temperature is 0 degrees and we want to make inferences on its variance by modelling the observations as realizations of y1:n = (y1, . . . , yn) which, given a random parameter θ, are conditionally independent and identically distributed random variables. We recall the expressions of the following three densities:
• Normal distribution of parameters (µ ∈ R, σ² > 0):
N(y; µ, σ²) = (1/√(2πσ²)) exp(−(y − µ)²/(2σ²)), for y ∈ R;
• Generalized Student's T distribution of parameters (ν > 1, µ̂ ∈ R, σ̂ > 0):
St(y; ν, µ̂, σ̂) = [Γ((ν + 1)/2) / (Γ(ν/2) √(νπ) σ̂)] (1 + (y − µ̂)²/(ν σ̂²))^(−(ν+1)/2), for y ∈ R;
• Gamma distribution of parameters (α > 0, β > 0):
Ga(y; α, β) = (β^α / Γ(α)) y^(α−1) e^(−βy), for y > 0.
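As a quick reference, the three densities can be evaluated in code. This is a sketch assuming the standard parameterizations (location–scale Student's t with scale σ̂, rate-parameterized Gamma); the function names are ours:

```python
import math

# Sketch of the three densities recalled above. The Student's t is written in
# the location-scale parameterization (mean mu_hat when nu > 1), which is an
# assumption about the course's convention.

def norm_pdf(y, mu, sigma2):
    """N(y; mu, sigma^2)."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def st_pdf(y, nu, mu_hat, sigma_hat):
    """St(y; nu, mu_hat, sigma_hat); log-gammas avoid overflow for large nu."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi) - math.log(sigma_hat))
    z2 = (y - mu_hat) ** 2 / (nu * sigma_hat ** 2)
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(z2))

def ga_pdf(y, alpha, beta):
    """Ga(y; alpha, beta) with rate parameter beta, for y > 0."""
    return beta ** alpha / math.gamma(alpha) * y ** (alpha - 1) * math.exp(-beta * y)
```

The log-gamma formulation of the Student's t is deliberate: for very large ν the density approaches the normal, and naive gamma ratios would overflow.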
(a) Consider the following likelihoods for the observations:
1. fyi|θ(y | θ) = N(y; θ, 1);
2. fyi|θ(y | θ) = N(y; 1, θ);
3. fyi|θ(y | θ) = N(y; 0, θ);
4. fyi|θ(y | θ) = St(y; ν, 0, θ) with ν > 1;
5. fyi|θ(y | θ) = Ga(y; θ, β0) with β0 > 0.
For each of them, argue whether it represents a good choice for the likelihood, considering both the information at your disposal and the tractability of the model. [Hint: the mean of St(y; ν, µ̂, σ̂) is µ̂ if ν > 1] [5 marks]
(b) Consider the following likelihood and prior for the precision parameter:
Recall that if x ∼ Ga(x; α, β), the expected value of x is E(x) = α/β and its variance is Var(x) = α/β².
(i) What is the prior opinion on the average precision of the data? [1 mark]
(ii) Derive the posterior pθ|Y1:3 justifying every step. [3 marks]
(iii) Interpret the information contained in the posterior about the average precision of the temperature by comparing it with the prior guess. [2 marks]
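A numerical sketch of the conjugate update asked for in (b)(ii), assuming the likelihood yi | θ ∼ N(0, 1/θ) with θ the precision, and a Ga(α0, β0) prior; α0 and β0 below are illustrative placeholders, not necessarily the hyperparameters given in the question:

```python
# Conjugate Gamma-Normal update for the precision theta, assuming
# y_i | theta ~ N(0, 1/theta) i.i.d. and theta ~ Ga(alpha0, beta0).
# alpha0, beta0 are hypothetical illustrative values.

y = [1.0, -2.0, 5.0]          # the three January temperature readings
alpha0, beta0 = 1.0, 1.0      # hypothetical prior hyperparameters

# Conjugacy: posterior is Ga(alpha0 + n/2, beta0 + sum(y_i^2)/2)
alpha_n = alpha0 + len(y) / 2
beta_n = beta0 + sum(v ** 2 for v in y) / 2

prior_mean_precision = alpha0 / beta0    # E(theta) under the prior
post_mean_precision = alpha_n / beta_n   # E(theta | y_1:3)
```

Comparing `prior_mean_precision` with `post_mean_precision` is exactly the exercise of part (b)(iii): the data pull the average precision away from the prior guess.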
(c) Consider the same likelihood fyi|θ(y | θ) and prior for θ as in Question 1(b).
(i) Derive the distribution of y1 justifying every step. [3 marks]
(ii) Using the properties of the conditional moments, find the mean and variance of y1 using the fact that E(1/θ) = ∞. [2 marks]
(iii) Use the findings of Question 1(c)(i) and Question 1(c)(ii) to argue to which extent the distribution of a conditionally normal observation is similar to a normal distribution. [2 marks]
(iv) Another set of readings of the temperature in the same experimental conditions provides you with observations z1 = 0.9; z2 = −2.1; z3 = 5.2 degrees Celsius. You decide to implement a new model whose observables are the mean of the readings xi = (yi + zi)/2. Argue whether you should use the same parameters for the likelihood and the prior as the ones in Question 1(b). [2 marks]
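For (c)(iii), the marginal of a conditionally normal observation with Gamma precision is a Student's t, which departs from the normal mainly in the tails. The comparison below uses ν = 2, an illustrative value consistent with E(1/θ) = ∞ (infinite marginal variance):

```python
import math

# Tail comparison: Student-t marginal (nu = 2, illustrative) vs standard normal.

def norm_pdf(y):
    return math.exp(-y * y / 2) / math.sqrt(2 * math.pi)

def st_pdf(y, nu):
    c = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(nu * math.pi))
    return c * (1 + y * y / nu) ** (-(nu + 1) / 2)

# How much heavier the t tail is than the normal tail at y = 6
ratio = st_pdf(6.0, 2.0) / norm_pdf(6.0)
```

At y = 0 the two densities are comparable, but at y = 6 the t density is several orders of magnitude larger than the normal one: similar in the centre, very different in the tails.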
Question 2
Consider a general state-space model (θk, yk)k≥0, where (θk)k≥0 are the parameters and (yk)k≥0 are the observables.
(a) (i) State the assumptions on (θk)k≥0 and (yk)k≥0. [2 marks]
(ii) Express the filtering distribution pθn|y0:n in terms of the likelihood pyn|θn and the predictive pθn|y0:n−1, justifying all steps. [3 marks]
(b) Assume that the state-space model (θk, yk)k≥0 is a d-dimensional dynamic linear model (DLM) of the form
yk = Hk θk + vk,    θk = Fk θk−1 + uk (k ≥ 1),
with θ0 = u0 and, for k ≥ 0, vk and uk have mean zero and are all independent.
(i) Provide the expression of the predictive mean E(θn | y0:n−1) in terms of the filtering mean E(θn−1 | y0:n−1) justifying all steps. [2 marks]
(ii) Provide the expression of the predictive variance Var(θn | y0:n−1) in terms of the filtering variance Var(θn−1 | y0:n−1) justifying all steps. [2 marks]
(iii) Can the predictive distribution pθn|y0:n−1 have the same mean and variance as the filtering distribution pθn−1|y0:n−1? Justify your answer using your solution to Question 2(b)(i) and Question 2(b)(ii). [2 marks]
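A scalar sketch of the predict step in (b)(i)–(ii), for a state equation of the form θk = Fθk−1 + uk; all numbers are illustrative:

```python
# One-dimensional predict step of a DLM state equation theta_k = F theta_{k-1} + u_k:
# predictive moments from the filtering moments (illustrative numbers throughout).

F = 0.9                       # transition coefficient (hypothetical)
q = 0.5                       # Var(u_n), the state-noise variance (hypothetical)
m_filt, p_filt = 2.0, 1.0     # E(theta_{n-1} | y_0:n-1), Var(theta_{n-1} | y_0:n-1)

m_pred = F * m_filt           # E(theta_n | y_0:n-1) = F * filtering mean
p_pred = F * p_filt * F + q   # Var(theta_n | y_0:n-1) = F^2 * filtering var + Var(u_n)
```

Here the predictive moments differ from the filtering ones; they can only coincide in special cases (e.g. F²p + q = p for the variance), which is what part (b)(iii) probes.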
(c) Assume we are given the observations y0:n and we are interested in retrospectively reconstructing the system, i.e., in finding the smoothing distribution pθk|y0:n(θk | y0:n) for 0 ≤ k ≤ n. In the following, we assume the Kalman filter recursion is known and we aim at establishing a backward smoothing recursion.
(i) Using Bayes' theorem and the properties of a state-space model, show that
pθk|θk+1,y0:n(θk | θk+1, y0:n) = pθk+1|θk(θk+1 | θk) pθk|y0:k(θk | y0:k) / pθk+1|y0:k(θk+1 | y0:k). (2.1)
[Hint: explain why pθk|θk+1,y0:n(θk | θk+1, y0:n) = pθk|θk+1,y0:k(θk | θk+1, y0:k).] [4 marks]
(ii) For each of the probabilities appearing in (2.1), explain why they are known. [2 marks]
(iii) Use (2.1) to express pθk|y0:n(θk | y0:n) in terms of an integral depending on pθk+1|y0:n(θk+1 | y0:n) and other known probabilities. [3 marks]
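The forward filter plus the backward recursion of part (c) can be sketched in the scalar Gaussian case, where (2.1) yields the standard Rauch–Tung–Striebel smoother; the model numbers and observations below are illustrative:

```python
# Scalar Kalman filter followed by the backward (RTS-type) smoothing recursion.
# Model: theta_k = F theta_{k-1} + u_k, y_k = H theta_k + v_k, theta_0 = u_0;
# all numbers are illustrative, not taken from the question.

F, H = 0.9, 1.0
q, r = 0.5, 1.0                  # Var(u_k), Var(v_k)
ys = [1.0, -2.0, 5.0]            # some observations

# Forward pass: filtering moments (m_f, p_f) and predictive moments (m_p, p_p).
m, p = 0.0, q                    # theta_0 = u_0, so the time-0 prior is (0, q)
m_f, p_f, m_p, p_p = [], [], [], []
for k, y in enumerate(ys):
    if k > 0:                    # predict: E = F m, Var = F^2 p + q
        m, p = F * m, F * F * p + q
    m_p.append(m)
    p_p.append(p)
    K = p * H / (H * H * p + r)  # Kalman gain
    m, p = m + K * (y - H * m), (1 - K * H) * p
    m_f.append(m)
    p_f.append(p)

# Backward pass: p(theta_k | y_0:n) from p(theta_{k+1} | y_0:n) and the filter output.
m_s, p_s = m_f[-1], p_f[-1]      # smoothing = filtering at the final time
smoothed = [(m_s, p_s)]
for k in range(len(ys) - 2, -1, -1):
    J = p_f[k] * F / p_p[k + 1]                  # backward gain
    m_s = m_f[k] + J * (m_s - m_p[k + 1])
    p_s = p_f[k] + J * J * (p_s - p_p[k + 1])
    smoothed.insert(0, (m_s, p_s))
```

The smoothing variances are never larger than the filtering ones: retrospective reconstruction uses the whole record y0:n.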
Question 3
Consider M, a univariate time-series model (θk, yk)k≥0 of the form
yk = Hθk + vk,    θk = Fθk−1 + uk (k ≥ 1),
with θ0 = u0 and, for k ≥ 0, vk and uk have mean zero and are all independent.
(a) (i) Derive the expression of the forecast function gk(δ) = E(yk+δ | y0:k) of M, for δ > 0, in terms of the filtering mean m̂k = E(θk | y0:k), justifying all steps. [3 marks]
(ii) Provide the definition of the observability matrix and explain in your own words why it is important for it to be invertible. [2 marks]
(iii) Consider the transition matrix
the observation vector H = (1, 1) and the filtering mean m̂k = (3, 3)ᵗ. Find the forecast function and compare it to the one of a polynomial model. [2 marks]
(iv) Can a DLM similar to the one in Question 3(a)(iii) have forecast function gk(δ) = 3δ? Justify your answer. [2 marks]
(v) Why do you think we have classified DLMs based on E(yk+δ | y0:k) and not on E(θk+δ | y0:k)? [1 mark]
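The forecast function of a time-invariant DLM is gk(δ) = H F^δ m̂k, the result derived in (a)(i). Since the transition matrix of (a)(iii) is not reproduced here, the F below is a hypothetical linear-growth choice; H and m̂k are the values from (a)(iii):

```python
import numpy as np

# Forecast function g_k(delta) = H F^delta m_hat for a time-invariant DLM.
# F is a hypothetical linear-growth transition matrix; H and m_hat are the
# observation vector and filtering mean quoted in (a)(iii).

F = np.array([[1.0, 1.0],
              [0.0, 1.0]])      # hypothetical polynomial-trend transition
H = np.array([1.0, 1.0])        # observation vector from (a)(iii)
m_hat = np.array([3.0, 3.0])    # filtering mean from (a)(iii)

def forecast(delta):
    """g_k(delta) = H F^delta m_hat."""
    return float(H @ np.linalg.matrix_power(F, delta) @ m_hat)
```

With this F, gk(δ) = 6 + 3δ: linear in δ, like the forecast of a polynomial model of order 2, but with a nonzero intercept, which is relevant when asking whether gk(δ) = 3δ is attainable as in (a)(iv).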
(b) Consider M with transition matrix
and with observation matrix H = (2, 1).
(i) Provide the expression of M′, a canonical model similar to M, and identify its transition matrix F′ and its observation matrix H′. Justify your answers. [2 marks]
(ii) Find the similarity matrix S between M′ and M. [4 marks]
(iii) What would change in your answer to Question 3 (b)(ii) if H = (0, 1)? [2 marks]
(iv) Let Id2 denote the 2 × 2 identity matrix. Since F = Id2 F′ Id2⁻¹, can you conclude that the similarity matrix is S = Id2, without performing the calculations in Question 3(b)(ii)? Justify your answer. [2 marks]
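Similar models produce the same forecast function, which is one way to sanity-check a candidate similarity matrix S. Under the convention θ′ = Sθ (so F′ = S F S⁻¹ and H′ = H S⁻¹; conventions vary between texts), a numerical check with a hypothetical F and S and the H from part (b):

```python
import numpy as np

# Similar models share the forecast function: if theta' = S theta, then
# F' = S F S^{-1}, H' = H S^{-1}, and H' F'^delta (S m) = H F^delta m.
# F and S are hypothetical; H = (2, 1) is the observation matrix from (b).

F = np.array([[1.0, 1.0],
              [0.0, 1.0]])      # hypothetical transition matrix
H = np.array([2.0, 1.0])
S = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # any invertible candidate similarity matrix
S_inv = np.linalg.inv(S)

F_prime = S @ F @ S_inv
H_prime = H @ S_inv

m = np.array([1.0, -1.0])       # some filtering mean

def g(d):
    return float(H @ np.linalg.matrix_power(F, d) @ m)

def g_prime(d):
    return float(H_prime @ np.linalg.matrix_power(F_prime, d) @ (S @ m))
```

Agreement of `g` and `g_prime` is necessary but, as part (b)(iv) hints, matching transition matrices alone does not pin down S: the observation matrices must also correspond.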
Question 4
Consider (yk, θk)k≥0 to be a dynamic linear model of the form.
with θ0 = u0 and, for k ≥ 0, uk = (ϵk, ϵ′k, 0), where (ϵk)k, (ϵ′k)k are all uncorrelated, have mean zero and constant variance σ². Define ỹk = (1 − B)²yk, where B is the backshift operator.
(a) (i) Let θk = (θk,1, θk,2, θk,3). Show that ỹk = θk−2,3 + ϵk − ϵk−1 + ϵ′k−1, justifying all steps. [2 marks]
(ii) Show that m = E(ỹk) does not depend on k, justifying all steps. [2 marks]
(iii) Show that v = Var(ỹk) does not depend on k, justifying all steps. [2 marks]
(iv) Show that for δ > 0, cδ = Cov(ỹk, ỹk−δ) does not depend on k, justifying all steps. [3 marks]
(v) Use your findings in Question 4 (a) (i)–(iii) to describe the type of data that could be modelled with an ARIMA model of polynomial order 2. [3 marks]
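The stationarity claims in (a)(ii)–(iv) can be verified by bookkeeping over the uncorrelated shocks, using the representation from (a)(i) and assuming, as the claims require, that the θk−2,3 term contributes only a constant mean:

```python
# Autocovariance bookkeeping for parts (a)(ii)-(iv): the deviation of y~_k from
# its mean is eps_k - eps_{k-1} + eps'_{k-1} (representation from (a)(i), with
# the theta_{k-2,3} term treated as the constant mean).

sigma2 = 1.0   # common shock variance; any value works, it only scales results

def deviation(k):
    # coefficients of y~_k minus its mean on the shocks, keyed by (series, time)
    return {("eps", k): 1.0, ("eps", k - 1): -1.0, ("eps_prime", k - 1): 1.0}

def cov(k, j):
    # uncorrelated shocks: only matching (series, time) pairs contribute
    a, b = deviation(k), deviation(j)
    return sigma2 * sum(a[key] * b[key] for key in a.keys() & b.keys())
```

The variance is 3σ², the lag-1 autocovariance is −σ², and all higher lags vanish, none of them depending on k: exactly the second-order structure of an MA(1)-type process.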
(b) Assume you do not know the distribution of (yk, θk)k≥0, but you know that (ỹk)k is a stationary process and you decide to model it with an AR(2) time series model
where (ϵk)k is a sequence of mutually independent random variables such that ϵk ∼ N(0, σ²) for any time step k.
(i) Find the polynomial ϕ(x) such that ϵk = ϕ(B)ỹk and express it in the form
ϕ(x) = (1 − α1x)(1 − α2x),
where 1/α1, 1/α2 are the roots of ϕ, i.e., ϕ(1/α1) = ϕ(1/α2) = 0. [2 marks]
(ii) Use Question 4(b)(i) to find real coefficients β1, β2 such that
1/ϕ(x) = β1/(1 − α1x) + β2/(1 − α2x).
[3 marks]
(iii) Use Question 4(b)(ii) to find the coefficients (ϕδ)δ of Wold's representation of ỹk, i.e., such that
ỹk = ∑δ≥0 ϕδ ϵk−δ.
[3 marks]
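Parts (b)(i)–(iii) can be checked numerically for a hypothetical stationary AR(2) (its coefficients c1, c2 below stand in for the ones in the question): the Wold coefficients obtained from the partial-fraction form, ϕδ = β1 α1^δ + β2 α2^δ, must agree with the usual recursive expansion of the AR(2):

```python
import cmath

# Wold coefficients of a stationary AR(2) y~_k = c1 y~_{k-1} + c2 y~_{k-2} + eps_k,
# i.e. phi(x) = 1 - c1 x - c2 x^2. Hypothetical coefficients c1, c2 below.

c1, c2 = 0.5, 0.3                # hypothetical stationary AR(2) coefficients

# alpha1, alpha2 are the reciprocals of the roots of phi(x), i.e. the roots of
# z^2 - c1 z - c2 = 0 (cmath handles a possible complex pair).
disc = cmath.sqrt(c1 * c1 + 4 * c2)
alpha1, alpha2 = (c1 + disc) / 2, (c1 - disc) / 2

beta1 = alpha1 / (alpha1 - alpha2)    # partial-fraction coefficients of 1/phi
beta2 = -alpha2 / (alpha1 - alpha2)

def psi_closed(d):
    """Wold coefficient from the partial-fraction form beta1*a1^d + beta2*a2^d."""
    return (beta1 * alpha1 ** d + beta2 * alpha2 ** d).real

def psi_recursive(n):
    """Wold coefficients psi_0..psi_n from the recursion psi_d = c1 psi_{d-1} + c2 psi_{d-2}."""
    psi = [1.0, c1]
    for _ in range(2, n + 1):
        psi.append(c1 * psi[-1] + c2 * psi[-2])
    return psi[: n + 1]
```

The two routes agree term by term, which is the content of the derivation asked for in (b)(iii).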
Question 5: Compulsory question for students taking ST405.
(a) The goal is to approximate the mean of a real-valued distribution π, that is, I = ∫ θ π(θ) dθ.
(i) Provide the expression for Î1, the Monte Carlo estimator of I, and express its variance in terms of the variance σ² of π. [2 marks]
(ii) Provide the expression for Î2, an importance sampling estimator of I with respect to a general proposal distribution s. [1 mark]
(iii) Can the variance of Î2 be smaller than that of Î1? Justify your answer and provide an example. [Hint: consider sampling weights ω(θ) = I/θ.] [3 marks]
(iv) Provide two reasons why Î2 could be preferable to Î1, also with reference to your answer to Question 5(a)(iii). [2 marks]
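The hint in (a)(iii) corresponds to the proposal s(θ) ∝ θ π(θ), for which every importance-sampling term equals I exactly, giving a zero-variance estimator. A sketch with π = Exp(1), so I = 1, and s = Ga(2, 1), whose density is θ e^(−θ):

```python
import math
import random

# Zero-variance importance sampling for I = E_pi(theta) with pi = Exp(1):
# the proposal s(theta) = theta * exp(-theta) is Ga(2, 1), i.e. theta*pi(theta)/I,
# so the weights pi/s = 1/theta make every term theta * pi/s equal to I = 1.

random.seed(0)
N = 1000

# plain Monte Carlo: average of theta_i with theta_i ~ Exp(1)
mc = sum(random.expovariate(1.0) for _ in range(N)) / N

def is_term():
    theta = random.gammavariate(2.0, 1.0)                    # draw from Ga(2, 1)
    weight = math.exp(-theta) / (theta * math.exp(-theta))   # pi(theta)/s(theta)
    return theta * weight                                    # equals 1 = I exactly

imp = sum(is_term() for _ in range(N)) / N
```

The plain Monte Carlo estimate fluctuates around 1, while the importance sampling estimate is exactly 1 whatever the draws: a concrete answer to (a)(iii).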
(b) The goal is to approximate the smoothing distribution p(θ0:k | y0:k) of a state-space model (yk, θk)k≥0 defined as
with θ0 = u0 and, for k ≥ 0, vk and uk have a normal distribution with mean 0 and variance 1 and are all independent.
(i) Consider a sequential importance sampling (SIS) approximation of p(θ0:k | y0:k) with respect to a general proposal distribution sn(θ0:n). Provide the expression for the discrete distribution p̂ that approximates p(θ0:k | y0:k). [3 marks]
(ii) What is the advantage of using a proposal that satisfies the Markov property? Justify your answer. [2 marks]
(iii) Consider a SIS approximation whose proposal coincides with the prior for the states. It is said that there is a prior/data conflict whenever the prior gives high probability to parameters that correspond to low values of the likelihood. How does the prior/data conflict impact on the SIS approximation? Justify your answer. [3 marks]
(iv) With reference to Question 5(b)(iii), provide intuition as to whether you expect to see a prior/data conflict if y0 = 100. Explain the potential advantage of using p(θk | θk−1, yk) as the transition kernel of the proposal instead of p(θk | θk−1). [2 marks]
(v) Explain what changes if instead of p(θk|θk−1, yk) one uses p(θk|θk−1, y0:k) as the transition kernel of the proposal. [2 marks]
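The weight degeneracy described in (b)(iii)–(iv) is easy to exhibit. The one-step model below (θ0 ∼ N(0, 1), y0 = θ0 + v0 with v0 ∼ N(0, 1)) is a hypothetical stand-in consistent with the stated noise assumptions; with y0 = 100 the prior proposal places essentially all normalized weight on a single particle:

```python
import math
import random

# Prior/data conflict in SIS with the prior as proposal: particles are drawn
# from theta_0 ~ N(0, 1), then weighted by the likelihood N(y0; theta_0, 1).
# With y0 = 100 the likelihood is tiny for every prior draw and the normalized
# weights degenerate. Model and numbers are illustrative.

random.seed(1)
N = 1000
y0 = 100.0

particles = [random.gauss(0.0, 1.0) for _ in range(N)]   # draws from the prior

# log-likelihood weights log N(y0; theta, 1), normalized stably via max-shift
logw = [-0.5 * (y0 - th) ** 2 for th in particles]
mx = max(logw)
w = [math.exp(l - mx) for l in logw]
total = sum(w)
w = [wi / total for wi in w]

ess = 1.0 / sum(wi * wi for wi in w)   # effective sample size
```

The effective sample size collapses to roughly one particle out of a thousand, which is why a proposal kernel such as p(θk | θk−1, yk), which looks at the incoming observation, can help.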