MAST30027讲解、辅导R编程设计

MAST30027: Modern Applied Statistics
Assignment 4, 2022.
Due: 11:59pm Sunday October 23rd
This assignment is worth 17% of your total mark.
To get full marks, show your working including 1) R commands and outputs you use, 2)
mathematics derivation, and 3) rigorous explanation why you reach conclusions or answers.
If you just provide final answers, you will get zero mark.
The assignment you hand in must be typed (except for math formulas), and be submitted
using LMS as a single PDF document only (no other formats allowed). For math formulas,
you can take a picture of them. Your answers must be clearly numbered and in the same
order as the assignment questions.
The LMS will not accept late submissions. It is your responsibility to ensure that your
assignments are submitted correctly and on time, and problems with online submissions are
not a valid excuse for submitting a late or incorrect version of an assignment.
We will mark a selected set of problems. We will select problems worth ≥ 50% of the full
marks listed.
If you need an extension, please contact the tutor coordinator before the due date with
appropriate justification and supporting documents. Late assignments will only be accepted
if you have obtained an extension from the tutor coordinator before the due date. Under
no circumstances an assignment will be marked if solutions for it have been released. Please
DO NOT email the lecturer for extension request.
Also, please read the “Assessments” section in “Subject Overview” page of the LMS.
Data: The file assignment4.txt contains 100 observations simulated from a normal distribution
with mean = 5 and standard deviation = 2 by using the following code.
> set.seed(30027)
> x = rnorm(100, 5, 2)
You can read the data as follows.
> X = scan(file="assignment4.txt", what=double())
Read 100 items
> mean(X)
[1] 5.089332
> sqrt(var(X))
[1] 1.998487
Model: we consider a normal model:
xi ～ N(μ, σ2) for i = 1, . . . , 100.
Prior: we impose the following prior for mean and variance parameters:
p(μ, σ2) ∝ 1
σ2
1
Problem 1: Posterior inference using Gibbs sampling
(a) (10 marks) Derive the following conditional distributions.
p(μ|σ2, x1, . . . , x100) and p(σ2|μ, x1, . . . , x100).
If they are known distributions, write distribution names and their parameters.
[
For exam-
ple, gamma distribution with shape =
∑
i xi and scale =
∑
i x
2
i
]
.
(b) (5 marks)Write a code that uses the Gibbs sampling to simulate samples from p(μ, σ2|x1, . . . , x100).
Run at least two Gibbs sampling chains with different initial values. Please run with at least
500 iterations. Make a trace plot for each of parameters and see if samples from different
chains are mixed well and behave similarly.
(c) (7 marks) Using the simulated sample from one chain, for each parameter 1) make a
plot that shows empirical (estimated) marginal posterior distribution, 2) estimate marginal
posterior mean, and 3) report a 90% credible interval for the marginal posterior distribution.
You can find a 90% credible interval in a number of ways. For this assignment, use 5% in
each tail. Discard early iterations as a burn-in. You can decide burn-in period from the
trace plots in (b).
Problem 2: Posterior inference using Metropolis-Hastings (MH) algorithm
(a) (15 marks)Write a code that uses the MH algorithm to simulate samples from p(μ, σ2|x1, . . . , x100).
For the current values of parameters (μc, σ
2
c ), we propose new values (μn, σ
2
n) as follows. σ
2
n ～
gamma(shape = 5σ2c , rate = 5) and μn ～ Normal(mean = μc, variance = σ2n). Run at least
two MH chains with different initial values. Please run with at least 10000 iterations. Make
a trace plot for each of parameters and see if samples from different chains are mixed well
and behave similarly.
(b) (7 marks) Repeat (c) in the Problem 1.
The application of Variational Inference (VI) to the current model and prior is not straightforward
because the ELBO is not well defined with the improper prior (that does not integrate to a finite
quantity). Thus, we will consider another prior and apply VI for the posterior inference.
Model: we consider the same normal model:
xi ～ N(μ, σ2) for i = 1, . . . , 100.
Prior: we impose the following prior for mean and variance parameters:
p(μ, σ2) = p(μ|σ2)p(σ2),
μ|σ2 ～ N(μ0, σ2), σ2 ～ Inverse-Gamma(a0, b0),
where the Inverse-Gamma(a0, b0) has the pdf
f(x) =
ba00
Γ(a0)
x?a0?1 exp
(
?b0
x
)
.
If necessary, you can use the following two properties of Inverse-Gamma(a0, b0) without providing
derivation. When X ～ Inverse-Gamma(a0, b0),
E[
1
X
] =
a0
b0
, E[logX] = log b0 ?Ψ(a0), where Ψ(a0) = d
da0
log Γ(a0).
2
Problem 3: Posterior inference using Variational Inference (VI)
We will apply VI with the mean-field variational family where q(μ, σ2) = qμ(μ)qσ2(σ
2) and use
the CAVI algorithm for optimisation. The CAVI iteratively optimises each factor as follows while
holding the other factors fixed:
q?μ(μ) ∝ exp{Eσ2 [log p(μ, σ2, x1, . . . , x100)]},
q?σ2(σ
2) ∝ exp{Eμ[log p(μ, σ2, x1, . . . , x100)]},
where the expectations Eσ2 and Eμ are taken with respect to q?σ2(σ
2) and q?μ(μ), respectively.
(a) (10 marks) Derive q?μ(μ) and q
?
σ2(σ
2) and write the corresponding distribution names and
their parameters.
[
For example, for the model and prior we considered in the lecture - see
the page 20 of the Variational Inference slides, we derived and presented that q?μ(μ) is the pdf
of N(μ?, σ2?) and q?τ (τ) is the pdf of Gamma(a
?, b?), where μ? = λ0μ0+nxˉλ0+n , σ
2? = 1(λ0+n)Eτ [τ ] ,
Eτ [τ ] = a
?
b? , a
? = n+12 + a0, b
? = b0 +
∑
i Eμ[(xi?μ)2]
2 +
λ0Eμ[(μ?μ0)2]
2 , Eμ[(xi ? μ)2] = σ2? +
(μ?)2 ? 2xiμ? + x2i , and Eμ[(μ? μ0)2] = σ2? + (μ?)2 ? 2μ0μ? + μ20.
]
(b) (10 marks) Derive the ELBO up to constant.
(c) (10 marks) Implement the CAVI algorithm and obtain q?μ(μ) and q

σ2(σ
2) which minimise
the KL divergence by applying the implemented algorithm to x1, . . . , xn for μ0 = 0, a0 =
2, b0 = 2. Set the CAVI algorithms to stop when either the number of iterations reaches
100 (max.iter = 100) or the ELBO has changed by less than 0.00001 ( = 0.00001). Run
the CAVI algorithm at least two times using different initial values and report q?μ(μ) and
q?σ2(σ
2) with the highest ELBO. Provide the initial values which lead the reported q?μ(μ) and
q?σ2(σ
2) and we will probably run our own implementation using the initial values and see if
we can reproduce your answer when marking. For each run, check that the ELBO increase
at each iteration by plotting them.