ADVANCED SIMULATION METHODS
COURSEWORK 1, SPRING 2024-2025
This assignment will be done individually. You should prepare a PDF report written primarily in a report/paper format, to be submitted through Blackboard, Coursework 1 container (under the Courseworks folder). You can also write this in IPython notebook format if you like; however, you should then export a readable PDF from it. By submitting the coursework, you agree that this assessed coursework is your own work unless otherwise acknowledged, that it includes no plagiarism, that you have not discussed it with anyone else except when seeking clarification from the module lecturer, and that you have not shared any code with anyone else prior to submission. In your answers, you may quote material from the notes and the shared IPython notebooks, provided you give your own original explanations.
This coursework accounts for approximately 40% of the final grade. The report will be marked out of 40. The deadline is 1 PM, 12 March 2025. To enable anonymous marking, please put your CID, and not your name, in the filename of the submission and in the manuscript.
Q1: ESTIMATING A LATENT VARIABLE MODEL
Assume that we have the following latent variable model:
where
Below, we set our observed data to
The parameter ε is a free parameter which we will use to investigate sampling behaviour. In this question, we aim to sample from the posterior p(θ|y) using various methods you have seen in Chapters 1 and 2.
1. Provide the expressions of the distributions p(y|θ) and p(θ|y) in closed form using the Gaussian marginalization and conditioning formulas in Chapter 2. You can refer to the lemmas in the lecture notes, but do give a brief explanation of the derivation. (2 marks)
2. Given the parameters in the main part of the question and your derivation in Part 1, compute (on a computer) the mean and covariance of p(θ|y) and provide a contour plot (see this Jupyter notebook for helper code for plotting) for ε = 10^-3, 10^-2, 10^-1, 1. Discuss these plots and the effect of ε. (1 mark)
3. Fix ε = 1. Present and demonstrate the performance of the random-walk Metropolis-Hastings (RWMH) sampler for sampling from p(θ|y). Provide a clear explanation of the method and of the parameter choices (e.g. the proposal variance, denoted σ²_MH). Draw N = 100000 samples after an appropriate burn-in stage (which you need to set) to demonstrate the method (an illustrative sketch of such a sampler is given after this question list). (2 marks)
4. Provide plots of the autocorrelation functions for different proposal variance choices by varying σ²_MH between 10^-4 and 4 at an appropriate resolution and running RWMH for each of these variances (see the autocorrelation sketch after this list). Identify the variance that gives the best mixing. Plot your samples for the best choice of proposal on top of the density contour. (3 marks)
5. Fix the proposal variance σ²_MH to the best value you obtained in the previous part, and now run this sampler for ε = 10^-3, 10^-2, 10^-1, 1 with N = 100000 samples. Provide the curves of the autocorrelation function for each case in a single plot. Comment on the results, in particular the effect of ε on mixing, and provide a reasoning to explain it. (2 marks)
6. Now suppose that p(y|θ) cannot be computed in closed form. Derive a basic importance sampling (IS) estimator for this quantity (see the importance sampling sketch after this list). (1 mark)
7. Fix ε = 1 for the rest of this question. Let us denote the IS estimator you derived in the previous part by p̂_IS^K(y|θ). We define the relative absolute error (RAE) as

   RAE = |p̂_IS^K(y|θ) − p(y|θ)| / p(y|θ),

   where σ_IS is the standard deviation of the proposal distribution q = N(0, σ²_IS I) and K is the number of samples used to build this IS estimator. To test this estimator, we set θ = [0, 0] and use y as above. Compute the ground-truth value p(y|θ), tune σ²_IS, and demonstrate the accuracy of the estimator by computing the RAE for K = 10^2, 10^3, 10^4, 10^5. To demonstrate this, you may need to run, e.g., M = 10 Monte Carlo simulations for each K and plot the average relative absolute error (over the Monte Carlo simulations) against K. Visually compare the resulting curve with a curve that decays like 1/√K. You do not need to obtain small relative errors here; the aim is to demonstrate the rate. Due to randomness, the curve may not decay exactly like 1/√K, but it should be close. (4 marks)
8. Plug this IS estimator into the MH acceptance ratio to obtain a pseudo-marginal Metropolis-Hastings (PMMH) method, as shown in the lecture notes. Describe the method and the algorithm in detail (see the PMMH sketch after this list). (2 marks)
9. For the numerical results, you may have to tune σ²_IS and σ²_MH for the best performance (do not assume that the values you optimized above will work). Set N = 10000 (with an appropriate burn-in stage). Run the PMMH sampler for K = 10^1, 10^2, 10^3 and provide autocorrelation functions for each case. Comment on the effect of K on mixing. Visualise the samples drawn for the K = 10^3 case. How do these samples compare to the samples in Part 2? Elaborate. (5 marks)
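For Parts 3-5, the following is a minimal illustrative sketch of a random-walk Metropolis-Hastings sampler in Python/NumPy. Since the posterior p(θ|y) depends on the model given above, the target here is only a placeholder 2-D Gaussian log-density (`log_post`, with hypothetical mean `mu` and covariance `Sigma`); in a solution, the log of the posterior derived in Part 1 would take its place.

```python
import numpy as np

def rwmh(log_target, theta0, sigma2_mh, n_samples, burn_in, rng):
    """Random-walk Metropolis-Hastings with an isotropic Gaussian proposal."""
    d = len(theta0)
    theta = np.array(theta0, dtype=float)
    log_p = log_target(theta)
    chain = np.empty((n_samples, d))
    accepted = 0
    for i in range(n_samples + burn_in):
        # Propose theta' = theta + N(0, sigma2_mh * I)
        prop = theta + np.sqrt(sigma2_mh) * rng.standard_normal(d)
        log_p_prop = log_target(prop)
        # Accept with probability min(1, p(prop) / p(theta))
        if np.log(rng.uniform()) < log_p_prop - log_p:
            theta, log_p = prop, log_p_prop
            accepted += i >= burn_in
        if i >= burn_in:
            chain[i - burn_in] = theta
    return chain, accepted / n_samples

# Placeholder target: a 2-D Gaussian posterior N(mu, Sigma); substitute p(theta | y) from Part 1.
mu = np.zeros(2)                                   # hypothetical posterior mean
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])         # hypothetical posterior covariance
Sigma_inv = np.linalg.inv(Sigma)

def log_post(theta):
    diff = theta - mu
    return -0.5 * diff @ Sigma_inv @ diff          # unnormalised log-density suffices for MH

rng = np.random.default_rng(0)
samples, acc_rate = rwmh(log_post, theta0=[0.0, 0.0], sigma2_mh=0.5,
                         n_samples=100_000, burn_in=5_000, rng=rng)
print("acceptance rate:", acc_rate)
```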
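The empirical autocorrelation functions used in Parts 4, 5 and 9 can be computed component-wise as sketched below. This reuses `rwmh` and `log_post` from the previous sketch, and the estimator shown is one standard choice of ACF estimator.

```python
import numpy as np
import matplotlib.pyplot as plt

def acf(x, max_lag):
    """Empirical autocorrelation of a 1-D chain x for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.mean(x * x)
    return np.array([np.mean(x[:len(x) - k] * x[k:]) / var for k in range(max_lag + 1)])

# Example: compare mixing for several proposal variances (using rwmh/log_post above).
rng = np.random.default_rng(1)
for s2 in [1e-4, 1e-2, 0.5, 4.0]:
    chain, _ = rwmh(log_post, [0.0, 0.0], s2, n_samples=20_000, burn_in=2_000, rng=rng)
    plt.plot(acf(chain[:, 0], max_lag=100), label=f"sigma2_MH = {s2}")
plt.xlabel("lag"); plt.ylabel("ACF (first component)"); plt.legend(); plt.show()
```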
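For Parts 6-7, the sketch below shows a basic importance sampling estimator of an intractable likelihood and its relative absolute error. Purely for illustration, it assumes a placeholder latent-variable structure p(y|θ) = ∫ N(y; x, εI) N(x; θ, I) dx (this is an assumption, not necessarily the model of this question); the estimator averages the weights p(y|x_k, θ) p(x_k|θ)/q(x_k) with x_k ~ q = N(0, σ²_IS I). The placeholder data `y` below should be replaced by the observed data given above.

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn
from scipy.special import logsumexp

eps = 1.0                      # epsilon, fixed to 1 as in Part 7
sigma2_is = 2.0                # IS proposal variance (to be tuned)
y = np.array([1.0, -1.0])      # placeholder observation; use the y given in the handout
I2 = np.eye(2)

def is_log_estimate(theta, K, rng):
    """Log of the IS estimate of p(y | theta) under the placeholder latent-variable structure."""
    x = np.sqrt(sigma2_is) * rng.standard_normal((K, 2))            # x_k ~ q = N(0, sigma2_IS I)
    log_w = (mvn.logpdf(x, mean=y, cov=eps * I2)                     # log N(y; x_k, eps I) (symmetric in y and x_k)
             + mvn.logpdf(x, mean=theta, cov=I2)                     # log p(x_k | theta) = log N(x_k; theta, I)
             - mvn.logpdf(x, mean=np.zeros(2), cov=sigma2_is * I2))  # minus log q(x_k)
    return logsumexp(log_w) - np.log(K)                              # log of the average weight

# Under this placeholder structure the ground truth is p(y | theta) = N(y; theta, (1 + eps) I).
theta = np.zeros(2)
truth = np.exp(mvn.logpdf(y, mean=theta, cov=(1.0 + eps) * I2))

rng = np.random.default_rng(2)
for K in [10**2, 10**3, 10**4, 10**5]:
    raes = [abs(np.exp(is_log_estimate(theta, K, rng)) - truth) / truth
            for _ in range(10)]                                      # M = 10 Monte Carlo repeats
    print(K, np.mean(raes))   # the average RAE should decay roughly like 1/sqrt(K)
```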
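For Parts 8-9, pseudo-marginal MH replaces the exact likelihood in the MH acceptance ratio with an unbiased estimate, keeping the estimate attached to the current state of the chain. The sketch below reuses `is_log_estimate` from the previous block and assumes, as a placeholder only, a standard Gaussian prior p(θ) = N(0, I).

```python
import numpy as np
from scipy.stats import multivariate_normal as mvn

def log_prior(theta):
    # Placeholder prior p(theta) = N(0, I); replace with the prior of the actual model.
    return mvn.logpdf(theta, mean=np.zeros(2), cov=np.eye(2))

def pmmh(K, sigma2_mh, n_samples, burn_in, rng):
    """Pseudo-marginal MH: an unbiased IS estimate of p(y | theta) replaces the likelihood."""
    theta = np.zeros(2)
    log_lik_hat = is_log_estimate(theta, K, rng)         # estimate attached to the current state
    chain = np.empty((n_samples, 2))
    for i in range(n_samples + burn_in):
        prop = theta + np.sqrt(sigma2_mh) * rng.standard_normal(2)
        log_lik_hat_prop = is_log_estimate(prop, K, rng)  # fresh estimate for the proposal only
        log_ratio = (log_lik_hat_prop + log_prior(prop)
                     - log_lik_hat - log_prior(theta))
        if np.log(rng.uniform()) < log_ratio:
            theta, log_lik_hat = prop, log_lik_hat_prop   # keep the accepted estimate
        if i >= burn_in:
            chain[i - burn_in] = theta
    return chain

rng = np.random.default_rng(3)
pm_chain = pmmh(K=10**3, sigma2_mh=0.5, n_samples=10_000, burn_in=1_000, rng=rng)
```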
Q2: IMPORTANCE SAMPLING FOR A HIDDEN MARKOV MODEL
Assume that we would like to understand the performance of a self-normalised importance sampler (SNIS) for the posterior sampling task p(x_{0:T} | y_{1:T}) in a state-space model.
Given a linear-Gaussian hidden Markov model:

x_0 ∼ N(m_0, P_0),
x_n | x_{n−1} ∼ N(A x_{n−1}, Q),
y_n | x_n ∼ N(H x_n, R),

for n = 1, . . . , T. For fixed T, we assume the following scalar parameterization:

m_0 = 0, P_0 = 1, A = 0.9, Q = 0.01, H = 1, R = 0.1.
1. Describe a procedure to simulate data from the hidden Markov model above and implement it. To demonstrate the implementation, simulate datasets of different lengths T = 2, . . . , 20 and, for each T, plot x_{0:T} vs. y_{1:T} (observations, as a scatter plot) (see the simulation sketch after this question list). Note that each dataset is generated independently, i.e. the dataset for T = 2 is completely different from the one for T = 3. When plotting, pay attention to the indexing, as states are indexed from 0 and observations from 1. Store these datasets and do not modify them, as they will be used below. (2 marks)
2. Describe a Kalman-based strategy to estimate the mean of the posterior distribution p(x_{0:T} | y_{1:T}), i.e.,

   m_{0:T}^{(T)} = E[x_{0:T} | y_{1:T}],

   for each T = 2, . . . , 20, using the datasets generated in Part 1. We denote this mean m_{0:T}^{(T)} for T = 2, . . . , 20 (we index the estimator with T to stress the fact that the datasets used for the estimators are different). (3 marks)
3. Implement this method and obtain the estimates m_{0:T}^{(T)} for T = 2, . . . , 20. Plot them against the true states you generated in Part 1 (a separate plot for each T = 2, . . . , 20) (see the Kalman smoothing sketch after this question list). Store these estimates m_{0:T}^{(T)}, as they will be used below. (3 marks)
4. Derive a self-normalised importance sampling (SNIS) estimator for the quantity m_{0:T}, i.e., the mean of the posterior p(x_{0:T} | y_{1:T}). Use a proposal of the form

   q(x_{0:T}) = ∏_{k=0}^{T} q(x_k),

   where q(x_k) = N(0, σ²_IS). Derive the SNIS estimator for the mean and denote it m̄_{0:T}^{(T)} (see the SNIS sketch after this question list). (2 marks)
5. Run the estimator with K = 10000 particles on the dataset for T = 2 and record the SNIS estimate m̄_{0:2}^{(2)}. Compare it visually to the estimate you obtained in Part 2 for T = 2, namely m_{0:2}^{(2)}. You will have to tune σ²_IS for good performance. (3 marks)
6. Run the SNIS estimator for T = 2, . . . , 20. Choose an appropriate σ²_IS for good performance (a good candidate is the one from the previous part) and take K = 10000.
Provide two plots: (i) the effective sample size (ESS) vs. T = 2, . . . , 20 and (ii) the normalised mean squared error between the SNIS estimate m̄_{0:T}^{(T)} and the estimate m_{0:T}^{(T)} from Part 3, vs. T = 2, . . . , 20. Comment on the results. What trend do you observe as T grows? What conclusions can you draw from this for estimating expectations as the state-space dimension grows? (5 marks)
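For Part 1, a minimal sketch of a simulator for the linear-Gaussian model above, using the scalar parameter values given in the question:

```python
import numpy as np

# Scalar linear-Gaussian HMM parameters from the question.
m0, P0, A, Q, H, R = 0.0, 1.0, 0.9, 0.01, 1.0, 0.1

def simulate_hmm(T, rng):
    """Simulate x_0,...,x_T and y_1,...,y_T from the linear-Gaussian HMM."""
    x = np.empty(T + 1)
    y = np.empty(T)                      # observations are indexed 1..T
    x[0] = m0 + np.sqrt(P0) * rng.standard_normal()
    for n in range(1, T + 1):
        x[n] = A * x[n - 1] + np.sqrt(Q) * rng.standard_normal()   # x_n | x_{n-1} ~ N(A x_{n-1}, Q)
        y[n - 1] = H * x[n] + np.sqrt(R) * rng.standard_normal()   # y_n | x_n ~ N(H x_n, R)
    return x, y

rng = np.random.default_rng(4)
datasets = {T: simulate_hmm(T, rng) for T in range(2, 21)}   # independent dataset for each T
```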
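One Kalman-based strategy for Parts 2-3 is a forward Kalman filter pass followed by a Rauch-Tung-Striebel backward pass, which yields the smoothed means E[x_n | y_{1:T}]. A scalar sketch, reusing the `datasets` dictionary from the previous block:

```python
import numpy as np

def kalman_smoother_mean(y, m0=0.0, P0=1.0, A=0.9, Q=0.01, H=1.0, R=0.1):
    """Smoothed posterior means E[x_n | y_{1:T}], n = 0,...,T, for the scalar LG-HMM."""
    T = len(y)
    mf = np.empty(T + 1); Pf = np.empty(T + 1)     # filtering means/variances
    mp = np.empty(T + 1); Pp = np.empty(T + 1)     # one-step predictive means/variances
    mf[0], Pf[0] = m0, P0
    for n in range(1, T + 1):
        # Predict: p(x_n | y_{1:n-1})
        mp[n] = A * mf[n - 1]
        Pp[n] = A * Pf[n - 1] * A + Q
        # Update with y_n: p(x_n | y_{1:n})
        S = H * Pp[n] * H + R
        K = Pp[n] * H / S
        mf[n] = mp[n] + K * (y[n - 1] - H * mp[n])
        Pf[n] = (1 - K * H) * Pp[n]
    # Backward (RTS) pass: smoothed means E[x_n | y_{1:T}]
    ms = mf.copy()
    for n in range(T - 1, -1, -1):
        G = Pf[n] * A / Pp[n + 1]
        ms[n] = mf[n] + G * (ms[n + 1] - mp[n + 1])
    return ms

# Example: smoothed posterior mean for the T = 5 dataset generated above.
x_true, y_obs = datasets[5]
m_hat = kalman_smoother_mean(y_obs)
```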
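For Parts 4-6, a sketch of the SNIS estimator and the ESS diagnostic, using the proposal q(x_{0:T}) = ∏_{k=0}^{T} N(0, σ²_IS) and reusing `datasets` and `kalman_smoother_mean` from the previous sketches. The unnormalised weight of each sampled trajectory is p(x_{0:T}, y_{1:T}) / q(x_{0:T}). The NMSE computed at the end uses one common normalisation (squared error divided by the squared norm of the reference estimate), which should be checked against the definition used in the lectures.

```python
import numpy as np
from scipy.stats import norm
from scipy.special import logsumexp

sigma2_is = 1.0   # SNIS proposal variance (to be tuned)

def snis_mean(y, K, rng, m0=0.0, P0=1.0, A=0.9, Q=0.01, H=1.0, R=0.1):
    """SNIS estimate of E[x_{0:T} | y_{1:T}] with proposal q(x_k) = N(0, sigma2_is), plus ESS."""
    T = len(y)
    x = np.sqrt(sigma2_is) * rng.standard_normal((K, T + 1))   # K trajectories x_{0:T} ~ q
    # log p(x_{0:T}, y_{1:T}) - log q(x_{0:T}), computed trajectory-wise
    log_w = norm.logpdf(x[:, 0], loc=m0, scale=np.sqrt(P0))
    log_w += norm.logpdf(x[:, 1:], loc=A * x[:, :-1], scale=np.sqrt(Q)).sum(axis=1)
    log_w += norm.logpdf(y, loc=H * x[:, 1:], scale=np.sqrt(R)).sum(axis=1)
    log_w -= norm.logpdf(x, loc=0.0, scale=np.sqrt(sigma2_is)).sum(axis=1)
    w = np.exp(log_w - logsumexp(log_w))                        # self-normalised weights
    ess = 1.0 / np.sum(w ** 2)                                  # effective sample size
    return w @ x, ess                                           # SNIS mean of x_{0:T}, ESS

rng = np.random.default_rng(5)
x_true, y_obs = datasets[2]                       # T = 2 dataset from the simulation sketch
m_bar, ess = snis_mean(y_obs, K=10_000, rng=rng)
m_kalman = kalman_smoother_mean(y_obs)            # reference estimate from the Kalman sketch
nmse = np.sum((m_bar - m_kalman) ** 2) / np.sum(m_kalman ** 2)   # one possible normalisation
print("ESS:", ess, "NMSE:", nmse)
```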