# STAT 5511 Homework 4

STAT 5511 (Spring 2020)
Charles R. Doss
Assigned: Sunday, March 22
Due: Monday, March 30
The usual formatting rules:
• You may use knitr or Sweave in general to produce the code portions of the HW. However, the output from knitr/Sweave that you include should be
only what is necessary to answer the question, rather than just any automatic output that R produces. (You may thus need to avoid using default R
functions if they output too much unnecessary material.)
– For example: for output from regression, the main things we would want to see are the estimates for each coefficient (with appropriate labels
of course) together with the computed OLS/linear regression standard errors and p-values.
• Code snippets that directly answer the questions can be included in your main homework document; ideally these should be preceded by comments
or text at least explaining what question they are answering. Extra code can be placed in an appendix.
• All plots produced in R should have appropriate labels on the axes as well as titles. Every plot should be accompanied by text that clearly
explains what is being plotted.
• Plots and figures should be appropriately sized, meaning they should not be too large, so that the page length is not too long. (The arguments
fig.height and fig.width to knitr chunks can achieve this.)
1. Find (on Canvas) the file “hw5dat.rsav” (which can be loaded into R using load("hw5dat.rsav")).
It contains a time series (“xx”). The series is a “demeaned” monthly revenue stream (in millions of
dollars) for a company. There are n = 96 observations.
The series has been “demeaned”; usually that would mean we subtract off $\bar{X}$ from every data point,
but pretend for now we know the mean $\mu$ exactly, so we have subtracted off $\mu$ from every data point
and the new series has (theoretically) mean exactly 0. (Consequently, its sample mean is not precisely 0.)
We will consider possible ARMA models for the series $X_t$. We assume that the corresponding white
noise is Gaussian (so $X_t$ is Gaussian).
We will consider first an AR(2) model. We assume we know the true model exactly: it is
Model 1: $X_t = 1.34 X_{t-1} - 0.48 X_{t-2} + W_t$, $W_t \overset{iid}{\sim} N(0, \sigma^2)$.
(a) Compute forecasts and backcasts using Model 1, up to 25 time steps into the future and into the past.
Write code to do the prediction by hand (i.e., not using the predict() function). Plot the data,
forecast, and 95% prediction intervals [assuming gaussianity] (all on one plot). (Note: you do
not need to do a multiplicity correction for the prediction intervals.)
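As a rough illustration of what "by hand" could look like in part (a), the sketch below iterates the AR(2) recursion for the point forecasts and accumulates squared $\psi$-weights for the prediction error. Several things are assumptions, not part of the assignment: the coefficient values are one reading of Model 1, `sigma2 = 1` is a placeholder (the assignment does not state $\sigma^2$), the series is simulated as a stand-in for `xx`, and the helper name `ar2_forecast` is hypothetical. Backcasts are obtained by applying the same routine to the time-reversed series, which relies on the autocovariance of a stationary series being symmetric in time.

```r
# Sketch only: hand-rolled forecasts for a mean-zero causal AR(2).
phi    <- c(1.34, -0.48)  # ASSUMED reading of the Model 1 coefficients
sigma2 <- 1               # ASSUMED placeholder for the white-noise variance
n <- 96; m <- 25

set.seed(1)
# Simulated stand-in for the series in hw5dat.rsav (assumption).
xx <- as.numeric(arima.sim(model = list(ar = phi), n = n, sd = sqrt(sigma2)))

ar2_forecast <- function(x, phi, sigma2, m) {  # hypothetical helper
  n <- length(x)
  ext <- c(x, numeric(m))
  # Point forecasts: plug earlier forecasts into the AR(2) recursion.
  for (h in 1:m) {
    ext[n + h] <- phi[1] * ext[n + h - 1] + phi[2] * ext[n + h - 2]
  }
  # psi-weights psi_0, ..., psi_{m-1} of the causal representation.
  psi <- numeric(m)
  psi[1] <- 1
  if (m >= 2) psi[2] <- phi[1]
  if (m >= 3) for (j in 3:m) psi[j] <- phi[1] * psi[j - 1] + phi[2] * psi[j - 2]
  # h-step mean-squared prediction error: sigma2 * sum_{j < h} psi_j^2.
  list(pred = ext[(n + 1):(n + m)], se = sqrt(sigma2 * cumsum(psi^2)))
}

fc <- ar2_forecast(xx, phi, sigma2, m)       # forecasts of X_97, ..., X_121
bc <- ar2_forecast(rev(xx), phi, sigma2, m)  # backcasts of X_0, X_-1, ..., X_-24

z    <- qnorm(0.975)
t_fc <- n + 1:m
t_bc <- 1 - 1:m
plot(1:n, xx, type = "l", xlim = c(1 - m, n + m),
     ylim = range(xx, fc$pred + z * fc$se, fc$pred - z * fc$se,
                  bc$pred + z * bc$se, bc$pred - z * bc$se),
     xlab = "Month", ylab = "Demeaned revenue (millions of dollars)",
     main = "AR(2) forecasts and backcasts with 95% prediction intervals")
add_band <- function(tt, f) {
  lines(tt, f$pred, col = "red")
  lines(tt, f$pred + z * f$se, lty = 2, col = "blue")
  lines(tt, f$pred - z * f$se, lty = 2, col = "blue")
}
add_band(t_fc, fc)
add_band(t_bc, bc)
```

With the real data, `xx` would come from the loaded file instead of `arima.sim()`, and the value used for `sigma2` would need its own justification.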
(b) Give a constant (nonrandom) number that the 100-step-ahead forecast, $X^{96}_{196}$, will be approximately equal to.
(c) If you were to do the one-step-ahead prediction but based on no data, what would your prediction
be? (Based on no data, it is the same for predicting at time 97 or at any other time.) What
would the mean-squared prediction error (call it E) be? Compare $P^{96}_{97}$ to E.
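One way to explore the comparison in part (c) numerically: for a mean-zero stationary series, a prediction that uses no data has mean-squared error equal to the stationary variance $\gamma(0)$, while the one-step-ahead error for a causal AR(2) with at least two observations is $\sigma^2$. The sketch below computes $\gamma(0)$ for an AR(2) in two ways, from the standard closed form and from a truncated sum of squared $\psi$-weights. The coefficient values and `sigma2 = 1` are assumptions, as is the truncation at 500 lags.

```r
# Sketch only: stationary variance of a causal AR(2), computed two ways.
phi    <- c(1.34, -0.48)  # ASSUMED reading of the Model 1 coefficients
sigma2 <- 1               # ASSUMED placeholder for the white-noise variance

# Closed form: gamma(0) = sigma2 (1 - phi2) / ((1 + phi2) ((1 - phi2)^2 - phi1^2)).
gamma0_closed <- sigma2 * (1 - phi[2]) /
  ((1 + phi[2]) * ((1 - phi[2])^2 - phi[1]^2))

# Same quantity from the causal representation: sigma2 * sum_j psi_j^2,
# truncated at 500 lags (assumed to be long enough for the sum to settle).
psi <- c(1, ARMAtoMA(ar = phi, lag.max = 500))
gamma0_psi <- sigma2 * sum(psi^2)

print(c(closed_form = gamma0_closed, psi_sum = gamma0_psi, one_step_mse = sigma2))
```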
(d) Now say that we know the true mean of the company’s revenue series is .3 (million dollars).
Provide
i. a plot of the company’s (not-demeaned) revenue series (let’s call it $Y_t$),
ii. and the prediction equation for the series $Y_t$ (based on Model 1).
(e) The series $Y_t$ is a monthly revenue stream for a company. The company needs to decide, before
the current month is up (i.e., before seeing the one-month-ahead revenue, $Y_{97}$), whether to make
an important investment in equipment which will cost 1.1 million dollars. If their revenue next
month cannot cover the cost (is less than the cost of the investment) they will go bankrupt (they
have exactly 0 cash on hand and cannot take out loans). Explain why they should or should
not make the investment.
(f) The second model we will consider (Model 2) is an MA(2) model. Estimate an MA(2) model on
the $X_t$ (demeaned) series using the arima() function (there is an include.mean argument; set it to
FALSE). Then plot (on one plot): the data, forecasts up to 25 months ahead, and 95% prediction
intervals [assuming gaussianity]. You may use the predict() function.
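The workflow in part (f) might be sketched as follows. The data here are simulated from an MA(2) with arbitrary coefficients purely as a stand-in for `xx` (an assumption); `arima()`, `predict()`, and `ts.plot()` are the standard functions from the stats package.

```r
# Sketch only: fit a mean-zero MA(2) and plot forecasts with 95% intervals.
set.seed(2)
# Simulated stand-in for the demeaned series (coefficients are arbitrary).
xx <- arima.sim(model = list(ma = c(0.6, 0.3)), n = 96)

fit <- arima(xx, order = c(0, 0, 2), include.mean = FALSE)
fc  <- predict(fit, n.ahead = 25)  # returns $pred and $se as time series

z     <- qnorm(0.975)
upper <- fc$pred + z * fc$se
lower <- fc$pred - z * fc$se

ts.plot(xx, fc$pred, upper, lower,
        gpars = list(col = c("black", "red", "blue", "blue"),
                     lty = c(1, 1, 2, 2),
                     xlab = "Month",
                     ylab = "Demeaned revenue (millions of dollars)",
                     main = "MA(2) forecasts with 95% prediction intervals"))
```

Printing only `fit$coef` and the standard errors, and not the full fitted object, keeps the output in line with the formatting rules at the top of the assignment.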
(g) Compare the forecasts for Model 1 and Model 2: specifically, discuss how quickly the two
forecasts revert to the long run average of the series. Provide an explanation of this. [Note: I
am not intending for you to discuss the fact that in one case the coefficients were estimated and
in the other they were not estimated.]
(h) Now we consider changing the frequency of observation. Imagine someone outside the company
observes the company revenue but only on a quarterly basis, by which I mean every 3 months
(thus they observe March’s revenue, then June’s revenue, ...). Let this series of de-meaned
observations be $Z_t$. If the true model for $X_t$ is AR(1), $X_t = \phi_1 X_{t-1} + W_t$ where $W_t \overset{iid}{\sim} N(0, \sigma^2)$,
then what is the (true) model for $Z_t$ (including the distribution of the white noise series)?
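To build intuition for part (h), the quarterly series can be formed in R by keeping every third monthly observation. The sketch below does this on a long simulated AR(1) and compares the lag-1 sample autocorrelation of the subsampled series with the lag-3 sample autocorrelation of the monthly series; both estimate the same population quantity, since consecutive quarterly observations are three months apart. The value `phi1 = 0.8` and the series length are illustrative assumptions.

```r
# Sketch only: subsample a monthly AR(1) to a quarterly series.
set.seed(3)
phi1 <- 0.8  # ASSUMED illustrative value, not from the assignment
xx_long <- as.numeric(arima.sim(model = list(ar = phi1), n = 3000))

# Keep months 3, 6, 9, ... (March, June, September, ...).
zz <- xx_long[seq(3, length(xx_long), by = 3)]

rho_x3 <- acf(xx_long, lag.max = 3, plot = FALSE)$acf[4]  # monthly series, lag 3
rho_z1 <- acf(zz, lag.max = 1, plot = FALSE)$acf[2]       # quarterly series, lag 1
print(c(monthly_lag3 = rho_x3, quarterly_lag1 = rho_z1))
```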
2. Shumway and Stoffer (4th ed.), question 3.15
3. Shumway and Stoffer (4th ed.), question 3.20
4. A data analyst is analyzing a time series with 1000 observations. She fits an ARMA(2,1) model
(mean 0, i.e., with no intercept), yielding the following estimate output:
Coefficients:
      ar1    ar2    ma1
   -0.062  0.817  0.971
To check the robustness of the fit, the analyst removes the last 100 observations (leaving 900) and fits
the ARMA(2,1) model again. The analyst is surprised to see the following very different estimates
output:
Coefficients:
      ar1    ar2    ma1
    0.752  0.112  0.100
(a) Provide an explanation for why these different results were output.
(b) Provide a diagnostic tool or mechanism or method to assess the answer you gave in the previous
part, and explain how you would use it or what you would look for.
(c) Based on the two sets of estimates above and your answer to part (a), provide an estimated
model for the data. That is: write down a model, including names/symbols for any unknown
parameters, and provide an estimate for any unknown parameters. You do not need to estimate
[Note: your estimate(s) will just be rough estimate(s) based only on the R output presented
above. There are a variety of similar possible answers, all of which will get credit.]
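One possible diagnostic for part (b), offered as a sketch and not necessarily the intended answer, is to look at the roots of the fitted AR and MA polynomials. Under R's `arima()` parameterization the AR polynomial is $1 - \phi_1 z - \phi_2 z^2$ and the MA polynomial is $1 + \theta_1 z$; `polyroot()` takes coefficients in increasing order of power. The helper name `arma_inverse_roots` is hypothetical, and the inputs are the two sets of estimates printed above.

```r
# Sketch only: inverse roots of fitted ARMA(2,1) polynomials.
arma_inverse_roots <- function(ar, ma) {  # hypothetical helper
  list(ar = 1 / polyroot(c(1, -ar)),  # inverse roots of 1 - ar1 z - ar2 z^2
       ma = 1 / polyroot(c(1, ma)))   # inverse root of 1 + ma1 z
}

fit_1000 <- arma_inverse_roots(ar = c(-0.062, 0.817), ma = 0.971)
fit_900  <- arma_inverse_roots(ar = c(0.752, 0.112),  ma = 0.100)

# Smallest distance between an AR inverse root and the MA inverse root;
# a small value suggests the two factors nearly cancel.
gap <- function(r) min(Mod(outer(r$ar, r$ma, "-")))

print(lapply(list(n1000 = fit_1000, n900 = fit_900), function(r) lapply(r, Re)))
print(c(gap_n1000 = gap(fit_1000), gap_n900 = gap(fit_900)))
```

Comparing the AR inverse roots that remain after setting aside any near-matching AR/MA pair is one way to judge whether the two fits describe similar dynamics.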