
STA457/STA2202 - Assignment 1

Submission instructions:

• Submit a single PDF file with your answers to both Theory & Practice parts to A1 on Quercus - the

deadline is 11:59PM on Thursday, May 21.

• Your answers to the Theory part can be handwritten (PDF scan/photo is OK).

• Your answers to the Practice part should be in the form of a report combining code, output, and

commentary. You can compile your report with RMarkdown (recommended) or another editor

(e.g. Word/LaTeX).

Theory

1. In this course we work with (weakly) stationary time series. This class of models is closed under linear

transformations, i.e. whenever you take a (non-exploding) linear combination of stationary series, you

always end up with a stationary series. For this question you have to prove this result. Consider two

independent zero-mean stationary series, {Xt} and {Yt}, with autocovariance functions (ACVFs) γX(h)

and γY (h), respectively.

(a) [4 marks] Find the ACVF of the linear combination Zt = aXt + bYt, a, b ∈ R in terms of the ACVFs of

{Xt}, {Yt}, and show that it is stationary (i.e. only depends on h).

(b) [6 marks] Find the ACVF of the linear filter $V_t = \sum_{j=0}^{p} a_j X_{t-j}$, $a_j \in \mathbb{R}$, in terms of the ACVF of {Xt},

and show that it is stationary.
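
A quick simulation can serve as a sanity check on whatever expression you derive in part (a); it is not a substitute for the proof. The sketch below rests on several assumptions that are not part of the assignment: Gaussian noise, an AR(1) series for {Xt}, white noise for {Yt}, and arbitrary values for a and b. It compares the sample ACVF of Zt with the combination a²γ̂X(h) + b²γ̂Y(h), which is what independence and zero means imply.

```r
set.seed(2202)
n <- 20000
a <- 2; b <- -1   # illustrative coefficients (assumed, not from the assignment)

x <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))  # stationary AR(1)
y <- rnorm(n)                                              # independent white noise
z <- a * x + b * y

# Sample ACVF at lags 0..5 (acf() subtracts the sample mean, which is near 0 here)
acvf <- function(s) {
  as.numeric(acf(s, lag.max = 5, type = "covariance", plot = FALSE)$acf)
}

comparison <- cbind(lag    = 0:5,
                    acvf_z = acvf(z),
                    combo  = a^2 * acvf(x) + b^2 * acvf(y))
print(round(comparison, 3))
```

The two columns should agree up to sampling noise; the small gap between them comes from the sample cross-covariances, which are only approximately zero in a finite sample.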

2. [10 marks] Consider the random walk (RW) series Xt = Xt−1 + Wt, ∀t ≥ 1, where X0 = 0 and

Wt ∼ WN(0, 1). Although the series is not stationary, assume we treat it as such and calculate the

sample ACVF γ̂(h), based on a sample of size n, as:

$$\hat{\gamma}(h) = \frac{1}{n} \sum_{t=1}^{n-h} X_{t+h} X_t, \quad \forall h = 0, 1, \ldots, n-1$$

Show that the expected value of the sample autocovariance is given by

$$E[\hat{\gamma}(h)] = \frac{(n-h)(n-h+1)}{2n}$$

(Hint: the ACVF of X is γ(s, t) = min(s, t), ∀s, t ≥ 1, and the arithmetic series formula is $\sum_{i=1}^{n} i = n(n+1)/2$.)

(Note: this illustrates the behavior of the sample ACF of a RW series: it is in fact a quadratic in h, but

it behaves very close to linear for the small values of h that appear in the ACF plot.)
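
The stated expectation can be checked numerically before attempting the proof. The following Monte Carlo sketch assumes Gaussian white noise and arbitrary values for the sample size, the number of replications, and the lags inspected (the question only requires WN(0, 1)). It uses the uncentred sample ACVF exactly as defined above, not R's acf(), which subtracts the sample mean.

```r
set.seed(457)
n    <- 50      # sample size (assumed for illustration)
reps <- 20000   # Monte Carlo replications (assumed)
lags <- 0:10

# Uncentred sample ACVF: (1/n) * sum_{t=1}^{n-h} X_{t+h} X_t
sample_acvf <- function(x, h) {
  m <- length(x)
  sum(x[(1 + h):m] * x[1:(m - h)]) / m
}

sims <- replicate(reps, {
  x <- cumsum(rnorm(n))   # X_t = X_{t-1} + W_t with X_0 = 0
  sapply(lags, function(h) sample_acvf(x, h))
})

mc_mean <- rowMeans(sims)                          # simulated E[gamma-hat(h)]
theory  <- (n - lags) * (n - lags + 1) / (2 * n)   # formula from the question
print(round(cbind(lag = lags, mc_mean, theory), 2))
```

The simulated means should track the formula closely, and over these small lags the decay looks almost linear, as the note above describes.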

Practice

You will work with Statistics Canada’s open socio-economic series data. The data are organized by topic in

tables, and we will focus on monthly employment numbers by industry (table 14-10-0355-01); see also this

brief tutorial. An easy way to access these data directly through R is with the cansim library, using “vectors”

to identify individual series. You will be working with employment data for different industries

and over different time periods, based on the last two digits of your student #, according to

the scheme described in the following tables:

Last digit  Industry                                           Unadjusted  Seasonally adjusted  Trend-cycle
1           Accommodation and food services                    v2057828    v2057619             v123355122
2           Agriculture                                        v2057814    v2057605             v123355108
3           Construction                                       v2057817    v2057608             v123355111
4           Educational services                               v2057825    v2057616             v123355119
5           Forestry, fishing, mining, quarrying, oil and gas  v2057815    v2057606             v123355109
6           Goods-producing sector                             v2057813    v2057604             v123355107
7           Information, culture and recreation                v2057827    v2057618             v123355121
8           Manufacturing                                      v2057818    v2057609             v123355112
9           Public administration                              v2057830    v2057621             v123355124
0           Services-producing sector                          v2057819    v2057610             v123355113

2nd-to-last digit  Time period
odd                Jan 1980 to Dec 1999
even               Jan 2000 to Dec 2019

E.g., if your student ID ends in 42, you should use the Agriculture industry data (last digit = 2) over Jan

2000 to Dec 2019 (next-to-last digit = 4 is even). Make sure you use the right data; otherwise you will

lose marks. The following starter code downloads the data for student # ending in 42.

library(cansim)
library(tidyverse)

# unadjusted (raw) series
ua <- get_cansim_vector("v2057814", start_time = "2000-01-01", end_time = "2019-12-01") %>%
  pull(VALUE) %>%
  ts(start = c(2000, 1), frequency = 12)

plot(ua)

[Figure: time-series plot of ua against Time, 2000 to 2020; vertical axis from about 250 to 400]

1. [3 marks] Plot the unadjusted series, its ACF & PACF, and comment on the following characteristics:

trend, seasonality, stationarity.
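
One possible layout for the three plots is sketched below. To keep it runnable offline it uses the built-in AirPassengers series as a stand-in for your own `ua`; the four-year lag window is an assumption chosen so that several seasonal cycles are visible.

```r
x <- AirPassengers   # stand-in series; replace with your own `ua`

op <- par(mfrow = c(3, 1))                              # three stacked panels
plot(x, main = "Unadjusted series")
acf_x  <- acf(x,  lag.max = 48, main = "Sample ACF")    # lags are shown in years
pacf_x <- pacf(x, lag.max = 48, main = "Sample PACF")
par(op)                                                 # restore previous settings
```

A slowly decaying ACF points to a trend, while peaks recurring at lags 1, 2, 3, ... (in years) point to seasonality with period 12.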

2. [5 marks] Perform a classical multiplicative decomposition of the unadjusted series (Xua) into trend

(T), seasonal (S), and remainder (R) components (i.e. Xua = T × S × R):

a. First, apply a 12-point MA to the raw (unadjusted) series to get an estimate of the trend.

b. Then, use the detrended data to estimate seasonality: find the seasonal pattern by calculating sample

means for each month, and then center the pattern at 0 (i.e. the pattern should sum to 0).

c. Finally, calculate the remainder component by removing both trend and seasonality from the raw series.

Create a time-series plot of all components like the one below.

(Hint: your results should perfectly match those of the decompose function, which uses the above

process)
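
Steps a to c can be sketched as follows, again with the built-in AirPassengers series as a stand-in for your own data. Because the decomposition is multiplicative, the sketch mirrors what `decompose(type = "multiplicative")` does in base R: it detrends by division, uses a centred 12-point MA (half weights on the two end points, since the period is even), and rescales the monthly pattern to have mean 1, which is the multiplicative counterpart of centring at 0. The variable names are illustrative.

```r
x <- AirPassengers   # stand-in series; replace with your own `ua`

# (a) Trend: centred 12-point moving average
w     <- c(0.5, rep(1, 11), 0.5) / 12
trend <- stats::filter(x, w, sides = 2)

# (b) Seasonality: monthly means of the detrended ratios, rescaled to mean 1
detrended   <- x / trend
month       <- as.integer(cycle(x))
month_means <- tapply(as.numeric(detrended), month, mean, na.rm = TRUE)
pattern     <- as.numeric(month_means / mean(month_means))
seasonal    <- ts(pattern[month], start = start(x), frequency = 12)

# (c) Remainder: remove both trend and seasonality from the raw series
remainder <- x / (trend * seasonal)

plot(cbind(observed = x, trend = trend, seasonal = seasonal, remainder = remainder),
     main = "Manual multiplicative decomposition")

# Compare with the built-in function
ref      <- decompose(x, type = "multiplicative")
max_diff <- max(abs(remainder - ref$random), na.rm = TRUE)
```

The trend and remainder are NA for the first and last six months, because the centred MA needs six observations on each side.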

3. [2 marks] Statistics Canada (StatCan) does their own seasonal adjustment using a more sophisticated

method (namely, X-12-ARIMA). Download the corresponding seasonally adjusted series for your

industry and time period, and plot them on the same plot with your own seasonally adjusted data

(Xsa = Xua/S = T × R) from the previous part. The two versions should be close, but not identical.

Report the mean absolute error (MAE) between the two versions (StatCan’s and yours) of seasonally

adjusted data.
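
A minimal sketch of the overlay plot and the MAE calculation is given below. Both series are stand-ins computed from the built-in AirPassengers data: `sa_own` plays the role of your adjustment and `sa_ref` plays the role of StatCan's series, which you would download instead. The helper `mae` is a hypothetical name, not a library function.

```r
x <- AirPassengers   # stand-in series

# Stand-ins for the two seasonally adjusted versions being compared
sa_own <- x / decompose(x, type = "multiplicative")$seasonal
sa_ref <- x - decompose(x, type = "additive")$seasonal   # placeholder for StatCan's series

# Mean absolute error between two aligned series, ignoring missing values
mae <- function(a, b) mean(abs(a - b), na.rm = TRUE)

ts.plot(sa_own, sa_ref, col = c("black", "red"), lty = 1:2)
legend("topleft", legend = c("own", "reference (stand-in)"),
       col = c("black", "red"), lty = 1:2)

mae_value <- mae(sa_own, sa_ref)
print(mae_value)
```

Both series must cover the same months for the element-wise difference to be meaningful.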

4. [5 marks] The library seasonal contains R functions for performing seasonal adjustments/decompositions

using various methods. Use the following three methods described in FPP for performing seasonal

adjustments (you don’t need to know their details):

a. X11

b. SEATS

c. STL

Create seasonally adjusted versions of your raw series based on each method, and plot them together

with StatCan’s version. Note that the first two methods (X11 & SEATS) are multiplicative by default,

and you must use the forecast library functions seasadj, seasonal, trendcycle, and remainder to

extract the various components. The last method (STL) however is only additive, so you need to take

a logarithmic transformation of the data to do the multiplicative decomposition, and then transform

them back to the original scale for making comparisons.

Which method gives a seasonal adjustment that is closest to StatCan’s, based on MAE?
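
The log-transform route for STL can be sketched with base R's `stl` function, as below. The built-in AirPassengers series is a stand-in for your own data, and `s.window = "periodic"` is an assumed setting that forces a fixed seasonal pattern; other window choices are equally valid.

```r
x <- AirPassengers   # stand-in series; replace with your own `ua`

# Additive STL on the log scale corresponds to a multiplicative decomposition
fit        <- stl(log(x), s.window = "periodic")
components <- fit$time.series   # columns: seasonal, trend, remainder

# Back-transform: removing the log-seasonal term is the same as dividing by exp(seasonal)
sa_stl <- exp(log(x) - components[, "seasonal"])

# Sanity check: the back-transformed components multiply back to the data
rebuilt <- exp(rowSums(components))

plot(x, col = "grey", main = "STL seasonal adjustment via logs")
lines(sa_stl, col = "blue")
```

The resulting `sa_stl` is on the original scale, so it can be compared directly with the other seasonally adjusted series using MAE.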

4. [5 marks] Using StatCan’s data (unadjusted, and/or seasonally adjusted, and/or trend-cycle), calculate

the remainder series (R). Plot R and its sample ACF and PACF, and answer the following questions:

a. Based on these plots, can you identify any remaining seasonality in your series?

b. Comment on the stationarity of the series and propose any further pre-processing.

c. Comment on the (partial) autocorrelations of the series, and propose an appropriate ARMA(p, q) model

(i.e. appropriate orders p & q).
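
Orders suggested by the ACF and PACF plots can be cross-checked by fitting a few small candidate models and comparing them by AIC. This is a complementary technique, not something the question asks for. The sketch below uses a simulated ARMA(1,1) series as a stand-in for your remainder series, and the 0 to 2 grid of orders is an assumption.

```r
set.seed(457)
r <- arima.sim(model = list(ar = 0.6, ma = 0.3), n = 500)   # simulated stand-in for R

# Fit every ARMA(p, q) with p, q in 0..2 and record the AIC
orders <- expand.grid(p = 0:2, q = 0:2)
orders$aic <- mapply(function(p, q) {
  fit <- tryCatch(arima(r, order = c(p, 0, q)), error = function(e) NULL)
  if (is.null(fit)) NA_real_ else fit$aic   # NA if the fit fails
}, orders$p, orders$q)

best <- orders[which.min(orders$aic), ]
print(orders[order(orders$aic), ])
```

AIC differences of a point or two are not decisive, so when the plots and the AIC disagree the more parsimonious model is usually the safer choice.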

5. [10 marks; STA2202 (grad) students ONLY] Download employment data up to April 2020 (the

most recent month) for all of the above industries, and use them to answer the following question:

Which industry’s employment was hit hardest by the COVID-19 pandemic?

You need to back up your answer with valid arguments based on time series techniques, to account for

things like seasonality (e.g., you can’t simply rank last month’s differences in employment numbers).

Clearly explain your reasoning and the methods & metrics used for making comparisons.

Acknowledgements:

Thanks to our TA Yang Guo for researching the data used in this assignment.
