Homework #2
Due: Oct 23, 2019
No.1 The air data set contains measurements of ozone (OZ), solar radiation (RAD), temperature (TEMP)
and wind speed (WIND) for 111 consecutive days in a city in New York State. The four columns are
OZ, RAD, TEMP and WIND. Consider the nonparametric model
$Y_i = \theta(X_i) + \varepsilon_i$, where $Y = \mathrm{OZ}$, $X = \mathrm{WIND}$, and $\varepsilon_i \sim N(0, \sigma^2)$.
1. Fit a linear model and a quadratic model of OZ on WIND and compare the parametric fits with a
nonparametric fit using the kernel method. Comment on your results.
2. Use the R function ksmooth to estimate θ(t). Try a few bandwidths and a few kernel functions and
examine how the kernel estimator of θ(t) is affected by the bandwidth h and the kernel function K(u).
3. Use the R functions ksmooth, loess.smooth (loess) and supsmu to estimate θ(t) and compare their fits.
Use the loess function to calculate a 95% confidence interval for θ(t).
4. Write a function in your favorite programming language, e.g., R, Matlab or C, to construct a local
linear kernel estimate of θ(t) using the Epanechnikov kernel with bandwidth h = 5. The Epanechnikov
kernel is defined as
$$K(u) = \frac{3}{4}\,(1 - u^2)\, I(|u| < 1).$$
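The Epanechnikov kernel is optimal in the sense that it minimizes the asymptotic mean integrated squared error (MISE).
Estimate the standard error (SE) of $\hat\theta(x; h)$ and implement the estimate. For simplicity, you can estimate
$\sigma^2$ using $\hat\sigma^2 = n^{-1} \sum_{i=1}^{n} \{Y_i - \hat\theta(X_i)\}^2$.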
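One possible starting point for item 4 is sketched here in Python, using only the standard library. It is a sketch under stated assumptions, not a reference solution. The function names (`epanechnikov`, `local_linear_weights`, `local_linear_fit`, `local_linear_se`) are illustrative. The SE formula treats the local linear estimate as a linear smoother, $\hat\theta(t) = \sum_i l_i(t) Y_i$, and uses $\widehat{\mathrm{SE}}(t) = \hat\sigma \sqrt{\sum_i l_i(t)^2}$, which assumes homoscedastic, independent errors and ignores bias.

```python
import math

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) for |u| < 1, and 0 otherwise."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0

def local_linear_weights(t, xs, h):
    """Return l_i(t) such that the local linear estimate is sum_i l_i(t) * y_i."""
    d = [x - t for x in xs]                      # centered design points
    w = [epanechnikov(di / h) for di in d]       # kernel weights
    s0 = sum(w)
    s1 = sum(wi * di for wi, di in zip(w, d))
    s2 = sum(wi * di * di for wi, di in zip(w, d))
    det = s0 * s2 - s1 * s1
    if det <= 1e-12:
        # Fewer than two distinct x values fall inside the window (t - h, t + h).
        raise ValueError("too few distinct points within bandwidth h of t")
    return [wi * (s2 - di * s1) / det for wi, di in zip(w, d)]

def local_linear_fit(t, xs, ys, h=5.0):
    """Local linear kernel estimate of theta(t) with bandwidth h."""
    l = local_linear_weights(t, xs, h)
    return sum(li * yi for li, yi in zip(l, ys))

def local_linear_se(t, xs, ys, h=5.0):
    """Estimated SE of the fit at t, assuming constant error variance (an assumption)."""
    n = len(xs)
    # sigma^2 is estimated by the mean squared residual at the design points.
    resid = [y - local_linear_fit(x, xs, ys, h) for x, y in zip(xs, ys)]
    sigma2 = sum(r * r for r in resid) / n
    l = local_linear_weights(t, xs, h)
    return math.sqrt(sigma2 * sum(li * li for li in l))
```

With the WIND values in `xs` and the OZ values in `ys`, calling `local_linear_fit(t, xs, ys, h=5.0)` over a grid of `t` values traces the fitted curve, and `local_linear_se` gives a pointwise SE at each grid point. A useful sanity check is that the estimator reproduces exactly linear data without error.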
No.2 Generate a data set $(x_i, y_i)$, $i = 1, \dots, 400$, where $x_i$ is a $30 \times 1$ vector and $y_i = x_{i1} - 1.5\,x_{i3} + 0.8\,x_{i,11} + \varepsilon_i$.
1. Ignore the true relationship between $x_i$ and $y_i$. Give the OLS estimator based on the data $(x_i, y_i)$, $i = 1, \dots, 400$.
2. Use the R package glmnet to fit the LASSO and obtain a sparse estimator. (Simulate the data with NS = 200 replications.)
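The data-generating step and the OLS fit in item 1 can be sketched as follows, in standard-library Python. This is only an illustration under assumptions the assignment does not state: the entries of $x_i$ and the errors $\varepsilon_i$ are drawn as independent $N(0, 1)$ and $N(0, \sigma^2)$ variables, no intercept is fitted, and the names `simulate` and `ols` are hypothetical. The LASSO step is not shown, since it is meant to be done with glmnet in R.

```python
import random

def simulate(n=400, p=30, sigma=1.0, seed=0):
    """Draw (x_i, y_i) with y_i = x_i1 - 1.5 x_i3 + 0.8 x_i11 + eps_i.

    Assumption: x entries are iid N(0, 1) and eps_i is N(0, sigma^2);
    the assignment does not specify either distribution.
    """
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Columns 1, 3, 11 in the assignment are 0-based indices 0, 2, 10 here.
    y = [row[0] - 1.5 * row[2] + 0.8 * row[10] + sigma * rng.gauss(0.0, 1.0)
         for row in X]
    return X, y

def ols(X, y):
    """OLS without intercept: solve (X'X) beta = X'y by Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[j] * r[k] for r in X) for k in range(p)]
         + [sum(r[j] * yi for r, yi in zip(X, y))]
         for j in range(p)]
    for c in range(p):
        # Partial pivoting: move the largest remaining entry of column c to row c.
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    # A is now diagonal in its first p columns.
    return [A[j][p] / A[j][j] for j in range(p)]
```

Calling `beta = ols(*simulate())` returns 30 coefficients. With noise present, all 30 OLS estimates are generally nonzero, which is the contrast with a sparse LASSO fit that item 2 asks about.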