ECON7310: Elements of Econometrics
Final Problem Set
Fu Ouyang
November 14, 2022
Instructions
Answer all questions in a format similar to that of your answers to the tutorial questions. When you use R to conduct empirical analysis, show your R script(s) and outputs (e.g., screenshots of commands, tables, and figures). You will lose 2 points whenever you fail to provide R commands and outputs. When you are asked to explain or discuss something, keep your response brief and concise. To facilitate the tutors' grading, please label all your answers clearly. Upload your answers (in PDF or Word format) via the "Turnitin" submission link (in the "Final Problem Set" folder under "Assessment") by 11:59 AM on the due date, November 17, 2022. Do not hand in a hard copy. You may work on this assignment in groups; that is, you can discuss how to answer these questions with your group members. However, this is not a group assignment: you must answer all the questions in your own words and submit your report individually. The marking system will check for similarity, and UQ's student integrity and misconduct policies on plagiarism apply.
1. Panel Data Regression (20 points)
You investigate the deterrent effect of executions on murder using the panel dataset murder.csv, which contains U.S. state-level data on murder rates and executions.
(a) (4 points) Consider the following model with unobserved effects:
$$\text{mrdrte}_{it} = \lambda_t + \beta_1\,\text{exec}_{it} + \beta_2\,\text{unem}_{it} + \alpha_i + u_{it}, \qquad (1)$$
where $\text{exec}_{it}$ denotes the number of past executions in state $i$ by year $t$, and $\text{mrdrte}_{it}$ and $\text{unem}_{it}$ denote the murder rate and the unemployment rate of state $i$ in year $t$, respectively. Which term in model (1) represents the unobserved state fixed effect (1 point)? The year fixed effect (1 point)? If past executions of convicted murderers have a deterrent effect, what should the sign of $\beta_1$ be (1 point)? What sign do you think $\beta_2$ should have (1 point)?
(b) (8 points) Using the data for all three years (1987, 1990, and 1993), estimate equation (1) by OLS and report the estimation results (2 points).¹ Compute cluster-robust standard errors (SE). Why not simply compute heteroskedasticity-robust SE (2 points)? How many "clusters" do you have in the data (2 points)? Do you find any evidence of deterrent effects (2 points)? Hint: Use time dummies to estimate the time effects.
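To make "clustering by state" concrete, here is a minimal base-R sketch of the cluster-robust (sandwich) covariance for pooled OLS with year dummies. It lets the errors of the same state be arbitrarily correlated across years, which heteroskedasticity-robust SE do not allow. It assumes a data frame with hypothetical columns `id`, `year`, `exec`, `unem`, `mrdrte` (the actual names in murder.csv may differ) and uses synthetic data purely for illustration. Software packages usually multiply this CR0 form by a small-sample factor, so their numbers will differ slightly.

```r
# Cluster-robust covariance matrix for an lm() fit (CR0: no small-sample correction).
# Assumes no rows were dropped for missing values, so `cluster` lines up with the residuals.
cluster_vcov <- function(fit, cluster) {
  X <- model.matrix(fit)                    # regressors incl. intercept and year dummies
  u <- residuals(fit)
  bread <- solve(crossprod(X))              # (X'X)^{-1}
  scores <- rowsum(X * u, group = cluster)  # X_g' u_g: scores summed within each cluster
  meat <- crossprod(scores)                 # sum over clusters of (X_g' u_g)(X_g' u_g)'
  bread %*% meat %*% bread
}

set.seed(1)
# Synthetic panel for illustration only: 20 hypothetical states observed in 3 years.
# Column names (id, year, exec, unem, mrdrte) are assumptions, not taken from murder.csv.
panel <- expand.grid(id = 1:20, year = c(1987, 1990, 1993))
panel$exec <- rpois(nrow(panel), 2)
panel$unem <- runif(nrow(panel), 3, 10)
panel$mrdrte <- rnorm(20)[panel$id] + 0.5 * panel$unem + rnorm(nrow(panel))

# Pooled OLS with year dummies; alpha_i + u_it stays in the composite error
fit <- lm(mrdrte ~ exec + unem + factor(year), data = panel)
se_cluster <- sqrt(diag(cluster_vcov(fit, panel$id)))
n_clusters <- length(unique(panel$id))      # one cluster per state
```

If every observation is its own cluster, the formula collapses to the heteroskedasticity-robust (HC0) covariance; with several years per state it also picks up within-state correlation.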
(c) (8 points) Now, using the data for all three years, estimate equation (1) by fixed effects (FE) regression and report the estimation results (2 points). Again, compute cluster-robust SE. Is there any evidence of deterrent effects (2 points)? Is there any evidence of time effects (2 points)? Compare the estimation results obtained in (b) and (c), and comment on your findings (2 points).
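The FE estimator can be understood as OLS on data demeaned within each state (the "within" transformation). This removes $\alpha_i$, and it yields the same slopes as OLS with a full set of state dummies. The base-R sketch below illustrates that equivalence on synthetic data with hypothetical column names. The year dummies are demeaned along with the other regressors. The SE printed by the demeaned `lm()` fit ignore the estimated state means, so for inference a dedicated FE routine with clustered SE (e.g. the plm package with `model = "within"`) is the usual choice.

```r
# Within (FE) transformation: subtract each state's own mean from a variable
demean <- function(x, g) x - ave(x, g)      # ave() returns group means by default

set.seed(2)
# Synthetic panel for illustration only; column names are assumptions
panel <- expand.grid(id = 1:20, year = c(1987, 1990, 1993))
alpha <- rnorm(20)                                            # unobserved state effects
panel$exec <- as.numeric(rpois(nrow(panel), 2) + (alpha[panel$id] > 0))  # correlated with alpha_i
panel$unem <- runif(nrow(panel), 3, 10)
panel$mrdrte <- alpha[panel$id] + 0.3 * panel$exec + 0.5 * panel$unem + rnorm(nrow(panel))

# Year dummies (1987 as base), demeaned within state like the other regressors
D <- model.matrix(~ factor(year), data = panel)[, -1]
wd <- data.frame(y    = demean(panel$mrdrte, panel$id),
                 exec = demean(panel$exec, panel$id),
                 unem = demean(panel$unem, panel$id),
                 apply(D, 2, demean, g = panel$id))
fe_within <- lm(y ~ . - 1, data = wd)       # no intercept: demeaning removes it

# LSDV: OLS with a full set of state dummies gives the same slopes
fe_lsdv <- lm(mrdrte ~ exec + unem + factor(year) + factor(id), data = panel)
```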
¹ Here you can treat $\alpha_i + u_{it}$ as the composite error term of regression model (1).
2. Binary Choice Models (20 points)
You want to study female labor force participation using a sample of 872 women from Switzerland (swiss.csv). The dependent variable is participation (=1 if in the labor force), which you regress on all the other variables plus age squared, i.e., on income, education (years of schooling), age, age², the numbers of younger and older children (youngkids and oldkids), and the factor foreign, which indicates citizenship (=1 if not Swiss).
(a) (7 points) Run this regression using a linear probability model (LPM) and report the regression results (3 points). Test whether age is a statistically significant determinant of female labor force participation (2 points). Is there evidence of a nonlinear effect of age on the probability of being in the labor force (2 points)?
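Because age enters the model through both age and age², asking whether age matters is a joint hypothesis on two coefficients. The nonlinearity question concerns the age² coefficient alone. The base-R sketch below writes a Wald F-test that can use either the default (homoskedasticity-only) covariance or an HC1 robust covariance. The robust version is the natural choice for an LPM, whose errors are heteroskedastic by construction. The data are synthetic and the column names are assumptions, not necessarily those in swiss.csv.

```r
# Heteroskedasticity-robust (HC1) covariance matrix for an lm() fit
vcov_hc1 <- function(fit) {
  X <- model.matrix(fit); u <- residuals(fit)
  bread <- solve(crossprod(X))
  nrow(X) / (nrow(X) - ncol(X)) * bread %*% crossprod(X * u) %*% bread
}

# Wald F-test of H0: the coefficients named in `terms` are all zero
wald_F <- function(fit, terms, V = vcov(fit)) {
  b <- coef(fit)[terms]
  q <- length(terms)
  Fstat <- as.numeric(t(b) %*% solve(V[terms, terms, drop = FALSE], b)) / q
  c(F = Fstat, p = pf(Fstat, q, df.residual(fit), lower.tail = FALSE))
}

set.seed(3)
# Synthetic data for illustration only; column names are assumptions
n <- 300
sw <- data.frame(age = runif(n, 20, 62), education = sample(6:18, n, replace = TRUE))
p_true <- -0.5 + 0.06 * sw$age - 0.0008 * sw$age^2 + 0.01 * sw$education
sw$participation <- as.numeric(runif(n) < p_true)

lpm <- lm(participation ~ education + age + I(age^2), data = sw)
wald_F(lpm, c("age", "I(age^2)"), V = vcov_hc1(lpm))   # does age matter at all?
coef(lpm)["I(age^2)"] / sqrt(vcov_hc1(lpm)["I(age^2)", "I(age^2)"])  # robust t for age^2
```

With the default covariance, this Wald F equals the F-statistic from `anova()` comparing the model with and without the age terms.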
(b) (7 points) Repeat (a) using probit and logit regression models and report your results.²
(c) (6 points) Use the LPM and probit models to compute the predicted probability of being in the labor force for a Swiss woman with the sample median income and age, 12 years of schooling, one young kid, and no old kids.
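A base-R sketch of the predictions in (c) using `predict()`. Writing age² as `I(age^2)` inside the formula means the squared term is evaluated at the median age itself (median(age)²). Evaluating it at the median of age² would generally give a different number. The data are synthetic, and the column names and the factor levels "no"/"yes" for foreign are assumptions.

```r
set.seed(4)
# Synthetic data for illustration only; column names and factor levels are assumptions
n <- 400
sw <- data.frame(income = rnorm(n, 10, 1), age = runif(n, 20, 62),
                 education = sample(6:18, n, replace = TRUE),
                 youngkids = rpois(n, 0.5), oldkids = rpois(n, 0.8),
                 foreign = factor(sample(c("no", "yes"), n, replace = TRUE)))
latent <- 2 - 0.3 * sw$income + 0.08 * sw$age - 0.001 * sw$age^2 - 0.8 * sw$youngkids
sw$participation <- as.numeric(latent + rnorm(n) > 0)

f <- participation ~ income + education + age + I(age^2) + youngkids + oldkids + foreign
lpm    <- lm(f, data = sw)
probit <- glm(f, data = sw, family = binomial(link = "probit"))

# Profile from part (c): Swiss, median income and age, 12 years of schooling,
# one young kid, no old kids
new <- data.frame(income = median(sw$income), age = median(sw$age),
                  education = 12, youngkids = 1, oldkids = 0,
                  foreign = factor("no", levels = levels(sw$foreign)))
p_lpm    <- predict(lpm, newdata = new)                       # x'b (may fall outside [0, 1])
p_probit <- predict(probit, newdata = new, type = "response") # Phi(x'b)
```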
3. IV Regression (40 points)³
You use the following regression model and the dataset cigbwght.csv to estimate the effects of several variables, including cigarette smoking, on the weight of newborns:
$$\log(\text{bwght}) = \beta_0 + \beta_1\,\text{male} + \beta_2\,\text{parity} + \beta_3\log(\text{faminc}) + \beta_4\,\text{smoke} + u, \qquad (2)$$
where male is a dummy variable equal to 1 if the child is male; parity is the birth order of this child; faminc is family income (in thousands of dollars); and smoke is a dummy variable equal to 1 if the mother smoked during pregnancy.
(a) (4 points) Obtain OLS estimates of regression equation (2) and report the regression results.
(b) (5 points) Interpret the estimated coefficient on smoke (3 points) and test whether the population coefficient $\beta_4$ is zero at the 1% significance level (2 points).
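In a model with log(bwght) on the left, the coefficient on a dummy such as smoke is approximately a proportional difference. The exact percentage difference is $100\,(e^{\beta_4} - 1)$. The small R helpers below compare the two and carry out a two-sided test at the 1% level, which uses the large-sample critical value `qnorm(0.995)` (about 2.58). The numbers passed in are purely illustrative, not estimates from cigbwght.csv.

```r
# Percentage change in y when a dummy switches from 0 to 1, for a model in log(y)
pct_effect <- function(b) c(approx = 100 * b, exact = 100 * (exp(b) - 1))

# Two-sided test of H0: beta = 0 using a (robust) SE and the normal approximation
t_test <- function(b, se, alpha = 0.01) {
  tstat <- b / se
  crit <- qnorm(1 - alpha / 2)
  c(t = tstat, crit = crit, reject = abs(tstat) > crit)   # reject coded as 1/0
}

pct_effect(-0.05)      # illustrative coefficient only
t_test(-0.05, 0.01)    # illustrative coefficient and SE only
```

The approximation is close for small coefficients and deteriorates as |b| grows.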
(c) (10 points) Some studies suggest that smoking during pregnancy may have different impacts on male and female babies. Modify the specification of the regression model (2) and
test this hypothesis (5 points). In your modified model, does smoke still has significant
(at 5% level) effects on the weight of newborns (2 points)? Explain your answer using test
results (3 points). Hint: You don’t need to report regression results here, but writing out
your modified regression model may be helpful.
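One common way to let the smoking effect differ by sex is to add an interaction term, $\beta_5\,(\text{male} \times \text{smoke})$, to model (2). Then the effect is $\beta_4$ for girls and $\beta_4 + \beta_5$ for boys. $H_0\!: \beta_5 = 0$ tests equal effects, and $H_0\!: \beta_4 = \beta_5 = 0$ tests whether smoke matters for either sex. The base-R sketch below shows the mechanics with HC1 robust SE on synthetic data with hypothetical column names. It is one possible specification, not necessarily the intended one.

```r
# HC1 robust covariance for an lm() fit
vcov_hc1 <- function(fit) {
  X <- model.matrix(fit); u <- residuals(fit)
  bread <- solve(crossprod(X))
  nrow(X) / (nrow(X) - ncol(X)) * bread %*% crossprod(X * u) %*% bread
}

set.seed(5)
# Synthetic data for illustration only; column names are assumptions
n <- 1000
bw <- data.frame(male = rbinom(n, 1, 0.5), parity = sample(1:4, n, replace = TRUE),
                 faminc = runif(n, 5, 60), smoke = rbinom(n, 1, 0.15))
bw$bwght <- exp(4.8 + 0.03 * bw$male - 0.05 * bw$smoke - 0.03 * bw$male * bw$smoke +
                0.01 * log(bw$faminc) + rnorm(n, 0, 0.15))

fit <- lm(log(bwght) ~ male + parity + log(faminc) + smoke + male:smoke, data = bw)
V <- vcov_hc1(fit)
b <- coef(fit)
g <- c("smoke", "male:smoke")

# Effect of smoking for boys (beta4 + beta5) and its robust SE
effect_boys <- sum(b[g])
se_boys <- sqrt(sum(V[g, g]))     # Var(b4) + Var(b5) + 2 Cov(b4, b5)

# Robust Wald test of H0: beta4 = beta5 = 0 (chi-squared with 2 df, large-sample)
W <- as.numeric(t(b[g]) %*% solve(V[g, g], b[g]))
p_joint <- pchisq(W, df = 2, lower.tail = FALSE)
```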
(d) (7 points) One of your classmates expresses her concern about the validity of your regression analysis and argues that there may be unobserved health factors correlated with
smoking behavior that affect infant birth weight. For example, women who smoke during
pregnancy may, on average, drink more coffee or alcohol, or eat less nutritious meals. If
this is the case, do you think the OLS estimates you obtained in (a) are unbiased (consistent) (2 points)? Explain your answer (3 points). Is this a threat to your regression
analysis’s internal or external validity (2 points)?
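For reference, here is a sketch of the standard omitted-variable-bias result this concern points to. Let $w$ be a hypothetical stand-in for the unobserved health factors, with $u = \gamma w + v$ and $v$ uncorrelated with all regressors. Then

```latex
\log(\text{bwght}) = \beta_0 + \beta_1\,\text{male} + \beta_2\,\text{parity} + \beta_3\log(\text{faminc}) + \beta_4\,\text{smoke} + \underbrace{\gamma\,w + v}_{u},
\qquad
\hat\beta_4^{\,\text{OLS}} \xrightarrow{\;p\;} \beta_4 + \gamma\,\delta_4,
```

where $\delta_4$ is the coefficient on smoke in the linear projection of $w$ on $(1, \text{male}, \text{parity}, \log(\text{faminc}), \text{smoke})$. The probability limit differs from $\beta_4$ whenever both $\gamma \neq 0$ and $\delta_4 \neq 0$.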
(e) (4 points) Your classmate then proposes using the cigarette tax (cigtax) in each woman's state of residence as an instrumental variable (IV) for smoke and running a two-stage least squares (TSLS) regression. Take her suggestion and report your TSLS regression results. Hint: Use model (2) for your TSLS regression. You can use the iv_robust() function in the estimatr package to run the TSLS regression and calculate robust SE.
² You don't need to compute robust SE here.
³ For all OLS and TSLS regressions in this question, compute heteroskedasticity-robust SE.
(f) (10 points) Are coefficients of model (2) exactly identified, overidentified, or underidentified (2 points)? Does this TSLS regression suffer from the weak IV problem (2 points)?
Why or why not (2 points)? Is it possible to test the exogeneity of cigtax as an IV for
smoke (2 points)? Explain your answer (2 points).
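The base-R sketch below spells out what TSLS does with one endogenous regressor (smoke) and one instrument (cigtax). The first stage regresses smoke on cigtax and the exogenous regressors. The second stage replaces smoke with its first-stage fitted values. The second-stage point estimates equal the TSLS estimates, but the SE that `lm()` reports for that stage are not valid, which is why a routine such as `iv_robust()` is used for inference. With estimatr, the call has the form `iv_robust(log(bwght) ~ male + parity + log(faminc) + smoke | male + parity + log(faminc) + cigtax, data = ...)`, with the exogenous regressors on both sides of `|`. The sketch also computes the first-stage F-statistic on the excluded instrument (with a single instrument, F = t²), the usual weak-IV diagnostic with the rule of thumb F > 10. It uses homoskedasticity-only SE for brevity, whereas footnote 3 asks for robust ones. The data are synthetic and the column names are assumptions.

```r
set.seed(6)
# Synthetic data for illustration only; column names are assumptions
n <- 2000
d <- data.frame(male = rbinom(n, 1, 0.5), parity = sample(1:4, n, replace = TRUE),
                faminc = runif(n, 5, 60), cigtax = runif(n, 10, 40))
health <- rnorm(n)                               # unobserved confounder
d$smoke <- as.numeric(1 - 0.04 * d$cigtax - 0.5 * health + rnorm(n) > 0)
d$bwght <- exp(4.8 - 0.08 * d$smoke + 0.05 * health + rnorm(n, 0, 0.15))

# First stage: endogenous regressor on the instrument and the exogenous regressors
first <- lm(smoke ~ cigtax + male + parity + log(faminc), data = d)
d$smoke_hat <- fitted(first)

# Second stage: smoke replaced by its fitted values (point estimates only)
second <- lm(log(bwght) ~ male + parity + log(faminc) + smoke_hat, data = d)

# First-stage F on the single excluded instrument (homoskedasticity-only here)
t_iv <- coef(summary(first))["cigtax", "t value"]
F_first <- t_iv^2
```

Because the model is exactly identified here, the second-stage coefficient also equals the simple IV estimator $(Z'X)^{-1}Z'y$.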
4. Time Series (20 points)
The data file sp500d.csv contains daily data on the S&P 500 index from 2000 to 2015. The S&P 500 is usually regarded as a gauge of the large-cap U.S. equity market. Here you use these data to examine the historical "returns" of investing in the S&P 500.
(a) (5 points) Let $Y_t$ denote the S&P 500 index. Draw a time series plot of $Y_t$ (2 points). Do you think $Y_t$ is a stationary time series (1 point)? Are $y_t = \log(Y_t)$ and $\Delta y_t = y_t - y_{t-1}$ stationary (2 points)?⁴
(b) (3 points) Use OLS to estimate the following AR(1) model⁵:
$$\Delta y_t = \theta_0 + \theta_1 \Delta y_{t-1} + e_t.$$
Does the AR(1) model fit the data well (1 point)? Explain your answer (2 points).
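A base-R sketch of the steps in (a) and (b). It plots the level, computes daily log returns $\Delta y_t = \log(Y_t) - \log(Y_{t-1})$, and fits the AR(1) by OLS with `lm()` using the lagged return as regressor. The R² and the t-statistic on $\theta_1$ indicate how much past returns help predict today's return. The index series below is simulated (a geometric random walk) purely for illustration. With sp500d.csv one would use its index column, whose name is not given here.

```r
set.seed(7)
# Simulated daily index levels (geometric random walk), for illustration only
Y  <- 1000 * exp(cumsum(rnorm(2000, mean = 0, sd = 0.01)))
y  <- log(Y)                           # log level y_t (natural log)
dy <- diff(y)                          # daily log return, Delta y_t = y_t - y_{t-1}
if (interactive()) plot(Y, type = "l", xlab = "trading day", ylab = "index level")

# AR(1) by OLS: regress Delta y_t on Delta y_{t-1}
n   <- length(dy)
ar1 <- lm(dy[-1] ~ dy[-n])
summary(ar1)$r.squared                 # share of the variation in returns explained by the lag
```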
(c) (8 points) For the following AR($p$) model
$$\Delta y_t = \theta_0 + \theta_1 \Delta y_{t-1} + \cdots + \theta_p \Delta y_{t-p} + e_t,$$
try $p = 1, 2, 3, 4$. You want to select the optimal number of lags (i.e., $p$) using AIC and BIC as criteria. Which model do you think is the best (1 point)? Justify your choice (3 points). For the AR($p$) model you select, compute the first five autocorrelations of the regression residuals $\hat{e}_t$ (2 points). Do you think the errors $e_t$ of your selected model are serially correlated (2 points)? Hint: You can calculate $\hat{e}_t$ manually using its definition or, alternatively, apply the resid() function.
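A base-R sketch of lag selection, assuming the Stock and Watson form of the criteria: $\text{BIC}(p) = \ln(\text{SSR}(p)/T) + (p+1)\ln(T)/T$ and $\text{AIC}(p) = \ln(\text{SSR}(p)/T) + (p+1)\,2/T$. Every AR($p$) is fitted on the same sample (the first $p_{\max}$ observations are dropped for all $p$), so the criteria are comparable. The chosen model is then refitted and the first five residual autocorrelations are computed with `acf()`. The returns are simulated, for illustration only.

```r
set.seed(8)
dy <- rnorm(2000, 0, 0.01)             # simulated returns, illustration only

# AIC and BIC for AR(p), p = 1..pmax, fitted by OLS on a common sample
ic_ar <- function(x, pmax = 4) {
  X <- embed(x, pmax + 1)              # column 1: x_t; column j + 1: x_{t-j}
  Tn <- nrow(X)
  t(sapply(1:pmax, function(p) {
    fit <- lm(X[, 1] ~ X[, 2:(p + 1), drop = FALSE])
    ssr <- sum(residuals(fit)^2)
    c(p = p,
      AIC = log(ssr / Tn) + (p + 1) * 2 / Tn,
      BIC = log(ssr / Tn) + (p + 1) * log(Tn) / Tn)
  }))
}

ic <- ic_ar(dy)
p_bic <- ic[which.min(ic[, "BIC"]), "p"]

# Refit the chosen model and compute the first five residual autocorrelations
X <- embed(dy, p_bic + 1)
fit <- lm(X[, 1] ~ X[, -1, drop = FALSE])
acf(residuals(fit), lag.max = 5, plot = FALSE)$acf[2:6]   # element 1 is lag 0
```

Because BIC penalizes extra lags more heavily than AIC once $\ln T > 2$, BIC tends to pick the more parsimonious model when the two disagree.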
(d) (4 points) Let $T$ denote the last trading day in the data. Forecast $y_{T+1}$ and $y_{T+2}$.
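Because the AR model describes $\Delta y_t$, level forecasts are built by iterating the AR recursion and accumulating onto $y_T$: $\hat y_{T+1} = y_T + \widehat{\Delta y}_{T+1}$ and $\hat y_{T+2} = \hat y_{T+1} + \widehat{\Delta y}_{T+2}$. Here $\widehat{\Delta y}_{T+2}$ uses $\widehat{\Delta y}_{T+1}$ in place of the not-yet-observed $\Delta y_{T+1}$. The R sketch below implements this for an AR($p$) with $p \geq 1$, given estimated coefficients $(\theta_0, \theta_1, \ldots, \theta_p)$. The function name and the inputs in the usage line are hypothetical.

```r
# Iterated forecasts of the log level y from an AR(p) model for dy (p >= 1).
# theta: c(theta0, theta1, ..., thetap); dy: observed returns, most recent last;
# yT: last observed log level; h: forecast horizon.
forecast_levels <- function(theta, dy, yT, h = 2) {
  p <- length(theta) - 1
  hist <- tail(dy, p)                  # most recent p returns, oldest first
  level <- yT
  y_hat <- numeric(h)
  for (s in seq_len(h)) {
    lags <- rev(tail(hist, p))         # Delta y_{t-1}, ..., Delta y_{t-p}
    dy_hat <- theta[1] + sum(theta[-1] * lags)
    hist <- c(hist, dy_hat)            # forecast stands in for the unknown future return
    level <- level + dy_hat            # accumulate the return onto the log level
    y_hat[s] <- level
  }
  y_hat
}

forecast_levels(theta = c(0.0003, -0.05), dy = c(0.01, -0.02), yT = 7.6)  # hypothetical inputs
```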
⁴ Recall that $\log(x)$ represents the natural logarithm of $x$, i.e., the logarithm to the base $e$.
⁵ Assume $\{e_t\}$ to be homoskedastic and independent over time in this and all subsequent questions.