Problem Set 2
Time Series (and a bit of Causality)
EC 421: Introduction to Econometrics
Due before noon (11:59am) on July 16th, 2020 (on canvas)
DUE Upload your answer on Canvas before noon on July 16th, 2020.
IMPORTANT You must submit two files:
1. your typed responses/answers to the questions
2. the R script you used to generate your answers. Each student must turn in their own answers.
If you are using RMarkdown, you can turn in one file,
but it must be an HTML or PDF that includes your responses
and R code.
OBJECTIVE This problem set has three purposes: (1) reinforce the topics of time series and statistical inference;
(2) build your R toolset; (3) start building your intuition about causality within econometrics/regression.
INTEGRITY If you are suspected of cheating, then you will receive a zero. I may report you to the dean. Everything
you turn in must be in your own words.
Conceptual Questions
1. Remember that we've discussed three types of time-series models: (1) static models, (2) dynamic models with
lagged explanatory variables, (3) dynamic models with lagged outcome variables.
1a. If the disturbance $u_t$ is not autocorrelated, for which of the 3 types of models is OLS unbiased?
If OLS is biased for any of the models, explain why.
1b. If the disturbance $u_t$ is not autocorrelated, for which of the 3 types of models is OLS consistent?
If OLS is inconsistent for any of the models, explain why.
1c. If the disturbance $u_t$ is autocorrelated, for which of the 3 types of models is OLS unbiased?
If OLS is biased for any of the models, explain why.
1d. If the disturbance $u_t$ is autocorrelated, for which of the 3 types of models is OLS consistent?
If OLS is inconsistent for any of the models, explain why.
2. In our time-series lecture, we discussed how static time-series models are a pretty restrictive and simplistic
way to model time-series data.
2a. Explain why static time-series models are generally restrictive and simplistic.
2b. Give an example of a reasonable static time-series model. By reasonable we mean that it would be
reasonable to model the relationship as a static relationship. Explain why it is reasonable to model the
relationship as static rather than dynamic, and make sure you tell us what (t) would represent (e.g., days,
months, years).
Note: The model should look something like:
$\text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + u_t$
2c. Give an example of a reasonable dynamic time-series model. By reasonable we mean that it would be
reasonable to model the relationship as a dynamic relationship. Explain why this relationship should be
modeled as a dynamic relationship. Make sure you tell us what (t) would represent (e.g., days, months, years).
Note: The model should look something like:
$\text{Births}_t = \beta_0 + \beta_1 \text{Income}_t + \beta_2 \text{Income}_{t-1} + u_t$
3. Time-series models frequently include the lag of a variable, e.g., $x_{t-1}$. Explain why we usually do not use lags in
cross-sectional models, e.g., $x_{i-1}$.
Some Real Data
Now we're going to work with some real data. The data come from the Environmental Protection Agency (EPA).
Specifically, the data describe electricity generation in the United States at a monthly level: the amount of
electricity generated, associated emissions, the number of retirements, etc.
For more information on the dataset, see the table on the last page of this problem set.
Why? Electricity generation is obviously important for day-to-day life: it runs our heating and air conditioning, it
allows us to have computers/phones/internet/refrigerators/etc., and it supports many businesses and critical
parts of our health systems and economy.
Emissions are important because burning fossil fuels (e.g., coal and natural gas) produces toxic gases that are
released above the plant. These gases (emissions) have been traced to a bunch of negative outcomes for people,
animals, plants, and the general environment (e.g., acid rain). Economics is about thinking on the margin: Where do
the marginal benefits from something equal the marginal costs? We know we need electricity, so we do not want to
make it too expensive for electricity generators to operate, but if we do not regulate electricity generation, then the
power plants may poison our air and water. Thus, one job of economists (especially environmental and energy
economists) is figuring out how regulations affect health, environment, and energy costs.
4. Load packages and your dataset hw_2_data.csv .
5. Which dates does the dataset cover (what are the start and end dates)? How many months?
6. How many plants retired during this sample?
7. Create (and include) three figures: (1) the time series of total monthly generation (generation_gwh), (2) the
time series of NOx (nitrogen oxide) emissions (emissions_nox), and (3) the time series for the number of
electricity generators that retired in the given month (n_retirements).
Hint: A time-series graph has time on the x axis and a variable on the y axis. Your x axis can have either time t
(time relative to the beginning of the sample) or date (month).
8. For each of the three time-series graphs in 7, explain whether the variable appears to be positively
autocorrelated, negatively autocorrelated, or not autocorrelated. Make sure you explain your reasoning.
9. Estimate a static time-series model where monthly NOx emissions (emissions_nox) are the outcome variable
and our two explanatory variables are the number of retirements in the month (n_retirements) and the amount
of electricity generation in the month (generation_gwh).
Report your coefficient estimates and their statistical significance.
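The plotting and static-regression steps in 7 and 9 can be sketched in base R as follows. This is only an illustration under stated assumptions: the data frame `sim_df` is simulated placeholder data (not the EPA data), its column names simply mirror those in the data dictionary, and the coefficients used to generate it are made up.

```r
# Simulated placeholder data (NOT the EPA data); column names mirror hw_2_data.csv.
set.seed(421)
n_months <- 60
sim_df <- data.frame(
  t = 1:n_months,
  month = seq(as.Date("2015-01-01"), by = "month", length.out = n_months),
  generation_gwh = 300 + 20 * sin(2 * pi * (1:n_months) / 12) + rnorm(n_months, sd = 5),
  n_retirements = rpois(n_months, lambda = 2)
)
# Made-up data-generating process, used only so the regression has something to fit.
sim_df$emissions_nox <- 50 + 0.4 * sim_df$generation_gwh -
  1.5 * sim_df$n_retirements + rnorm(n_months, sd = 3)

# Start and end dates of the sample, and the number of months.
range(sim_df$month)
nrow(sim_df)

# Time-series plot: time (month) on the x axis, the variable on the y axis.
plot(sim_df$month, sim_df$generation_gwh, type = "l",
     xlab = "Month", ylab = "Generation (GWh)")

# Static model: the outcome depends only on same-period explanatory variables.
static_fit <- lm(emissions_nox ~ n_retirements + generation_gwh, data = sim_df)
summary(static_fit)$coefficients
```

With the real data, `sim_df` would be replaced by the data frame read from hw_2_data.csv; the tidyverse equivalents (`read_csv`, `ggplot`) work just as well.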
10. Now estimate a dynamic model in which you include the first lag for each of your explanatory variables
(number of retirements and amount of electricity generation). Note: You still want the non-lagged version of the
variables too, i.e., include $x_t$ and $x_{t-1}$. Interpret the coefficient on the lagged number of retirements.
11. Why might it make sense to include lags of the variable number of retirements? In other words: Why might we
want a dynamic model with lagged explanatory variables in this setting?
12. If the disturbance is autocorrelated, what problems does it cause for OLS regression estimates in 10?
Answer: If 10 has an autocorrelated disturbance, then OLS is inefficient and has biased standard-error estimates.
13. Use the residuals from the regression in 10 to test for first-order autocorrelation in your disturbance. Report
the results from the hypothesis test.
Hint: Don't forget about the missing values due to lags (see lecture notes).
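One way to handle the lag-induced missing values from the hint in 13 is sketched below in base R. Everything here is an assumption for illustration: the data are simulated with an AR(1) disturbance, `lag1()` is a hypothetical helper, and regressing the residual on its own lag without an intercept is one common version of the test, not necessarily the exact one from lecture.

```r
set.seed(421)
n <- 120

# Hypothetical helper: shift a vector back one period, padding with NA.
lag1 <- function(v) c(NA, v[-length(v)])

# Simulated data with an AR(1) disturbance: u_t = 0.6 * u_{t-1} + noise.
x <- rnorm(n)
u <- as.numeric(stats::filter(rnorm(n), filter = 0.6, method = "recursive"))
sim_df <- data.frame(x = x, x_lag = lag1(x))
sim_df$y <- 1 + 2 * sim_df$x + 0.5 * sim_df$x_lag + u

# Dynamic model with a lagged explanatory variable.
# na.exclude keeps residuals aligned with the rows of sim_df (NA where a lag is missing).
dyn_fit <- lm(y ~ x + x_lag, data = sim_df, na.action = na.exclude)
sim_df$e <- residuals(dyn_fit)
sim_df$e_lag <- lag1(sim_df$e)

# Regress the residual on its first lag; the t test on e_lag tests for
# first-order autocorrelation (H0: no autocorrelation).
ar1_fit <- lm(e ~ -1 + e_lag, data = sim_df)
summary(ar1_fit)$coefficients
```

The first residual is missing because the first observation has no lag, so the lagged residual is missing for the first two rows; `lm()` drops those rows automatically.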
14. Now estimate a dynamic model (still with NOx emissions as the outcome variable) with 0, 1, 2, and 3 lags of
the number of retirements and also the current month's electricity generation (no lags). Interpret the coefficient
on the third lag of the number of retirements.
15. Based upon your estimates in 14, what is the total effect of a retirement on NOx emissions?
Note: This estimate essentially assumes that the effect is gone after four months, which is not likely.
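The total effect in 15 is the sum of the coefficients on the current and lagged values of the variable. A minimal sketch in base R, using simulated data whose true coefficients (and hence a true total effect of -5) are made up for illustration; `lag_k()` is a hypothetical helper:

```r
set.seed(421)
n <- 200

# Hypothetical helper: shift a vector back k periods, padding with NA.
lag_k <- function(v, k) c(rep(NA, k), v[seq_len(length(v) - k)])

retire <- rpois(n, lambda = 2)
sim_df <- data.frame(
  r0 = retire,            # current month
  r1 = lag_k(retire, 1),  # first lag
  r2 = lag_k(retire, 2),  # second lag
  r3 = lag_k(retire, 3),  # third lag
  gen = rnorm(n, mean = 300, sd = 10)
)
# Made-up data-generating process: true total effect is -2 - 1.5 - 1 - 0.5 = -5.
sim_df$nox <- 100 - 2 * sim_df$r0 - 1.5 * sim_df$r1 - 1 * sim_df$r2 -
  0.5 * sim_df$r3 + 0.3 * sim_df$gen + rnorm(n)

lag_fit <- lm(nox ~ r0 + r1 + r2 + r3 + gen, data = sim_df)

# Total effect: add up the coefficients on lags 0 through 3.
total_effect <- sum(coef(lag_fit)[c("r0", "r1", "r2", "r3")])
total_effect
```

Three observations are lost to the lags, so the regression uses `n - 3` rows.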
16. Now estimate an ADL(1,1) model with NOx emissions as the outcome and with number of retirements and
electricity generation as the explanatory variables. Report/interpret the coefficient on the lag of NOx emissions.
Hint: Your regression should have an intercept plus five more terms.
Answer: The coefficient on the lag of NOx emissions tells us that a one-ton increase in NOx emissions in the previous
month is associated with a 0.925-ton increase in NOx emissions in the current month. This relationship is very
statistically significant. The relationship says that our outcome is strongly correlated with itself in time.
17. Does it make sense to regress current NOx emissions on the previous month's emissions? Explain your answer.
18. If the disturbance is autocorrelated, then OLS is not consistent for the coefficients in 16. Explain how you
could test for an autocorrelated disturbance using the model from 16.
Note: You do not actually need to run this test.
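For models with a lagged outcome, a Breusch-Godfrey-style test is a standard choice: regress the residuals on all of the original regressors plus lagged residuals, then test whether the lagged residuals matter. The sketch below is a hand-rolled base R version on simulated data. As simplifying assumptions, it uses a single explanatory variable `x` (the model in 16 has two), one lag of the residual, and the LM form of the statistic.

```r
set.seed(421)
n <- 200

# Hypothetical helper: shift a vector back one period, padding with NA.
lag1 <- function(v) c(NA, v[-length(v)])

# Simulated ADL(1,1) data with a non-autocorrelated disturbance.
x <- rnorm(n)
y <- numeric(n)
y[1] <- rnorm(1)
for (i in 2:n) {
  y[i] <- 1 + 0.5 * y[i - 1] + 1.0 * x[i] + 0.3 * x[i - 1] + rnorm(1)
}
sim_df <- data.frame(y = y, y_lag = lag1(y), x = x, x_lag = lag1(x))

# Step 1: estimate the ADL(1,1) model and keep residuals aligned with the data.
adl_fit <- lm(y ~ y_lag + x + x_lag, data = sim_df, na.action = na.exclude)
sim_df$e <- residuals(adl_fit)
sim_df$e_lag <- lag1(sim_df$e)

# Step 2: auxiliary regression of residuals on all regressors plus the lagged residual.
aux_fit <- lm(e ~ y_lag + x + x_lag + e_lag, data = sim_df)

# Step 3: LM statistic = (obs. in auxiliary regression) * R^2, compared to chi-squared(1).
lm_stat <- nobs(aux_fit) * summary(aux_fit)$r.squared
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)
c(lm_stat = lm_stat, p_value = p_value)
```

A small p-value would be evidence against the null hypothesis of no first-order autocorrelation. Testing more lags means adding more lagged residuals and raising `df` to match.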
Causality
Imagine that we are interested in analyzing a government program. We consider individuals as treated if they
participated in the program (and untreated if they did not). Following the notation of the Rubin causal model,
imagine that we observe the following sample (which would be impossible to observe in real life):
Table: Imaginary dataset
i Trt. y1 y0
1 0 17 8
2 0 7 5
3 0 10 4
4 1 5 1
5 1 0 0
6 1 1 4
19a. Calculate and report the treatment effect for each individual (i.e., $\tau_i$).
19b. Is the treatment effect heterogeneous or homogeneous? Briefly explain your answer.
19c. Calculate and interpret the average treatment effect for the sample.
19d. What does it mean if $\tau_i < 0$ for one individual and $\tau_j > 0$ for another individual?
19e. Estimate the average treatment effect by comparing the mean of the treatment group to the mean of
the control group. Report your estimate.
19f. Should we expect our estimator in 19e to provide unbiased estimates? Explain.
19g. Why would it be impossible to actually observe all of the data in the table (in real life)? Specifically: Which
parts of the dataset would we not observe in real life? Think about the fundamental problem of causal inference.
19h. Define and explain selection bias.
19i. Calculate (and report) the selection bias in this sample.
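The quantities in question 19 can be computed directly from the imaginary dataset. The sketch below enters the table by hand in base R. The variable names are illustrative, and selection bias is computed with the usual definition (the difference in average untreated outcomes between the treated and control groups), which is an assumption about the definition intended here.

```r
# The imaginary dataset from the table (both potential outcomes observed).
po <- data.frame(
  i   = 1:6,
  trt = c(0, 0, 0, 1, 1, 1),
  y1  = c(17, 7, 10, 5, 0, 1),
  y0  = c(8, 5, 4, 1, 0, 4)
)

# Individual treatment effects and the sample average treatment effect.
po$tau <- po$y1 - po$y0
ate <- mean(po$tau)

# In real life we only see y1 for the treated and y0 for the untreated.
po$y_obs <- ifelse(po$trt == 1, po$y1, po$y0)

# Difference in observed group means (treated minus control).
diff_means <- mean(po$y_obs[po$trt == 1]) - mean(po$y_obs[po$trt == 0])

# Average treatment effect on the treated, and selection bias.
att <- mean(po$tau[po$trt == 1])
selection_bias <- mean(po$y0[po$trt == 1]) - mean(po$y0[po$trt == 0])

# The difference in means decomposes into ATT + selection bias.
all.equal(diff_means, att + selection_bias)
```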
Extra Credit: IV
The purpose of this question is to illustrate why you cannot estimate demand and supply equations with OLS.
You may receive up to 4 extra credit homework points on this question.
Suppose we are interested in estimating demand elasticities. We think supply and demand relationships are
given by:
$\log(Q_{it}) = \kappa_d + \alpha_1 \log(P_{it}) + u_{it}$ (demand)
$\log(Q_{it}) = \kappa_s + \alpha_2 \log(P_{it}) + \epsilon_{it}$ (supply)
This next piece of information will be crucial for a later part of the problem, so make note of it. You may assume
[...] and [...]. The first assumption says the unconditional means of the demand and supply
shocks are zero. The second assumption says that the demand and supply shocks are independent from one
another (e.g., their covariance is zero). You may assume that the variance of the shocks (disturbances) is
homoskedastic. Note that these assumptions, taken together, imply that [...] and [...].
Unfortunately, in our data we only observe equilibrium prices and quantities (that is: [...] and [...]). This will lead to
endogeneity of $\log(P_{it})$ in both equations. To see this endogeneity, we can impose the equilibrium condition and
solve for [...]:
[...]
Clearly, [...] and [...]. This is where the endogeneity is coming from. More
intuitively, the equilibrium price [...] (our data) is impacted by both demand and supply shocks, $u_{it}$ and $\epsilon_{it}$.
20a. Ignoring the endogeneity for a moment, why can we interpret $\alpha_1$ and $\alpha_2$ as elasticities?
20b. Compute the equilibrium quantity [...]. Hint: Just use the equilibrium price equation I gave you. There is
only one step here (plus maybe a few extra if you simplify the expression).
20c. Calculate the covariance between [...] and [...]. Remember, the covariance of two random variables
[...] and [...] is given by: [...]
20d. Recall: [...]. Use your answer to 20c to compute this probability limit.
20e. We will attempt to estimate [...] with instrumental variables. As an example, suppose the demand and supply
equations we are estimating are for cigarettes. Our instrument is the general sales tax per pack in each state,
[...]. What are the two conditions for the instrument to be valid? If you can test either of them, write out how you
would (specifically, give a regression equation if it is possible).
20f. Now I want you to argue for and against the exogeneity condition. Specifically, write out, at most, two
sentences arguing why I should believe the instrument is exogenous, and two as to why I shouldn't believe it is
exogenous. This is tough to get "right", so spend a bit of time thinking about it.
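The algebra behind imposing the equilibrium condition can be sketched as follows. This derivation is worked out here from the demand and supply equations in this section, not copied from the original, and it assumes $\alpha_1 \neq \alpha_2$. Setting log quantity demanded equal to log quantity supplied and solving for log price:

```latex
\begin{aligned}
\kappa_d + \alpha_1 \log(P_{it}) + u_{it} &= \kappa_s + \alpha_2 \log(P_{it}) + \epsilon_{it} \\
(\alpha_1 - \alpha_2)\,\log(P_{it}) &= \kappa_s - \kappa_d + \epsilon_{it} - u_{it} \\
\log(P_{it}) &= \frac{\kappa_s - \kappa_d}{\alpha_1 - \alpha_2} + \frac{\epsilon_{it} - u_{it}}{\alpha_1 - \alpha_2}
\end{aligned}
```

Both shocks appear on the right-hand side, which is why the equilibrium price is correlated with the disturbance in each equation.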
Variable                 Description
t                        Time, relative to the first month of the sample (1, 2, ...)
month                    Month of the sample (e.g., 2015-12-01)
generation_gwh           Total monthly electricity generation (gigawatt hours, GWh)
emissions_so2            Total monthly emissions of SO2 (in tons)
emissions_nox            Total monthly emissions of NOx (in tons)
n_plants                 Number of unique electricity-generating units (EGUs) operating in the month
n_retirements            Number of retired electricity-generating units in the month
cumulative_retirements   Cumulative number of retirements (through the given month)
i_cair                   Binary indicator for months during the Clean Air Interstate Rule (CAIR)
i_csapr                  Binary indicator for months during the Cross-State Air Pollution Rule (CSAPR)