SOSC 5440: Economics of Development
Spring 2025
Data Session: Difference-in-Differences
Data Session: Duflo, E. (2001)
Replicate the main findings of this paper:
1. Estimate the effect of the school construction program (INPRES) on years of schooling and earnings
▶ Difference-in-differences in means (Table 3)
▶ Formal regression framework (Table 4; Figure 1)
2. Estimate the returns to education
▶ OLS and 2SLS approaches (Table 7)
Download data from Moodle:
supas.dta
Load data:
use supas.dta, clear
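Before estimating anything, it can help to confirm that the variables used in the commands below exist and to see which birth years each cohort dummy covers. A minimal sketch, assuming the variable names that appear in this session's commands (including `old`, which the cohort comparisons rely on):

```stata
* Sketch: inspect the variables used in this session
* (names taken from the commands in this data session)
use supas.dta, clear
describe yeduc lhwage high young old veryold prog_int ch71 ROB YOB
summarize yeduc lhwage prog_int ch71

* Which years of birth are coded as young, old and very old?
tabulate YOB young
tabulate YOB old
tabulate YOB veryold
```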
Difference-in-Differences Estimation
Install the user-written difference-in-differences command:
ssc install diff
diff outcome, treated(varname) period(varname) cov(varlist) cluster(varname)
▶ treated: binary treatment variable (=1 if treated; 0 otherwise)
▶ period: binary period variable (=1 if post; 0 otherwise)
▶ cov: specifies the pre-treatment covariates of the model
▶ cluster: calculates clustered standard errors by varname
First, DID with young (exposed to the program) and old (not exposed) cohorts:
keep if young == 1 | old == 1
diff yeduc, treated(high) period(young)
▶ high indicates regions with high intensity of school construction;
young indicates cohorts exposed to treatment
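Without covariates, the `diff` estimate is a difference in cell means, which is how Table 3 of the paper is laid out. A quick way to see the four means, as a sketch assuming `high` and `young` are coded 0/1 and the young/old sample kept above is still in memory:

```stata
* Sketch: 2x2 table of mean years of education
* rows: high (0 = low-intensity, 1 = high-intensity region)
* columns: young (0 = old cohort, 1 = young cohort)
tabulate high young, summarize(yeduc) means
```

The DID reported by `diff` should equal (high/young mean - high/old mean) - (low/young mean - low/old mean) taken from this table.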
Let’s try clustering the standard errors at different levels:
diff yeduc, treated(high) period(young) cluster(YOB)
diff yeduc, treated(high) period(young) cluster(ROB)
▶ YOB is year of birth; ROB is region of birth
We can also cluster at the region-by-year-of-birth level:
egen RANDY = group(ROB YOB)
diff yeduc, treated(high) period(young) cluster(RANDY)
▶ When is the standard error largest? When is it smallest?
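To compare the clustering choices in one pass, one option is to run the regression form of the same 2x2 DID under each choice and print only the standard error of the interaction term. This is a sketch: it uses `regress` rather than `diff`, so the point estimate is the same in the 2x2 case, but it does not verify that the standard errors match `diff`'s output exactly.

```stata
* Sketch: standard error of the 2x2 DID term under different clustering choices
use supas.dta, clear
keep if young == 1 | old == 1
egen RANDY = group(ROB YOB)

quietly regress yeduc c.high##c.young
display "No clustering:" _col(23) %6.4f _se[c.high#c.young]

foreach v in YOB ROB RANDY {
    quietly regress yeduc c.high##c.young, vce(cluster `v')
    display "Clustered by `v':" _col(23) %6.4f _se[c.high#c.young]
}
```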
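Next, DID with old and very old cohorts. Reload the data first, because keep drops observations permanently and the very old cohort is no longer in memory:
use supas.dta, clear
keep if old == 1 | veryold == 1
diff yeduc, treated(high) period(old)
▶ Why would you want to compare the old and very old cohorts?
▶ If there were a pre-trend in educational outcomes, what would you have found?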
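The placebo applies the same 2x2 contrast to two cohorts that were both unexposed to the program. Written with notation introduced here for illustration (a bar for mean years of education, H and L for high- and low-intensity regions, i.e. high = 1 and high = 0):

```latex
\widehat{\mathrm{DID}}_{\text{placebo}}
  = \left(\bar{y}_{H,\text{old}} - \bar{y}_{H,\text{very old}}\right)
  - \left(\bar{y}_{L,\text{old}} - \bar{y}_{L,\text{very old}}\right)
```

Since neither cohort could benefit from the new schools, this contrast contains no program effect; it only picks up differences in trends between high- and low-intensity regions.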
Now use regression to estimate DID with young and old cohorts:
▶ A##B in a regression is equivalent to writing A B A#B (both main effects plus their interaction)
use supas.dta, clear
keep if young == 1 | old == 1
reg yeduc c.high##c.young, cluster(ROB)
▶ What is your OLS estimate of DID? Is it identical to the one you found using the diff command?
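The link between the regression and the difference in means can be checked on simulated data: in a saturated 2x2 model, OLS fits the four cell means exactly, so the interaction coefficient must equal the difference in differences of those means. A self-contained sketch with a hypothetical data-generating process (true effect 0.5); note that `clear` discards supas.dta from memory:

```stata
* Sketch on simulated (hypothetical) data, not supas.dta
clear
set seed 12345
set obs 4000
generate byte high  = runiform() < 0.5
generate byte young = runiform() < 0.5
* Hypothetical data-generating process with a true DID effect of 0.5
generate double y = 8 + high + 0.7*young + 0.5*high*young + rnormal()

* Regression form of the 2x2 DID
regress y c.high##c.young
scalar b_did = _b[c.high#c.young]

* Difference in differences of the four cell means
foreach h in 0 1 {
    foreach t in 0 1 {
        quietly summarize y if high == `h' & young == `t', meanonly
        scalar mean_h`h'_t`t' = r(mean)
    }
}
scalar did_means = (mean_h1_t1 - mean_h1_t0) - (mean_h0_t1 - mean_h0_t0)

display "OLS interaction: " b_did "   DID in means: " did_means
* The two numbers coincide up to floating-point rounding
assert reldif(b_did, did_means) < 1e-8
```

Clustering changes only the standard errors, not this point estimate.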
Repeat the DID regression with old and very old cohorts:
use supas.dta, clear
keep if old == 1 | veryold == 1
reg yeduc c.high##c.old, cluster(ROB)
Now estimate DID with log wages (lhwage) as the outcome variable:
use supas.dta, clear
keep if young == 1 | old == 1
diff lhwage, treated(high) period(young)
diff lhwage, treated(high) period(young) cluster(YOB)
diff lhwage, treated(high) period(young) cluster(ROB)
Placebo test with old and very old cohorts:
use supas.dta, clear
keep if old == 1 | veryold == 1
diff lhwage, treated(high) period(old)
Estimate the basic regression framework using treatment intensity:
▶ the DID term is the interaction of prog_int and young (c.prog_int#c.young)
▶ prog_int indicates the number of schools constructed in the region
use supas.dta, clear
keep if young == 1 | old == 1
reg yeduc c.prog_int#c.young c.ch71#c.young i.ROB i.YOB i.YOB#c.ch71
reg lhwage c.prog_int#c.young c.ch71#c.young i.ROB i.YOB i.YOB#c.ch71
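The intensity regression can be written as an equation. This is a sketch of the specification implied by the Stata command, with notation introduced here: i indexes individuals, j region of birth (ROB), k year of birth (YOB); P_j is prog_int, C_j is ch71, T_i is young, and d_il is a dummy for being born in year l.

```latex
y_{ijk} = \alpha_j + \beta_k
        + \gamma \,(P_j \times T_i)
        + \delta \,(C_j \times T_i)
        + \sum_{l} \theta_l \,(C_j \times d_{il})
        + \varepsilon_{ijk}
```

Here gamma is the DID coefficient reported for c.prog_int#c.young. The main effects of P_j and C_j are absorbed by the region fixed effects alpha_j, and the main effect of T_i by the cohort fixed effects beta_k, which is why the command does not list them. Assuming young is defined purely by year of birth, the delta term is collinear with the theta_l terms, so expect Stata to report some of these interactions as omitted.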
Placebo test in the basic regression framework:
use supas.dta, clear
keep if old == 1 | veryold == 1
reg yeduc c.prog_int#c.old c.ch71#c.old i.ROB i.YOB i.YOB#c.ch71
reg lhwage c.prog_int#c.old c.ch71#c.old i.ROB i.YOB i.YOB#c.ch71
Install the user-written programs for tabulating (esttab, from the estout package) and plotting coefficients:
ssc install estout
ssc install coefplot
Estimate the general regression framework in the paper, using all birth cohorts (reload the full data):
▶ Interact each year of birth dummy with program intensity
use supas.dta, clear
xi i.YOB*prog_int
▶ Create labels for the interacted variables (displayed in the graph as age in 1974)
forvalues i = 51(1)72 {
    local j = 74 - `i'
    label var _IYOBXprog_`i' "`j'"
}
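The general framework replaces the single post-program dummy with a full set of year-of-birth interactions. A sketch of the specification, in notation introduced here to match the command: j is region of birth, k year of birth, P_j is prog_int, C_j is ch71, and d_il = 1 if individual i was born in 19l.

```latex
y_{ijk} = \alpha_j + \beta_k
        + \sum_{l=51}^{72} \gamma_l \,(P_j \times d_{il})
        + \sum_{l} \theta_l \,(C_j \times d_{il})
        + \varepsilon_{ijk}
```

Each gamma_l is plotted against age in 1974, i.e. 74 - l, which is what the labels above encode. xi drops the lowest category by default, so the earliest birth year is the reference cohort; the loop starting at 51 suggests this is 1950, but check with tabulate YOB.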
Regress and store estimates in matrix format:
reg yeduc _IYOBXprog_* i.ROB i.YOB i.YOB#c.ch71
esttab, keep(_IYOBXprog_*) cells(b ci_l ci_u) nocons
mat B1 = r(coefs)
mat C1 = B1'
Plot coefficient estimates using coefplot:
coefplot matrix(C1), ci((2 3)) vertical recast(connected) ciopts(recast(rline) lpattern(dash)) yline(0)
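coefplot can also plot directly from the active estimation results, skipping the esttab and matrix step. A sketch, assuming the reg yeduc _IYOBXprog_* ... results are still the active estimates; coefplot should then label coefficients with the variable labels created in the loop and draw 95% confidence intervals by default:

```stata
* Sketch: plot the year-of-birth x intensity coefficients from the active regression
* (assumes -reg yeduc _IYOBXprog_* ...- was the last estimation command)
coefplot, keep(_IYOBXprog_*) vertical recast(connected) ciopts(recast(rline) lpattern(dash)) yline(0) xtitle("Age in 1974")
```

The matrix route gives more control over what is plotted, while this version is shorter when the regression is already in memory.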
Estimating returns to education:
▶ OLS estimate: regress log wages on years of schooling
▶ IV estimate: use school construction intensity (interacted with year of birth dummies) as instruments for years of schooling
use supas.dta, clear
keep if young == 1 | old == 1
reg lhwage yeduc i.ROB i.YOB i.YOB#c.ch71
ivregress 2sls lhwage i.ROB i.YOB i.YOB#c.ch71 (yeduc = i.YOB#c.prog_int)
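To compare the two estimates side by side and check instrument strength, one option is to store both models and run estat firststage after ivregress. A sketch, assuming the estout package is installed so that esttab is available:

```stata
* Sketch: OLS vs 2SLS returns to education, plus first-stage diagnostics
use supas.dta, clear
keep if young == 1 | old == 1

quietly regress lhwage yeduc i.ROB i.YOB i.YOB#c.ch71
estimates store ols

quietly ivregress 2sls lhwage i.ROB i.YOB i.YOB#c.ch71 (yeduc = i.YOB#c.prog_int)
estimates store iv
* First-stage F statistic and partial R-squared for the excluded instruments
estat firststage

* Coefficient on years of schooling under each estimator
esttab ols iv, keep(yeduc) se mtitles("OLS" "2SLS")
```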