Term Project and Homework Assignments
Returns to Education
ECON 4400
1 Overview
Human capital, defined as the skills, knowledge, and abilities that an individual possesses, has been a focal point of economic research in labor, development, and political economy, to name a few fields. Education and job training are important human capital investments, leading to higher earnings and non-pecuniary benefits. Your ECON 4400 project focuses on the former. You will quantify the returns to education, estimating the effect of a year of schooling on an individual’s wage. While economics has developed sound theoretical foundations, empirical work on the return to human capital has been at the center of considerable debate.
As part of your project, you will explore a part of that debate by approximately replicating the results of Angrist and Krueger (1991) using the 2022 American Community Survey (ACS); I have simplified the analysis to a degree. I chose this approach to foster critical thinking and deepen econometric knowledge. Our analysis will also draw upon Bound, Jaeger, and Baker’s (1995) critique of the instrumental variables approach used in Angrist and Krueger (1991). You are therefore required to read both papers, Angrist and Krueger (1991) and Bound, Jaeger, and Baker (1995). I have posted the papers on our Carmen course site Modules page under “Articles for Term Project.”
Throughout the term, you will complete parts of the analysis and submit each component as a homework assignment. This lets me assist with your learning of econometrics in practice. Additionally, the homework assignments enable me to address issues with coding or analysis.
For each assignment, you need only to submit what is requested. You will include the tables created for each assignment with your term project. A homework assignment will also ask you to introduce, discuss, and explain particular sections of your term project, e.g., data, regression analysis, results, and econometric methodologies. After I return the assignment, you should edit and expand the section following the outline below, addressing any notes or needed corrections.
You will analyze the returns to education in a U.S. state. Refer to Table 1 to see your assigned state. To download your data file, log on to Carmen, go to Modules, scroll toward the bottom of the page, and download the state data file assigned to you.
2 Paper Requirements and Expectations
You will write a three- to six-page analysis (not including tables; it can be longer if needed) on the returns to education and submit it at the beginning of class on Friday, 04/18. The paper will include two tables: a table of summary statistics and a table of returns to education estimates (see Sections 3.1, 3.2, and 3.3). You need to attach your do-file with the paper. If you do not submit a working do-file, you will receive, at most, half credit for this assessment.
Your do-file needs to be cleaned of any redundant or incorrect commands. The entire do-file needs to be executable. In other words, if you click the execute icon, Stata executes every command without error.
Your write-up of the analysis should follow the general outline below; the sub-items do not need to follow the stated order. You must address each enumerated item but can include additional background or support as needed. Your writing needs to flow (it should not read as an itemized list), and each paragraph must consist of one key idea with supporting statements (evidence, results, etc.) about that key idea. You must also ensure your writing includes transitions between key ideas (paragraphs).
1. Introduction
(a) Discuss the importance and benefit of education in the context of earnings. For background, read the following papers:
• “Economic returns to education: What We Know, What We Don’t Know, and Where We Are Going–Some Brief Pointers” by Dickson and Harmon (2011)
• “Does Compulsory School Attendance Affect Schooling and Earnings?” by Angrist and Krueger (1991)
• “Educational Attainment and Quarter of Birth: A Cautionary Tale of LATE” by Barua and Lang (2008)
• “Problems With Instrumental Variables Estimation When the Correlation Between the Instruments and the Endogenous Explanatory Variable is Weak” by Bound, Jaeger, and Baker (1995)
You can access the papers in the module Articles for Term Project on our Carmen course site’s Modules page.
2. Data and Methodology
(a) Cite and discuss the data used for the analysis
(b) Discuss the sub-samples used for the analysis, referencing the summary statistics
3. Returns to Education
(a) Introduce and discuss the wage equation
(b) Discuss OLS return to education
(c) Discuss why the OLS estimate for the return to education is biased
(d) Discuss the Two Stage Least Squares (2SLS) estimator: how does it address the endogeneity problem?
(e) Discuss the instrumental variables, including the relevancy and validity requirements
(f) Discuss the 2SLS return to education
(g) How do your results compare to Angrist and Krueger (1991)? Specifically, do your results comport with the authors’ findings for the 30-39 and 40-49 cohorts?
(h) Compare and discuss OLS versus 2SLS estimates. Do the results meet expectations? Explain. (Hint: why is the OLS estimator of the returns to education biased?) Discuss the F-statistic from the test for weak instruments. What insights does the test provide regarding the results?
4. Discussion and Conclusion
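For item 3 of the outline, one common way to write the wage equation and the first stage is sketched below. The notation is an assumption for illustration, not the required specification: w_i is the hourly wage, S_i is years of schooling, X_i is a vector of controls, and Q_{qi} are quarter-of-birth indicators in the spirit of Angrist and Krueger (1991). Your own equation should match the variables you actually use.

```latex
\begin{align}
  % Wage equation: \beta_1 is the return to one more year of schooling
  \ln(w_i) &= \beta_0 + \beta_1 S_i + X_i'\gamma + \varepsilon_i \\
  % First stage: schooling regressed on the instruments and the same controls
  S_i &= \pi_0 + \sum_{q=1}^{3} \pi_q Q_{qi} + X_i'\delta + v_i
\end{align}
% Relevance: \pi_q \neq 0 for at least one q.
% Validity:  \operatorname{Cov}(Q_{qi}, \varepsilon_i) = 0 for every q.
```

In this notation, OLS is biased when S_i is correlated with the error term (for example, through unobserved ability), and 2SLS replaces S_i with its fitted value from the first stage.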
2.1 Paper Formatting
• Font: 11pt Times New Roman font
• Margins: One-inch margins (top, bottom, left, and right)
• Line spacing: 1.5 lines
• Start of new paragraph: Indent (no additional spacing between paragraphs)
• Text Alignment: justified
• Make sure to include your first and last name on the paper
References and Citations - Chicago Style. If you choose to support an argument by drawing on the work of other scholars, you need to follow the below citation and reference style (Chicago). When you cite an article or research paper, you must include a reference section with your paper.
Citation and reference examples:
In-text citation | Reference list
Author Year | First author’s last name, first author’s first name, second author’s first and last names, third author’s first and last names, . . . , and last author’s first and last names. Year of publication. “Title of article.” Title of Journal, volume number(issue/number, or date/month of publication if volume and issue are absent): page numbers (if any).
Example - Parenthetical
(Tesseur 2022) | Tesseur, W. 2022. “Translation as inclusion? An analysis of international NGOs’ translation policy documents.” Language Problems and Language Planning, 45(3): 261–283.
Example - Narrative
Piketty and Saez (2003) | Piketty, Thomas, and Emmanuel Saez. 2003. “Income Inequality in the United States, 1913–1998.” The Quarterly Journal of Economics, 118(1): 1–41.
2.2 Stata Do-File
You will generate one do-file for this project. Each assignment will have you add to your code document (do-file). You must save your do-file at each step of the project (I recommend saving it regularly when working on an assignment). Separate each part using asterisks. For example:
********************
**ECON 4400 Project: Name - Assigned State
********************

********************
**Homework 1 - Summary Statistics
********************
. . . code here . . .

********************
**Homework 2 - OLS Returns to Education
********************
. . . code here . . .

********************
**Homework 3 - 2SLS Returns to Education
********************
. . . code here . . .
2.3 Data Assignments
Table 1: Data (state) Assignments for Term Project (and Homework)
Student | State FIPS code | State
AlAjlouni, Ahmad | 19 | Iowa
Ali, Hafsa | 13 | Georgia
Backlin, Ben | 6 | California
Bobie, Kofi | 21 | Kentucky
Cai, Boxun | 49 | Utah
Campisi, Matthew James | 20 | Kansas
Caracciolo, Isabella Grace | 51 | Virginia
Chen, Gong | 48 | Texas
Dia, Djnda | 25 | Massachusetts
Dohler Rodas, Edison Emilio | 44 | Rhode Island
Duan, Tommy | 46 | South Dakota
Gu, Huajie | 9 | Connecticut
He, Feihuan | 55 | Wisconsin
Hou, Murong | 29 | Missouri
Huo, Yu | 24 | Maryland
Kopocs, Nate | 27 | Minnesota
Lintz, Nicholas Michael | 16 | Idaho
Liu, Renlong | 5 | Arkansas
Lu, Shibo | 8 | Colorado
Ma, Haotian | 30 | Montana
Maokhamphiou, Zhanguosong Jaynarong | 28 | Mississippi
Mendez, Jesse Wayne | 33 | New Hampshire
Oljira, Yemesrach Mulugeta | 47 | Tennessee
Pulsifer, Aiden | 11 | District of Columbia
Shah, Dhruv | 42 | Pennsylvania
Shi, Chloe | 15 | Hawaii
Spicer, Hannah Lauren | 34 | New Jersey
Sun, Xinrui | 4 | Arizona
Tessman, Vija Elizabeth | 26 | Michigan
Warner, Jeffrey | 37 | North Carolina
Williams, Gavin Redmond | 17 | Illinois
Wu, Oliver | 35 | New Mexico
Zhang, Bojia | 54 | West Virginia
Zhang, Guangjie | 12 | Florida
Zhang, Haoyue | 36 | New York
Zhao, Han | 31 | Nebraska
Zhao, Wenzhong | 18 | Indiana
3 Homework: Putting Together Your Analysis
3.1 Homework 1, Due Friday, 02/07
Overview of assignment and what you will submit: You will generate a table reporting summary statistics of various samples and write one to three paragraphs summarizing and comparing economic variables across different groups. We will compare multiple samples of individuals of various age groups who reported an income in 2021. The focus of the write-up needs to be on the composition of each sample relative to the others. Focus your write-up on how the samples of respondents ages 30 to 39 and ages 40 to 49 compare to one another, as well as to the sample of respondents between the ages of 25 and 64.
You will submit a paper copy of your write-up with the summary statistics table and a print-out of your do-file at the beginning of class on Friday, 02/07.
What to Submit - three items:
1. A write-up discussing the data source, the samples, and summary statistics. (I have included an example of a write-up of summary statistics below Table 2 for reference.)
2. A table of summary statistics, created using Word or Excel.
3. Attach a printout of your do-file (the entire document)
Instructions: You will generate summary statistics for four subsamples. The first subsample consists of all wage and salary workers and self-employed individuals between the ages of 25 and 64 who report a 2021 income (the 2022 ACS reports income from the prior year). The second sample restricts the first one to only wage and salary workers between the ages 25 and 64. The third subsample is comprised of wage and salary workers between the ages 30 and 39. The fourth subsample consists of wage and salary workers between the ages 40 and 49. We will use the latter two samples to estimate the returns to education.
Your first homework assignment will require you to complete a process known as data cleaning. Researchers often need to recode or generate new variables from survey data. The below commands will walk you through how to “clean ACS data” to estimate the returns to education and the probability that an individual participates in the labor force.
The task of data cleaning is often an arduous one. To cultivate command-based coding and data analytics skills using Stata, I am providing all the code for this portion of the project.
In Stata, to indicate a range, e.g., tabulate incwage between 20,000 and 40,000 (that is, 20,000 ≤ incwage ≤ 40,000), the code is tab incwage if incwage>=20000 & incwage<=40000. For an “or” statement, use |. For example, to count respondents who are married: count if marst==1 | marst==2, where a value of one indicates married with spouse present and two indicates married with spouse absent (to confirm the assigned values and designations regarding marital status: label list marst_lbl). The vertical line | denotes “or” and & denotes “and” in Stata.
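The “and” and “or” operators can be practiced without touching the ACS file. The sketch below uses Stata's built-in auto dataset, which is an assumption made purely for illustration (its variables price, rep78, and foreign have nothing to do with the project). Run it in a scratch do-file, because the clear option replaces whatever data are in memory.

```stata
* Practice example on Stata's built-in auto dataset (not the ACS data).
sysuse auto, clear

* "and" (&): tabulate origin for cars priced between 4,000 and 6,000, inclusive
tab foreign if price>=4000 & price<=6000

* "or" (|): count cars whose repair record is 1 or 2
count if rep78==1 | rep78==2

* Mixed conditions: Stata evaluates & before |, so use parentheses
* to state the grouping you intend.
count if (rep78==1 | rep78==2) & foreign==0
```

The same pattern carries over to the project data: swap in incwage, marst, or any other ACS variable after the if.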
It is best practice to describe (label) newly generated variables. A label records what the variable represents or measures, which helps when you refer back to it later. I am leaving variable labeling to you. It is not something you need to do, but it may be helpful later in the term.
label var variable_name "Description "
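As a small illustration of labeling, the sketch below generates a variable and labels it. The built-in auto dataset and the variable name price_k are assumptions chosen for the example; they are not part of the project data. Run it in a scratch do-file, because clear replaces the data in memory.

```stata
* Practice example on Stata's built-in auto dataset (not the ACS data).
sysuse auto, clear

* Generate a new variable, then attach a description to it
gen price_k = price/1000
label var price_k "Price in thousands of dollars"

* describe shows the label alongside the variable's storage type
describe price_k
```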
To begin, load your assigned data into Stata. (Note: If you copy the Stata code from this PDF, some characters may not reproduce correctly in the do-file. If you receive an error message after executing your do-file, check whether an incorrectly copied character is the source.)
use path/acs_2022_X.dta, clear
where path denotes the directory path where the data file is saved on your computer. The “X” is a placeholder for the state FIPS code, e.g., if assigned California, the state FIPS code is 6.
Define the sample:
We need to define the appropriate subsamples for analysis. In your project do-file, type or copy the following commands and execute the file. (Note: It is best practice when first using Stata to ensure each new command is executable before moving on to the next.)
Keep all observations between the ages of 25 and 64.
keep if age>=25 & age<=64
Keep all wage and salary workers and self-employed respondents. (Reminder: the vertical line is read as “or”)
keep if classwkr==1 | classwkr==2
Keep all wage and salary workers and self-employed respondents who report working, on average, 35 or more hours per week.
keep if uhrswork>=35
Generate an imputed hourly wage and keep all wage and salary workers and self-employed respondents who earn at least $2/hr. (The federal minimum wage for tipped workers is $2.13, rounded to $2/hr.)
drop if incwage== . | incwage==999999 | incwage==999998
gen hwage=(incwage/wkswork1)/uhrswork
keep if hwage>=2
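One Stata behavior is worth checking after this step: a missing value counts as larger than any number, so a condition such as hwage>=2 is true when hwage is missing (for example, if the weeks-worked variable is zero). The sketch below shows this on a three-row toy dataset with made-up values, which is an assumption for illustration. Whether your ACS file contains any such cases is something to check, not assume. Run it in a scratch do-file, because clear replaces the data in memory.

```stata
* Toy data with made-up values (not the ACS data)
clear
input incwage wkswork1 uhrswork
52000 52 40
30000  0 40
 1000 52 40
end

* Same formula as in the project; dividing by zero weeks yields missing
gen hwage = (incwage/wkswork1)/uhrswork

* How many hourly wages could not be computed?
count if missing(hwage)

* Missing values satisfy >=, so this count includes the missing row
count if hwage>=2

* Excluding missing values explicitly gives the count of valid wages >= 2
count if hwage>=2 & !missing(hwage)
```

In the project do-file, running count if missing(hwage) right after the gen command is a quick way to see whether this matters for your state.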