Statistical Assignment 2
This statistical assignment gets your hands “dirty” with R.¹ It is due at 9am on May 6.
You may work in teams of 2 or 3 (or by yourself) and hand in one assignment for the team.
Make sure that the names of all team members are on the assignment.
20% will be docked for each day the assignment is late.
1 Dataset Details
The dataset for this assignment “data assign2.csv” comes from the US Census. The
sample in your dataset covers women with at least 2 children. The dataset consists of
254,645 observations from the 1990 census. The variables in the dataset are:
• kidcount: number of children of mother i
• thirdkid: indicator equal to one if mother has 3 or more kids (i.e., =1 if kidcount ≥ 3,
=0 if kidcount < 3)
• boy1st: indicator equal to one if first child is a boy (i.e., =1 if First child is boy, =0 if
First child is girl)
• boy2nd: indicator equal to one if second child is a boy (i.e., =1 if Second child is boy,
=0 if Second child is girl)
• agem1: age of the mother in 1990
• agefstm: age of the mother at first birth
• black: indicator equal to one if mother is African American (and =0 otherwise)
• hisp: indicator equal to one if mother is Hispanic (and =0 otherwise)
• othrace: indicator equal to one if mother is other non-white race (and =0 otherwise)
(e.g., Filipino, Pacific Islander, Multi-racial)
• workedm: indicator equal to one if mother is working (=0 if not working)
• incomem: weekly labour market income of the mother (not used in this assignment,
but there if you are interested).
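As a rough illustration of how the variables above fit together, the sketch below reads a few rows, derives a same-sex indicator from boy1st and boy2nd, and checks that thirdkid agrees with kidcount. It is written in Python rather than R, the three rows are made up, and the assumption that the CSV has a header row with exactly these column names is not confirmed by the handout. The name samesex is introduced here for illustration only.

```python
import csv
import io

# Three made-up rows standing in for the real file. Assumption: the CSV has a
# header row that uses the variable names listed in the handout.
sample_csv = """kidcount,thirdkid,boy1st,boy2nd,workedm
2,0,1,0,1
3,1,0,0,0
4,1,1,1,1
"""

def load_rows(handle):
    """Read rows as dicts of ints and add a same-sex indicator (hypothetical name)."""
    rows = []
    for raw in csv.DictReader(handle):
        row = {name: int(value) for name, value in raw.items()}
        # samesex = 1 when the first two children share a gender.
        row["samesex"] = int(row["boy1st"] == row["boy2nd"])
        rows.append(row)
    return rows

rows = load_rows(io.StringIO(sample_csv))

# Consistency check: thirdkid should equal the indicator kidcount >= 3.
inconsistent = [r for r in rows if r["thirdkid"] != int(r["kidcount"] >= 3)]
print(len(rows), [r["samesex"] for r in rows], len(inconsistent))
```

The int conversion only works because the sample columns are all integers; a column such as incomem may need float parsing instead.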
¹ You are welcome to use another statistical program such as STATA or Matlab if you wish.
General Instructions
Please hand in the assignment as one document. Please answer the questions succinctly. For
this assignment, we are interested in estimating the following relationship:
$y_i = \beta_0 + \beta_1 \, \text{thirdkid}_i + \varepsilon_i$
where:
• $y_i$: Indicator for mother i working (i.e., the ‘workedm’ variable in your dataset)
• $\text{thirdkid}_i$: Indicator for mother i having three or more children
Question 1
What is the probability of having a third child for parents whose first two children are
the same gender? What about for parents whose first two children are of different genders?
Describe why this provides a ‘first-stage’ for the instrument.
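The two conditional probabilities asked for here are group means of thirdkid, and their difference is the first-stage effect of the instrument. The following is a minimal sketch in Python on eight invented rows (not the census data); the function name and the tuple layout are hypothetical choices made for the example.

```python
# Toy rows invented purely for illustration (NOT the census data).
# Each tuple is (boy1st, boy2nd, thirdkid).
rows = [
    (1, 1, 1), (1, 1, 0), (0, 0, 1), (0, 0, 1),
    (1, 0, 0), (0, 1, 1), (1, 0, 0), (0, 1, 0),
]

def share_with_third_child(rows, same_sex):
    """Mean of thirdkid among rows whose same-sex status equals same_sex (0 or 1)."""
    picked = [third for b1, b2, third in rows if int(b1 == b2) == same_sex]
    return sum(picked) / len(picked)

p_same = share_with_third_child(rows, 1)   # P(third child | same gender)
p_diff = share_with_third_child(rows, 0)   # P(third child | different genders)
first_stage = p_same - p_diff              # difference in probabilities
print(p_same, p_diff, first_stage)
```

On the real data the same calculation would be run over all 254,645 observations; the numbers printed here say nothing about the actual answer.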
Question 2
Estimate the ‘first-stage’ and ‘second-stage’ of the instrumental variable regression using
“same gender for first two kids” as the instrument without any additional controls. Interpret
the point estimate for both the ‘first-stage’ and ‘second-stage’.
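One way to see the two stages mechanically is to run them by hand: regress thirdkid on the instrument, keep the fitted values, then regress workedm on those fitted values. The sketch below does this in Python with a one-regressor least-squares helper on invented toy data; the helper name ols and the data are assumptions for illustration, not part of the assignment.

```python
def ols(x, y):
    """Intercept and slope from a one-regressor least-squares fit."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Toy data invented for illustration only (NOT the census sample).
samesex  = [1, 1, 1, 1, 0, 0, 0, 0]
thirdkid = [1, 0, 1, 1, 0, 1, 0, 0]
workedm  = [0, 1, 0, 1, 1, 1, 1, 0]

# First stage: thirdkid on the instrument.
a0, a1 = ols(samesex, thirdkid)
fitted = [a0 + a1 * z for z in samesex]

# Second stage: workedm on the first-stage fitted values.
b0, b1 = ols(fitted, workedm)
print(a1, b1)
```

Running the second stage by hand reproduces the IV point estimate, but the standard errors from that second regression are not valid because they ignore that the regressor was estimated. A dedicated IV routine corrects for this.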
Question 3
Run the instrumental variable regression without any controls. Interpret the resulting
point estimate and clearly describe which type of individuals is identifying the ‘LATE’.
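With a single binary instrument and no controls, the IV estimate has a closed form (the Wald estimator), which may help when interpreting the result. In the expression below, $Z_i$ denotes the same-gender indicator and bars denote sample means within each group; this notation is introduced here and does not appear in the handout.

```latex
\hat{\beta}_1^{IV}
  = \frac{\bar{y}_{Z=1} - \bar{y}_{Z=0}}
         {\overline{\text{thirdkid}}_{Z=1} - \overline{\text{thirdkid}}_{Z=0}}
```

The numerator is the reduced-form difference in the share of mothers working, and the denominator is the first-stage difference in the probability of a third child.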
Question 4
Discuss the (internal) validity of the instrument. In addition, state who exactly a defier
would be in this example and whether the “no defier” (or monotonicity) assumption is likely
to hold.
Question 5
Now run the IV regression with the following controls: race, age of the mother, age of
the mother squared, age of the mother at first birth, age of the mother at first birth squared.
Does the point estimate change relative to question 3? Is this reassuring in terms of (internal)
validity? Describe why or why not.
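Adding controls means both stages include the same exogenous variables alongside the instrument (first stage) or the fitted values (second stage). The sketch below shows that mechanic in plain Python with a small normal-equations solver. It uses one control and noise-free invented data so the true coefficient (2) is known; all function names are hypothetical, and further controls such as squared terms would simply be extra columns.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def two_sls(y, d, z, controls):
    """Return (first-stage coefficient on z, second-stage coefficient on d)."""
    W = [[1.0] + list(c) for c in controls]          # constant + controls
    Xz = [w + [zi] for w, zi in zip(W, z)]
    first = ols(Xz, d)                               # first stage
    d_hat = [sum(b * v for b, v in zip(first, row)) for row in Xz]
    second = ols([w + [dh] for w, dh in zip(W, d_hat)], y)
    return first[-1], second[-1]

# Invented, noise-free toy data (NOT the census sample): y = 1 + 2*d + 0.5*age.
age = [20, 25, 30, 35, 40, 45, 22, 33]
z = [1, 0, 1, 0, 1, 0, 0, 1]
d = [1, 0, 1, 1, 1, 0, 0, 0]
y = [1 + 2 * di + 0.5 * a for di, a in zip(d, age)]

pi_z, beta_d = two_sls(y, d, z, [(a,) for a in age])
print(pi_z, beta_d)
```

As with any hand-rolled two-step procedure, this gives point estimates only; standard errors should come from a proper IV routine.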
Question 6
We could imagine using a different instrumental variable, but with a similar spirit. Specifically,
there is a large amount of research suggesting that many cultures have a “boy” preference for
children (i.e., parents prefer having boys to girls). In that spirit, we could define the instrumental
variable instead as “having two girls” relative to “having two boys” (for the first
two children). Intuitively, if prior research is correct, having two girls should induce some
individuals to have another child relative to having two boys. Create that instrument and
run the IV regression (with controls). Why do you think the estimate differs substantially
from the IV estimate in Q5? [hint: think of the LATE!]
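One possible reading of “two girls relative to two boys” is to keep only families whose first two children share a gender and set the instrument to one for two girls and zero for two boys. The sketch below builds that instrument in Python on invented rows and, for brevity, computes the no-controls ratio of mean differences rather than the regression with controls. The name twogirls and the decision to drop mixed-gender families are assumptions, not instructions from the handout.

```python
# Toy rows invented for illustration only (NOT the census sample).
rows = [
    {"boy1st": 0, "boy2nd": 0, "thirdkid": 1, "workedm": 0},
    {"boy1st": 0, "boy2nd": 0, "thirdkid": 1, "workedm": 1},
    {"boy1st": 1, "boy2nd": 1, "thirdkid": 0, "workedm": 1},
    {"boy1st": 1, "boy2nd": 1, "thirdkid": 1, "workedm": 1},
    {"boy1st": 1, "boy2nd": 0, "thirdkid": 0, "workedm": 1},
    {"boy1st": 0, "boy2nd": 1, "thirdkid": 1, "workedm": 0},
]

def two_girls_sample(rows):
    """Keep same-gender families; twogirls = 1 for two girls, 0 for two boys."""
    out = []
    for row in rows:
        if row["boy1st"] == row["boy2nd"]:
            out.append({**row, "twogirls": int(row["boy1st"] == 0)})
    return out

def group_mean(rows, key, flag):
    """Mean of rows[key] among rows with twogirls == flag."""
    vals = [r[key] for r in rows if r["twogirls"] == flag]
    return sum(vals) / len(vals)

sample = two_girls_sample(rows)
first_stage = group_mean(sample, "thirdkid", 1) - group_mean(sample, "thirdkid", 0)
reduced_form = group_mean(sample, "workedm", 1) - group_mean(sample, "workedm", 0)
wald = reduced_form / first_stage
print(len(sample), first_stage, reduced_form, wald)
```

Because the sample and the comparison both change relative to the same-gender instrument, the set of mothers whose fertility responds to the instrument can differ as well.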