
Fall 2019
PPHA 31002
Homework 8
Due: Wednesday, December 4th, 2019 at 11:59pm
Your write-up should include the figures you generate in part 1, answers to the questions in both parts, any code
used to answer the questions, and full, detailed work for your answers in part 2.
Randomized Control Trial: De-Worming in Rural Kenya
We will be “replicating” a key null result from Miguel, Edward, and Kremer, Michael. 2004. “Worms: Identifying
Impacts on Education and Health in the Presence of Treatment Externalities.” Econometrica 72 (1): 159–217.[1]
While they found that the treatment increased attendance, they failed to find an effect on test scores.
This is a very famous paper in development economics for several reasons. It was an important demonstration of
how to run a Randomized Control Trial (RCT), especially in the presence of spillovers from treated to non-treated
individuals. It also played an important part in a debate about the external validity of the results of deworming
interventions, and of RCTs more broadly. We will focus on just the key components of the paper.
Here is a short summary (from the replication files they prepared): “Hookworm, roundworm, whipworm, and
schistosomiasis infect more than one in four people worldwide and are particularly prevalent among school-age
children in developing countries. The former three worms are transmitted through ingestion of or contact with
infected fecal matter, while schistosomiasis parasites are carried by snails in water where children swim or bathe.
Because these worms do not reproduce in their human hosts, most infected individuals have minor cases with
few if any symptoms. However, severe helminth infestations, resulting from repeated infection, can cause iron
deficiency anemia, protein energy malnutrition, stunting, wasting, listlessness, and abdominal pain. In addition to
the negative health and nutritional consequences, worm infections often result in impaired cognitive ability, poor
academic performance, reduced school attendance, and high drop out rates.”[2]
1. Read the first two pages of the paper (the first page and a half should be enough) and summarize, in
five to seven sentences, the problem the paper discusses, its research question, the intervention it runs
as an experiment, and its main findings. (2 points)
2. Load the data file deworming kenya data.csv into R.
3. You have the following variables in the data:[3]
• pupil id: a numeric identifier for each student.
• pupil birth year: the year the student was born in.
• school id: a numeric identifier for the school of the student.
• treatment group: a categorical variable denoting when the students in the school received treatment:
– =1: Began receiving treatment in 1998.
– =2: Began receiving treatment in 1999.
– =3: Began receiving treatment in 2001.
• ics98, ics99: ICS Exam Score (normalized), for 1998 and 1999, respectively.
4. For each treatment group (there are three) and each test score (there are two), plot a histogram of the
test score. Also include a table with the mean and sd of each test score. Ideally, the table should have three
rows (one for each treatment group) and four columns (mean and standard deviation for each test score). Note
that to calculate these, you will also have to drop NAs. (3 points)
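A minimal R sketch of item 4, assuming the data sit in a data frame `df` whose columns are named `treatment_group`, `ics98`, and `ics99` (underscored stand-ins for the names above; the actual file may differ). The values are synthetic placeholders, not the study's data, so the block runs on its own.

```r
# Synthetic placeholder data (NOT the deworming data); column names are assumed.
# With the real file you would instead use something like: df <- read.csv("path/to/file.csv")
df <- data.frame(
  treatment_group = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  ics98 = c(0.1, -0.2, NA, 0.3, 0.0, -0.1, 0.2, NA, -0.3),
  ics99 = c(0.2, 0.0, 0.1, NA, 0.1, 0.3, -0.1, 0.0, 0.4)
)

groups <- sort(unique(df$treatment_group))
scores <- c("ics98", "ics99")

# Histograms: one panel per (treatment group, test score) pair, NAs dropped
par(mfrow = c(length(groups), length(scores)))
for (g in groups) {
  for (s in scores) {
    x <- df[[s]][df$treatment_group == g]
    hist(x[!is.na(x)], main = paste("Group", g, "-", s), xlab = s)
  }
}
par(mfrow = c(1, 1))

# Summary table: one row per group; mean and sd of each score, NAs dropped
summ <- do.call(rbind, lapply(groups, function(g) {
  sub <- df[df$treatment_group == g, ]
  data.frame(
    group = g,
    mean_ics98 = mean(sub$ics98, na.rm = TRUE), sd_ics98 = sd(sub$ics98, na.rm = TRUE),
    mean_ics99 = mean(sub$ics99, na.rm = TRUE), sd_ics99 = sd(sub$ics99, na.rm = TRUE)
  )
}))
print(summ)
```

Building the table row by row with `lapply()` keeps the three-rows-by-four-columns layout the item asks for, with `na.rm = TRUE` handling the dropped NAs.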
[1] We write “replicating” because we will be performing a simplified version of their analysis. Still, it will be very
close to the essence of what they do in the paper.
[2] You can access the full data and user guides at: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/28038
[3] This is a version of the data we cleaned and simplified. The real raw data have a much more complex structure.
5. Papers that use an RCT often produce what is called a randomization table. The idea is to compare different
“observables” (variables we can observe) and see whether there are systematic differences in those variables between
the treatment groups. If the randomization process worked, there shouldn’t be any big differences in observables
across groups. This supports the assumption (but does not prove it!) that there are no systematic differences in
unobservables either. In our case, we are going to look at only one such observable characteristic (birth year).
You will often see papers use ten or more variables to try to convince you that randomization worked. Before we
“test for randomization,” note that there is (at least) one value of pupil birth year that does not make sense and
should be dropped. Which is it (are they)?[4]
(1 point)
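One way to hunt for the implausible value(s) in item 5, sketched in R on synthetic birth years (not the homework data), with `pupil_birth_year` assumed as the column name. The bounds `min_year` and `max_year` are hypothetical placeholders: choose them only after inspecting the frequency table and plot for your own data.

```r
# Synthetic placeholder data (NOT the deworming data); 2998 mimics a data-entry typo
df <- data.frame(
  treatment_group = c(1, 1, 2, 2, 3, 3),
  pupil_birth_year = c(1985, 1988, 1986, 2998, 1987, 1984)
)

# Step 1: look at the data; an impossible value stands out in a table or plot
print(table(df$pupil_birth_year, useNA = "ifany"))
hist(df$pupil_birth_year, main = "Pupil birth year", xlab = "Birth year")

# Step 2: keep only rows inside a plausible window (hypothetical bounds).
# subset() also drops rows whose birth year is NA, since the condition is NA there.
min_year <- 1970
max_year <- 1998
clean <- subset(df, pupil_birth_year >= min_year & pupil_birth_year <= max_year)
print(nrow(clean))
```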
6. What are the mean and sd of birth year in each treatment group? Note that this asks you to calculate the mean
and sd of the variable pupil birth year separately for each group; it does not yet involve the test scores.
(1 point)
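For item 6, `tapply()` computes a statistic separately for each level of a grouping variable. A minimal sketch on synthetic data (not the homework data), assuming the columns are named `treatment_group` and `pupil_birth_year` and that implausible years have already been removed:

```r
# Synthetic placeholder data (NOT the deworming data)
df <- data.frame(
  treatment_group = rep(1:3, each = 3),
  pupil_birth_year = c(1985, 1986, 1987, 1984, 1986, 1988, 1985, 1985, 1988)
)

# Mean and sd of birth year within each treatment group (NAs ignored)
birth_mean <- tapply(df$pupil_birth_year, df$treatment_group, mean, na.rm = TRUE)
birth_sd   <- tapply(df$pupil_birth_year, df$treatment_group, sd, na.rm = TRUE)
birth_tab  <- cbind(mean = birth_mean, sd = birth_sd)
print(birth_tab)
```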
7. Run a t-test comparing the mean birth year in the first group to that in the second group. Repeat this
comparison for the first and third groups, and for the second and third groups. Can you reject the null that
mean birth year is the same for any of these pairs of groups? Throughout the exercise, use a significance level
of 0.05. (2 points)
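Item 7 amounts to three two-sample t-tests. A sketch in R on synthetic data (not the homework data), with the column names `treatment_group` and `pupil_birth_year` assumed. Note that `t.test()` uses the Welch (unequal-variance) formulation by default; pass `var.equal = TRUE` if you want the pooled version instead.

```r
# Synthetic placeholder data (NOT the deworming data)
df <- data.frame(
  treatment_group = rep(1:3, each = 4),
  pupil_birth_year = c(1985, 1986, 1987, 1986,
                       1984, 1986, 1988, 1986,
                       1985, 1987, 1986, 1990)
)

pairs <- list(c(1, 2), c(1, 3), c(2, 3))
pvals <- sapply(pairs, function(p) {
  x <- df$pupil_birth_year[df$treatment_group == p[1]]
  y <- df$pupil_birth_year[df$treatment_group == p[2]]
  t.test(x, y)$p.value  # two-sided Welch test by default
})
names(pvals) <- c("1 vs 2", "1 vs 3", "2 vs 3")
print(round(pvals, 3))
print(pvals < 0.05)  # TRUE where H0 of equal means is rejected at the 0.05 level
```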
8. We will now compare each treatment group, separately, using the two test scores we have. Given that the
tests were conducted in 1998 and 1999, and that de-worming treatment was assigned to each group as defined
above, where would you expect to see an impact on test scores? Why? Discuss briefly (five sentences).
(2 points)
9. Use paired t-tests to compare the two test scores separately for each group; that is, run one paired t-test
for each treatment group. Can you reject the null of no difference in any of those cases? If so, how do you
interpret the results of your test? Explain whether you chose to run a two-tailed or a one-tailed test (or both).
Either can be valid with a reasonable justification. (4 points)
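A sketch of item 9 in R, on synthetic scores (not the homework data), with the columns assumed to be `treatment_group`, `ics98`, and `ics99`. A paired test needs both scores for the same pupil, so pupils missing either score are dropped first. The one-tailed variant uses `alternative = "greater"`, i.e. H1: the mean of (1999 score minus 1998 score) is positive; whether that direction is justified is part of the question.

```r
# Synthetic placeholder data (NOT the deworming data)
set.seed(1)
df <- data.frame(
  treatment_group = rep(1:3, each = 20),
  ics98 = rnorm(60),
  ics99 = rnorm(60)
)
df$ics99[c(3, 25)] <- NA  # some pupils are missing one score

paired_results <- t(sapply(sort(unique(df$treatment_group)), function(g) {
  # keep pupils in group g who have both scores
  sub <- df[df$treatment_group == g & !is.na(df$ics98) & !is.na(df$ics99), ]
  two_sided <- t.test(sub$ics99, sub$ics98, paired = TRUE)
  one_sided <- t.test(sub$ics99, sub$ics98, paired = TRUE, alternative = "greater")
  c(group = g, n = nrow(sub), mean_diff = unname(two_sided$estimate),
    p_two_sided = two_sided$p.value, p_one_sided = one_sided$p.value)
}))
print(paired_results)
```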
10. Run a two-tailed t-test for two samples, using the test score for 1998, comparing group 1 to group 2. Calculate
the standard deviation according to the equal variances method of the two-sample t-test. How do you interpret
your result? (2 points)
11. Run a two-tailed t-test for two samples, using the test score for 1999, comparing group 2 to group 3. Calculate
the standard deviation according to the equal variances method of the two-sample t-test. How do you interpret
your result? (2 points)
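Items 10 and 11 both call for the equal-variance (pooled) two-sample t-test, where the pooled standard deviation is s_p = sqrt(((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)) and the test uses n1 + n2 − 2 degrees of freedom. A minimal R sketch, on synthetic vectors rather than the homework data, computes this by hand and checks it against `t.test(..., var.equal = TRUE)`; the helper name `pooled_t_test` is hypothetical.

```r
# Equal-variance (pooled) two-sample t-test, computed by hand.
# pooled_t_test is a hypothetical helper name, not a base R function.
pooled_t_test <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  n1 <- length(x)
  n2 <- length(y)
  # Pooled standard deviation: weighted average of the two sample variances
  sp <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
  se <- sp * sqrt(1 / n1 + 1 / n2)  # standard error of the difference in means
  t_stat <- (mean(x) - mean(y)) / se
  dof <- n1 + n2 - 2
  p_value <- 2 * pt(-abs(t_stat), dof)  # two-tailed
  c(t = t_stat, df = dof, p = p_value)
}

# Synthetic placeholder scores (NOT the deworming data)
set.seed(2)
g1 <- rnorm(30, mean = 0.1)
g2 <- c(rnorm(25), NA)

res <- pooled_t_test(g1, g2)
check <- t.test(g1, g2, var.equal = TRUE)  # built-in equivalent; drops NAs itself
print(res)
print(c(t = unname(check$statistic), p = check$p.value))
```

Assuming a data frame `df` with underscored column names, item 10 would correspond to `pooled_t_test(df$ics98[df$treatment_group == 1], df$ics98[df$treatment_group == 2])`, and item 11 to the analogous call on `ics99` for groups 2 and 3.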
12. What do you conclude from the results of the tests you’ve run regarding the effect that deworming treatment
had on test scores in Kenya? (1 point)
13. What can you say about the general efficacy of deworming treatments and their effect on test scores? Will
they always be effective? Will they never be effective? (2 points)
The Bootstrap
The data nsw.csv are from the National Supported Works experiment.[5] These data have three variables:
whether the individual was treated (treated), whether the individual worked after treatment, in 1978 (work78), and
the individual's earnings in 1978 (earn78). We wish to discover whether treatment increased participants' earnings
and increased the likelihood that they worked. The treatment variable (treated) equals one if the participant was
assigned to the treatment group and zero if assigned to the control group.
1. Load the nsw.csv data in R.
2. Plot one histogram of work78 for observations with treated=1, and one histogram for treated=0. Repeat this
for earn78 (2 points).
3. Test the null hypothesis, with α = 0.05 (two-tailed), that the mean earnings of the treatment group and the
control group are equal. Use both a t-test and a z-test. Can you reject H0 with the t-test? Can you reject
H0 with the z-test? For the t-test, calculate the standard deviation according to the equal variances method of
the two-sample t-test. Contrast its significance with that of a “z-test,” which treats the standard deviation as
given and evaluates significance with pnorm().[6]
(5 points)
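A sketch of item 3 in R, on synthetic earnings (not the NSW data) that include a block of zeros. The same pooled-standard-error statistic is referred once to the t distribution with `pt()` and once to the standard normal with `pnorm()`, treating the estimated standard deviation as known. The variable names are illustrative assumptions.

```r
# Synthetic placeholder data (NOT the NSW data): skewed earnings with many zeros
set.seed(3)
n1 <- 40
n0 <- 60
earn_treat <- ifelse(runif(n1) < 0.3, 0, rexp(n1, rate = 1 / 6000))
earn_ctrl  <- ifelse(runif(n0) < 0.4, 0, rexp(n0, rate = 1 / 5000))

# Pooled (equal-variance) standard error of the difference in means
sp <- sqrt(((n1 - 1) * var(earn_treat) + (n0 - 1) * var(earn_ctrl)) / (n1 + n0 - 2))
se <- sp * sqrt(1 / n1 + 1 / n0)
stat <- (mean(earn_treat) - mean(earn_ctrl)) / se

p_t <- 2 * pt(-abs(stat), df = n1 + n0 - 2)  # t-test: t reference distribution
p_z <- 2 * pnorm(-abs(stat))                 # z-test: sd treated as known
print(c(statistic = stat, p_t = p_t, p_z = p_z))
print(c(reject_t = p_t < 0.05, reject_z = p_z < 0.05))
```

Because the normal distribution has thinner tails than the t distribution, the z-test p-value is never larger than the t-test p-value for the same statistic; with many degrees of freedom the two are close.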
[4] Hint: Offered as a gentle reminder to always, always, always check and plot your data.
[5] More details about the program can be found at https://www.ncjrs.gov/pdffiles1/Digitization/59202NCJRS.pdf
[6] In the z-test, we need to assume that the standard deviation we estimated is the known population standard deviation.
4. Test the null hypothesis that earnings for the treatment group are the same as earnings for the control group,
using either the bootstrap percentile-t method or the percentile method, both covered in class and TA
sessions. The alternative hypothesis is that they are not equal. (For the percentile-t test, use the unequal
variance formulation.) Include individuals with zero earnings in your test. Set the number of bootstrap
replications to 10,000. (6 points)
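A minimal R sketch of item 4 on synthetic earnings with zeros (not the NSW data). The percentile-t part resamples each group separately with replacement, studentizes each bootstrap difference with the unequal-variance (Welch) standard error, and centers it at the observed difference so that its distribution approximates the null; the p-value shown is the symmetric two-sided version, one common choice among several. The percentile part asks whether the central 95% of resampled differences excludes zero. All variable names are illustrative assumptions.

```r
# Synthetic placeholder data (NOT the NSW data); zeros are kept in the sample
set.seed(4)
n1 <- 50
n0 <- 70
earn_t <- ifelse(runif(n1) < 0.3, 0, rexp(n1, rate = 1 / 6000))
earn_c <- ifelse(runif(n0) < 0.4, 0, rexp(n0, rate = 1 / 5000))

# Unequal-variance (Welch) standard error of the difference in means
welch_se <- function(a, b) sqrt(var(a) / length(a) + var(b) / length(b))

diff_obs <- mean(earn_t) - mean(earn_c)
t_obs <- diff_obs / welch_se(earn_t, earn_c)

B <- 10000
t_star <- numeric(B)
diff_star <- numeric(B)
for (b in seq_len(B)) {
  # Resample each group separately, with replacement, keeping group sizes fixed
  bt <- sample(earn_t, n1, replace = TRUE)
  bc <- sample(earn_c, n0, replace = TRUE)
  diff_star[b] <- mean(bt) - mean(bc)
  # Center at the observed difference so t_star mimics the null distribution
  t_star[b] <- (diff_star[b] - diff_obs) / welch_se(bt, bc)
}

# Percentile-t: symmetric two-sided bootstrap p-value
p_boot_t <- mean(abs(t_star) >= abs(t_obs))

# Percentile method: reject H0 at the 0.05 level if the 95% interval excludes 0
ci_pct <- quantile(diff_star, c(0.025, 0.975))
reject_pct <- unname(ci_pct[1] > 0 || ci_pct[2] < 0)

print(c(t_obs = t_obs, p_boot_t = p_boot_t))
print(ci_pct)
print(reject_pct)
```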
5. What feature of the data causes the p-values from the three different tests to differ so much? (Focus on how
the t- and z-tests differ from the bootstrap test.) (2 points)
Take a moment to reflect on how few of the things you completed during this assignment you knew, or could even
have understood, just ten weeks ago. You have covered a lot of material. Congratulations!