UA 323
Problem Set 1
Development Economics
Due: March 6
This problem set is based on the paper “The colonial origins of comparative development: an empirical investigation”, by Acemoglu, Johnson and Robinson (2001). This paper was a key reason why these three economists won the 2024 Nobel Prize in economics. Read the abstract, introduction and conclusion. Answer the following questions:
1. What is the key question that the paper tries to answer? Not the practical thing they actually do, but the Big Picture question. What makes it hard to answer this question?
2. To answer the Big Picture question, the paper uses an Instrumental Variable (IV) approach. Describe the choice of instrumental variables AJR (2001) use to get around the challenges you described in question 1. Why are these instruments relevant (i.e. correlated with the endogenous variable: institutions)?
3. As you know, for z to be a valid instrument, it must be excluded: it must only be related to y through its effect on x. What does this assumption mean in the context of this paper? Can it be checked or validated? What do the authors do to convince you that they have a good instrument? For reasons that will become obvious, please answer this question before moving on with the problem set.
On Brightspace, you can find an Excel file called Institutions with some of the data used by AJR (2001) in their analysis, as well as an accompanying file with the variable labels.
4. Upload your Excel dataset in R. You can do it using the read_excel function from the readxl package. So, if you had to upload a dataset called Dataset.xlsx in R, you would use the following code:
library(readxl)
Institutions <- read_excel("Dataset.xlsx")
As usual, you should adjust the code accordingly. Once you have uploaded the dataset, produce a scatter plot like the one in Figure 1 of AJR (2001), with the logarithm of GDP per capita in 1995 on the y axis, and the logarithm of settler mortality on the x axis. To do so, you can use ggplot. For example, if my y axis variable was called pears and my x axis variable was called apples, I would use the following code:
ggplot(Institutions, aes(x = apples, y = pears)) + geom_point()
To get full marks, try to make the graph as close as possible to AJR (2001), for example by relabelling the axes and labelling the dots with the country codes.
Are GDP per capita and settler mortality positively or negatively correlated? Is this the relationship we were expecting to see?
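As an illustration of the axis and label options mentioned above, here is a minimal sketch on made-up data. The data frame fruit and its columns code, apples and pears are hypothetical placeholders, not variables from the Institutions file, and the fitted line is an optional extra.

```r
library(ggplot2)

# Hypothetical toy data: none of these names or values come from the Institutions file
fruit <- data.frame(
  code   = c("AAA", "BBB", "CCC", "DDD"),
  apples = c(1.0, 2.5, 4.0, 5.5),
  pears  = c(9.0, 7.5, 6.5, 4.0),
  stringsAsFactors = FALSE
)

p <- ggplot(fruit, aes(x = apples, y = pears)) +
  geom_text(aes(label = code), size = 3) +                 # draw each point as its code
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) + # optional linear fit line
  labs(x = "Apples (placeholder axis title)",
       y = "Pears (placeholder axis title)")

print(p)
```

Using geom_text instead of geom_point replaces each dot with its label; keeping both layers would show a dot and a label for every observation.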
5. Create a variable (call it above_median) that indicates above-median settler mortality. You should drop the countries for which settler mortality values are missing. What is the mean of average protection against expropriation risk for countries with above- and below-median settler mortality? What is the authors' key theory for why this is happening? Can you point to the table in the paper where the authors formally test (via regression analysis) that these two variables are correlated?
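One possible way to build such an indicator and compare group means is sketched below in base R. The data frame toy and its columns mortality and protection are hypothetical stand-ins with made-up values; the real variable names are in the variable-label file.

```r
# Hypothetical toy data: column names and values are placeholders
toy <- data.frame(
  mortality  = c(2.1, NA, 4.7, 3.3, 5.9),
  protection = c(8.5, 7.0, 5.5, 6.5, 4.0)
)

# Drop rows with missing mortality before computing the median
toy <- toy[!is.na(toy$mortality), ]

# TRUE when mortality is strictly above the sample median
toy$above_median <- toy$mortality > median(toy$mortality)

# Mean of the other variable within each group (names "FALSE" and "TRUE")
group_means <- tapply(toy$protection, toy$above_median, mean)
print(group_means)
```

The same pattern applies to any other outcome: swap the first argument of tapply for the column of interest.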
6. What is the mean log per capita GDP for countries with above and below median settler mortality?
7. Given the numbers you’ve calculated, *if you were to believe the exclusion restriction* then what is the effect of expropriation on log GDP per capita? Hint: you can use the Wald estimator we saw in class.
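For reference, a sketch of the Wald estimator with a binary instrument. The notation is generic and may differ from the notation used in class: z is the binary instrument (here above_median), x is the endogenous variable (protection against expropriation risk), and y is the outcome (log GDP per capita); bars denote group means.

```latex
\hat{\beta}_{\text{Wald}}
  = \frac{\bar{y}_{z=1} - \bar{y}_{z=0}}{\bar{x}_{z=1} - \bar{x}_{z=0}}
```

The numerator is the difference in mean outcomes between the two groups (question 6), and the denominator is the difference in the group means of the endogenous variable (question 5).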
8. The paper is old, and therefore uses old data. Let's see if it holds up with newer data.
8a. From problem set 1, you have contemporaneous GDP per capita. Upload the dataset, create a new variable for the logarithm of GDP per capita, and keep only the most recent year. Rename the variable containing the 3-letter country code to countrycode. You can do this using colnames.
Merge current GDP per capita into the Institutions dataset using the variable countrycode. R takes the log GDP per capita variable you just created in the pwt dataset and assigns it to the country that has the same code in the Institutions dataset. So, for example, if I wanted to merge a dataset called Data2 into the dataset Data1 (which in your case will be Institutions) using the variable region, I would write:
library(dplyr)
Data1 <- inner_join(Data1, Data2, by = "region")
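To make the rename-then-merge step concrete, here is a small sketch on made-up data. The column name reg and all values are hypothetical; only colnames and inner_join (from the dplyr package) are the functions named in the text.

```r
library(dplyr)

# Hypothetical toy data frames: names and values are placeholders
Data1 <- data.frame(region = c("A", "B", "C"), x = c(1, 2, 3),
                    stringsAsFactors = FALSE)
Data2 <- data.frame(reg = c("B", "C", "D"), y = c(20, 30, 40),
                    stringsAsFactors = FALSE)

# Rename the key column in Data2 so that both data frames share the same key name
colnames(Data2)[colnames(Data2) == "reg"] <- "region"

# inner_join keeps only the rows whose key appears in both data frames
merged <- inner_join(Data1, Data2, by = "region")
print(merged)
```

Because inner_join drops unmatched rows, it is worth comparing nrow before and after the merge to see how many countries were lost.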
8b. The World Bank has contemporaneous data on governance indicators. I have done some data cleaning for you, so use the dataset called WgiDataset on Brightspace. The variable labels are described in a separate document. There are several indicators. Which one do you think is the correct one to use? (There is no right answer, but I want you to justify your choice.) Upload the dataset in R and keep only the indicator you chose and the same year you kept for point 8a. Merge this dataset into Institutions. You should now have a dataframe with mortality and current measures of GDP and governance.
9. What is the effect of settler mortality on your modern measures of (log) GDP per capita and governance? What is your estimated effect of expropriation on GDP per capita using this newer data? Can you point to the table in the paper where the authors obtain the effect via Instrumental Variable regression? Focus on the first column of this table. How do you interpret the coefficient? Is it statistically significant?
10. One critique of the paper, by Jeff Sachs, is that it fails the exclusion restriction. One of the reasons he focuses on is the fact that places with higher settler mortality still have higher malaria. Why would this violate the exclusion restriction?
Jeff Sachs’ critique also includes Acemoglu, Johnson, and Robinson's response to him. Can you summarize why, according to their response, if settler mortality were to be correlated with modern malaria, then this is not necessarily an exclusion restriction violation? This is a difficult question, which requires some thought.
11. Let's see if Jeff Sachs is right in the data. Download data on malaria and merge it with the settler mortality data. Notice that you will have to do some cleaning yourself (possibly, in R). In particular, the malaria dataset does not have country codes, but only country names (which you also have in the WgiDataset). Notice, however, that some of the countries’ names don’t coincide (for example, Egypt).
Do places with higher mortality have higher malaria today? Think carefully about how you measure malaria incidence. Use the variable “Population exposed to high endemicity” for your calculations.
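One way to handle mismatched country names is to list the names that fail to match and recode them before merging. The sketch below uses made-up data frames; the spelling "Egypt, Arab Rep.", the column names, and all values are assumptions for illustration, so check the actual spellings in your files.

```r
# Hypothetical toy data: spellings, column names and values are invented
malaria <- data.frame(country = c("Egypt, Arab Rep.", "Ghana"),
                      exposed = c(0.1, 0.9),
                      stringsAsFactors = FALSE)
wgi <- data.frame(country = c("Egypt", "Ghana"),
                  countrycode = c("EGY", "GHA"),
                  stringsAsFactors = FALSE)

# Names present in the malaria data but absent from the WGI data
unmatched <- setdiff(malaria$country, wgi$country)
print(unmatched)

# Recode the mismatched spelling so the merge key lines up
malaria$country[malaria$country == "Egypt, Arab Rep."] <- "Egypt"

# Base R merge on the cleaned country name brings in the country code
malaria_coded <- merge(malaria, wgi, by = "country")
print(malaria_coded)
```

Once the country code is attached, the malaria data can be merged with the settler mortality data on countrycode like the other datasets.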