Machine Learning for Causal Inference
(Due date: April 1, 1:00 pm)
Background
The data for this module come from Keller et al. (2024) on mathematics and vocabulary training sessions. You can find a description of the data here: https://osf.io/preprints/osf/2gur9_v1 This is the same dataset you used in previous problem sets, named WSCdata.Rdata. Please construct a quasi-experimental dataset in which participants self-select their preferred treatment by keeping only those participants whose random assignment status, mathGrp, matches their self-selected treatment status, mathSel. You can use the following R code to do this:
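library(tidyverse)
load("WSCdata.Rdata")  # assumed to create the object WSCdata; adjust the path as needed
# Keep participants whose random assignment matches their self-selected treatment
dat <- WSCdata %>% filter(mathGrp == mathSel)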
Assignment:
In the lecture on ML for causal inference, we discussed several questions we can explore when studying heterogeneous treatment effects. These are:
a) Is there any effect heterogeneity at all?
b) Which covariates modify the treatment effect?
c) What is the expected treatment effect among subgroups with specific covariate values?
Your task is to apply each estimation method discussed in class to the WSCdata dataset, to address the three questions. The write-up should not exceed 3 pages, including plots and code.
1. Use a parametric, linear interaction model to answer questions (a), (b), and (c). Include plot(s) and/or formal test results to support your answers. Also, include your R code for fitting models and conducting formal tests. You don’t need to include code for plotting. (3 pts)
2. Use BART to answer questions (a), (b), and (c). Include plot(s) and/or formal test results. Include your R code. (3 pts)
3. Use causal forests to answer questions (a), (b), and (c). Include plot(s) and/or formal test results. Include your R code. (3 pts)
4. Are your findings similar or different across the three estimation methods? Which method do you prefer, and why? (1 pt)
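As a starting point for item 1, here is a minimal sketch of how a linear interaction model can speak to questions (a), (b), and (c). It runs on simulated data, not on WSCdata: the names sim, x1, x2, trt, and y are hypothetical placeholders, and the simulation randomizes treatment, which the self-selected sample does not. Treat it as one possible template under those assumptions, not as the expected solution.

```r
# Simulated stand-in data; none of these variable names come from WSCdata.
set.seed(1)
n <- 500
sim <- data.frame(
  x1 = rnorm(n),             # hypothetical continuous covariate
  x2 = rbinom(n, 1, 0.5)     # hypothetical binary covariate
)
sim$trt <- rbinom(n, 1, 0.5) # hypothetical treatment indicator (randomized here)
# True effect is 1 + 0.8 * x1, so x1 modifies the effect and x2 does not.
sim$y <- 0.5 * sim$x1 + 0.3 * sim$x2 + sim$trt * (1 + 0.8 * sim$x1) + rnorm(n)

# Main-effects model vs. model with treatment-by-covariate interactions.
fit_main <- lm(y ~ trt + x1 + x2, data = sim)
fit_int  <- lm(y ~ trt * (x1 + x2), data = sim)

# (a) Joint F-test of all interaction terms: is there any heterogeneity?
het_test <- anova(fit_main, fit_int)
p_het <- het_test[["Pr(>F)"]][2]

# (b) Individual interaction coefficients: which covariates modify the effect?
int_coefs <- summary(fit_int)$coefficients[c("trt:x1", "trt:x2"), ]

# (c) Expected effect in a subgroup: predicted outcome under treatment minus
#     predicted outcome under control, at fixed covariate values.
subgroup <- data.frame(x1 = 1, x2 = 0)
cate_hat <- predict(fit_int, transform(subgroup, trt = 1)) -
  predict(fit_int, transform(subgroup, trt = 0))

print(het_test)
print(int_coefs)
print(cate_hat)
```

The nested-model F-test addresses (a) with a single test rather than several separate ones, and the subgroup contrast in (c) can be repeated over a grid of covariate values to produce a plot. With the real data, the covariates in the model also need to account for self-selection into treatment.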