POLI 175 Problem Set 1
Due 7:59AM Thursday April 16, 2020
Please turn in your homework by emailing your HTML and code files to
Bertrand () before the deadline. Your homework will be graded on
completeness, accuracy, and readability of code.
The point allocation in this problem set is given by:
Question  Q1.1  Q1.2  Q1.3  Q1.4  Q1.5  Q1.6  Q1.7  Q1.8  Q1.9
Points    5     5     10    5     5     5     15    5     5

Question  Q2.1  Q2.2  Q2.3  Q2.4  Q2.5  Q2.6  Total  Bonus
Points    5     5     5     10    5     10    100    10
This assignment will analyze vote returns for California House elections
and vote choice in a presidential election.
Q1: 2006 California Congressional Election Results
Our goal in this exercise is to predict the proportion of votes that a
Democratic candidate for a House seat wins in a “swing district”: one where
the support for Democratic and Republican candidates is about equal and the
incumbent is a Democrat.
1) Load the data set ca2006.csv, a slightly modified version of the 2006
House election return data from the pscl package.
- The data set contains the following variables:
district: California Congressional district
prop_d: proportion of votes for the Democratic candidate
dem_pres_2004: proportion of two-party presidential vote for
Democratic candidate in 2004 in Congressional district
dem_pres_2000: proportion of two-party presidential vote for
Democratic candidate in 2000 in Congressional district
dem_inc: an indicator equal to 1 if the Democrat is the incumbent
contested: an indicator equal to 1 if the election is contested
2) Create a plot of the proportion of votes for the Democratic candidate
(prop_d) against the proportion of the two-party vote for the
Democratic presidential candidate in 2004 (dem_pres_2004) in the district.
Be sure to clearly label the axes and provide an informative title for
the plot.
3) Regress the proportion of votes for the Democratic candidate, against
the proportion of the two-party vote for the Democratic presidential
candidate in 2004 in the district. Print the results and add the bivariate
regression line to the plot.
4) Using the bivariate regression and a function you have written yourself
(not the predict() function!), report the predicted vote share for
the Democratic candidate if dem_pres_2004 = 0.5.
5) Now, regress prop_d against: dem_pres_2004, dem_pres_2000, and dem_inc.
6) Using the multivariate regression from 5) and a function you have
written yourself, report the predicted vote share for the Democratic
candidate if:
dem_pres_2004 = 0.5
dem_pres_2000 = 0.5
dem_inc = 1
7) We are often interested in characterizing the uncertainty in our
estimates. Throughout this class we will often use the bootstrap to
quantify that uncertainty. Here, we will walk through the steps to
implement the bootstrap to characterize the uncertainty in our
response variable predictions.
Do the following 10000 times (in a for loop):
a) Using sample(), randomly select 53 rows (the number of districts in
California in 2006) with replacement.
b) Using the randomly selected (“bootstrapped”) data set, fit the
bivariate and multivariate regressions specified earlier.
c) Using the fitted regressions, predict the expected vote share for
the Democratic candidate for each regression, using the values and
functions from 4) and 6).
d) Store the predictions from both regressions.
8) Report 95% Confidence Intervals for both predictions. In addition,
create histograms for both predictions.
9) We will say the model predicts that the Democrat wins if the predicted
vote share is greater than 50%. Based on the results of the bootstrap,
what proportion of time does each model predict the Democrat will
win?
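As an illustration of steps 4) through 9), the R sketch below runs the same
workflow on synthetic data. It is only a sketch under stated assumptions: the
data frame ca is made up rather than read from ca2006.csv, the underscore
column names (prop_d, dem_pres_2004, and so on) are assumed spellings, the
helper predict_by_hand() is a hypothetical name, the loop runs 1000 times
instead of the 10000 the assignment asks for, and the intervals are percentile
bootstrap intervals, which may or may not be the variant intended in class.

```r
# Synthetic stand-in for ca2006.csv (made-up numbers, NOT the real returns).
# Column names with underscores are an assumption about the file's spelling.
set.seed(175)
n <- 53
dem_pres_2004 <- runif(n, 0.3, 0.8)
dem_pres_2000 <- pmin(pmax(dem_pres_2004 + rnorm(n, 0, 0.03), 0), 1)
dem_inc <- rbinom(n, 1, 0.5)
prop_d <- pmin(pmax(0.1 + 0.7 * dem_pres_2004 + 0.1 * dem_inc +
                      rnorm(n, 0, 0.05), 0), 1)
ca <- data.frame(prop_d, dem_pres_2004, dem_pres_2000, dem_inc)

# Hand-written prediction (hypothetical helper): intercept plus the sum of
# each slope times its covariate value. x must follow the formula's order.
predict_by_hand <- function(fit, x) {
  b <- coef(fit)
  unname(b[1] + sum(b[-1] * x))
}

x_bi <- c(0.5)            # dem_pres_2004
x_multi <- c(0.5, 0.5, 1) # dem_pres_2004, dem_pres_2000, dem_inc

n_boot <- 1000            # the assignment specifies 10000
boot_bi <- numeric(n_boot)
boot_multi <- numeric(n_boot)

for (i in seq_len(n_boot)) {
  # a) resample row indices with replacement
  rows <- sample(nrow(ca), size = nrow(ca), replace = TRUE)
  boot_data <- ca[rows, ]
  # b) refit both regressions on the bootstrapped data
  fit_bi <- lm(prop_d ~ dem_pres_2004, data = boot_data)
  fit_multi <- lm(prop_d ~ dem_pres_2004 + dem_pres_2000 + dem_inc,
                  data = boot_data)
  # c) and d) predict at the values of interest and store the results
  boot_bi[i] <- predict_by_hand(fit_bi, x_bi)
  boot_multi[i] <- predict_by_hand(fit_multi, x_multi)
}

# 8) percentile 95% intervals from the bootstrap draws
ci_bi <- quantile(boot_bi, c(0.025, 0.975))
ci_multi <- quantile(boot_multi, c(0.025, 0.975))

# 9) share of bootstrap draws in which the predicted vote share exceeds 50%
win_bi <- mean(boot_bi > 0.5)
win_multi <- mean(boot_multi > 0.5)
```

Calling hist(boot_bi) and hist(boot_multi) afterwards would draw the two
histograms. With the real data, ca would instead come from
read.csv("ca2006.csv"), and the numbers above would of course differ.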
Q2: Predicting Support for Bill Clinton in 1992
This problem will use a data set (again, modified from the pscl package)
to predict whether a voter will vote for Bill Clinton. The data come from
self-reported voting behavior in the 1992 Presidential election.
1) Load the data set vote92.csv. It contains:
clintonvote: an indicator equal to 1 if the voter supports Clinton
and 0 otherwise
dem: an indicator equal to 1 if the voter is a Democrat
female: an indicator equal to 1 if the voter is a woman
clintondist: a measure of the voter’s self-assessed ideological
distance from Clinton
2) What proportion of respondents report voting for Bill Clinton?
3) Using a logistic regression, regress clintonvote on dem, female, and
clintondist
4) Write a function to predict the probability that a voter supports Clinton
based on a logistic regression.
5) Using your function from 4), report the probability that a female
Democrat with clintondist = 1 votes for Clinton.
6) Now use a linear regression to predict clintonvote as a function of
dem, female, and clintondist. For all voters (rows) in the data,
use the fitted linear regression to compute their predicted probabilities
of voting for Clinton. Do the same for the logistic regression. Plot
the predicted probabilities from the logistic regression (on the x-axis)
against those from the linear regression (on the y-axis).
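The logistic prediction in 4) through 6) amounts to applying the inverse logit,
1 / (1 + exp(-eta)), to the linear predictor eta. The R sketch below shows one
way this could look on synthetic data. The data frame vote is simulated rather
than read from vote92.csv, and predict_logit_by_hand() is a hypothetical helper
name, so treat it as an illustration rather than a reference solution.

```r
# Synthetic stand-in for vote92.csv (simulated, NOT the real survey data).
set.seed(1992)
n <- 500
dem <- rbinom(n, 1, 0.5)
female <- rbinom(n, 1, 0.5)
clintondist <- runif(n, 0, 4)
true_p <- plogis(-0.5 + 1.5 * dem + 0.2 * female - 0.6 * clintondist)
clintonvote <- rbinom(n, 1, true_p)
vote <- data.frame(clintonvote, dem, female, clintondist)

logit_fit <- glm(clintonvote ~ dem + female + clintondist,
                 data = vote, family = binomial)
linear_fit <- lm(clintonvote ~ dem + female + clintondist, data = vote)

# Hypothetical helper: linear predictor from the coefficients, then the
# inverse logit. x must follow the order of the formula's right-hand side.
predict_logit_by_hand <- function(fit, x) {
  b <- coef(fit)
  eta <- b[1] + sum(b[-1] * x)
  unname(1 / (1 + exp(-eta)))
}

# Probability for a female Democrat with clintondist = 1
p_example <- predict_logit_by_hand(
  logit_fit, c(dem = 1, female = 1, clintondist = 1)
)

# Predictions for every row: design matrix times coefficients
X <- model.matrix(logit_fit)   # columns: intercept, dem, female, clintondist
p_logit <- as.vector(1 / (1 + exp(-(X %*% coef(logit_fit)))))
p_linear <- as.vector(X %*% coef(linear_fit))
```

A call such as plot(p_logit, p_linear, xlab = "Logistic", ylab = "Linear")
would then draw the comparison in 6). Note that p_linear is not constrained to
lie in [0, 1], whereas p_logit always is.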
Bonus) For this question, we’re going to use the predicted probabilities for
all voters from logistic regression, and we’re going to visualize how well
they perform using a calibration plot. We will construct the calibration
plot “from scratch” (i.e. without using any specialized libraries).
To do this, we will construct 10 bins of data, where each bin corresponds
to an interval of width 0.1, starting with the bin [0.0, 0.1). This first
bin corresponds to all data points with a predicted probability greater
than or equal to 0 AND less than 0.1. The next bin is [0.1, 0.2), and
so on.
For each bin, compute (a) the mean predicted probability in that bin
and (b) the actual proportion of positives (proportion of data points
whose true response variable value is 1) in that bin. For each bin, plot
(a) on the x-axis and (b) on the y-axis, creating a plot with 10 points on
it. Connect the points with a line. In addition, add a dashed “identity
line” (the y = x line) to the plot.
The closeness with which the plotted points trace along the identity
line is a rough visualization of how well the predicted probabilities are
“calibrated.”
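The binning procedure described above can be sketched in base R as follows.
This is a minimal illustration on simulated values: p_hat and y are made up
(drawn so that the predictions are well calibrated by construction), and
plot_calibration() is a hypothetical helper name. One choice here goes slightly
beyond the text: the last bin is closed at 1 so that a predicted probability of
exactly 1 is not dropped.

```r
# Simulated inputs (NOT from vote92.csv): predicted probabilities and
# outcomes drawn so that the predictions are well calibrated.
set.seed(42)
n <- 2000
p_hat <- runif(n)
y <- rbinom(n, 1, p_hat)

# Ten bins of width 0.1. right = FALSE gives [a, b) intervals;
# include.lowest = TRUE closes the final bin, making it [0.9, 1].
breaks <- seq(0, 1, by = 0.1)
bin <- cut(p_hat, breaks = breaks, right = FALSE, include.lowest = TRUE)

# (a) mean predicted probability and (b) observed share of 1s per bin
mean_pred <- tapply(p_hat, bin, mean)
prop_pos <- tapply(y, bin, mean)
calib <- data.frame(bin = levels(bin),
                    mean_pred = as.vector(mean_pred),
                    prop_pos = as.vector(prop_pos))

# Hypothetical helper: points joined by a line, plus the dashed y = x line
plot_calibration <- function(calib) {
  plot(calib$mean_pred, calib$prop_pos, type = "b",
       xlim = c(0, 1), ylim = c(0, 1),
       xlab = "Mean predicted probability",
       ylab = "Observed proportion of positives",
       main = "Calibration plot")
  abline(a = 0, b = 1, lty = 2)
}
```

Calling plot_calibration(calib) draws the plot. With real predictions, some
bins may be empty, in which case tapply() returns NA for them and those points
are simply not drawn.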