
POLI 175 Problem Set 1

Due 7:59AM Thursday April 16, 2020

Please turn your homework in by emailing your html and code files to Bertrand () before the due time. Your homework will be graded based on completeness, accuracy, and readability of code.

The point allocation in this problem set is given by:

Q1.1  Q1.2  Q1.3  Q1.4  Q1.5  Q1.6  Q1.7  Q1.8  Q1.9
5     5     10    5     5     5     15    5     5

Q2.1  Q2.2  Q2.3  Q2.4  Q2.5  Q2.6  Total  Bonus
5     5     5     10    5     10    100    10

This assignment will analyze vote returns for California House elections and vote choice in a presidential election.

Q1: 2006 California Congressional Election Results

Our goal in this exercise is to predict the proportion of votes that a Democratic candidate for a House seat wins in a “swing district”: one where the support for Democratic and Republican candidates is about equal and the incumbent is a Democrat.

1) Load the data set ca2006.csv, a slightly modified version of the 2006 House election return data from the PSCL library.

- The data set contains the following variables:

  - district: California Congressional district
  - prop_d: proportion of votes for the Democratic candidate
  - dem_pres_2004: proportion of two-party presidential vote for Democratic candidate in 2004 in Congressional district
  - dem_pres_2000: proportion of two-party presidential vote for Democratic candidate in 2000 in Congressional district
  - dem_inc: An indicator equal to 1 if the Democrat is the incumbent
  - contested: An indicator equal to 1 if the election is contested

2) Create a plot of the proportion of votes for the Democratic candidate (prop_d) against the proportion of the two-party vote for the Democratic presidential candidate in 2004 (dem_pres_2004) in the district. Be sure to clearly label the axes and provide an informative title for the plot.

3) Regress the proportion of votes for the Democratic candidate on the proportion of the two-party vote for the Democratic presidential candidate in 2004 in the district. Print the results and add the bivariate regression line to the plot.

4) Using the bivariate regression and a function you have written yourself (not the predict() function!), report the predicted vote share for the Democratic candidate if dem_pres_2004 = 0.5.

5) Now, regress prop_d against: dem_pres_2004, dem_pres_2000, and dem_inc.

6) Using the multivariate regression from 5) and a function you have written yourself, report the predicted vote share for the Democratic candidate if:

  - dem_pres_2004 = 0.5
  - dem_pres_2000 = 0.5
  - dem_inc = 1

7) We are often interested in characterizing the uncertainty in our estimates. Throughout this class we will often use the bootstrap to quantify that uncertainty. Here, we will walk through the steps to implement the bootstrap to characterize the uncertainty for our response variable predictions.

Do the following 10000 times (in a for loop):

a) Using sample, randomly select 53 rows (the number of districts in California in 2006) with replacement.

b) Using the randomly selected (“bootstrapped”) data set, fit the bivariate and multivariate regressions specified earlier.

c) Using the fitted regressions, predict the expected vote share for the Democratic candidate for each regression, using the values and functions from 4) and 6).

d) Store the predictions from both regressions.

8) Report 95% confidence intervals for both predictions. In addition, create histograms for both predictions.

9) We will say the model predicts that the Democrat wins if the predicted vote share is greater than 50%. Based on the results of the bootstrap, in what proportion of bootstrap samples does each model predict the Democrat will win?
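The manual-prediction and bootstrap steps above can be sketched as follows. This is only an illustration under stated assumptions: it is written in Python with the standard library (the assignment itself refers to R functions such as predict() and sample), it uses made-up synthetic data rather than ca2006.csv, it covers only the bivariate model, and the names fit_bivariate and predict_linear are hypothetical. The percentile interval shown is one common way to form a bootstrap confidence interval; the problem set does not say which kind is expected.

```python
import random
import statistics

random.seed(175)  # arbitrary seed, only so the sketch is reproducible

# Synthetic stand-in for the 53 districts: (dem_pres_2004, prop_d) pairs.
# These numbers are made up; they are NOT the contents of ca2006.csv.
rows = []
for _ in range(53):
    x = random.uniform(0.3, 0.8)
    y = -0.2 + 1.4 * x + random.gauss(0, 0.05)
    rows.append((x, y))


def fit_bivariate(data):
    """Least-squares (intercept, slope) for y = a + b * x."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_linear(coefs, values):
    """Intercept plus the sum of each slope times its covariate value."""
    intercept, *slopes = coefs
    return intercept + sum(b * v for b, v in zip(slopes, values))


# Prediction from the model fitted to the full (synthetic) data set.
point_estimate = predict_linear(fit_bivariate(rows), [0.5])

n_boot = 10000
boot_preds = []
for _ in range(n_boot):
    # a) resample 53 rows with replacement
    resampled = random.choices(rows, k=len(rows))
    # b) refit, c) predict at dem_pres_2004 = 0.5, d) store the prediction
    boot_preds.append(predict_linear(fit_bivariate(resampled), [0.5]))

# Percentile interval: the 2.5% and 97.5% cut points of the bootstrap draws.
cuts = statistics.quantiles(boot_preds, n=40, method="inclusive")
ci_95 = (cuts[0], cuts[-1])

# Share of bootstrap draws in which the predicted vote share exceeds 50%.
win_share = sum(p > 0.5 for p in boot_preds) / n_boot

print(point_estimate, ci_95, win_share)
```

Because predict_linear takes one slope per covariate, the same function would also serve a multivariate fit: pass the intercept followed by three slopes, and the values [0.5, 0.5, 1].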

Q2: Predicting Support for Bill Clinton in 1992

This problem will use a data set (again, modified from the PSCL package) to predict whether a voter will vote for Bill Clinton. The data comes from self-reported voting behavior in the 1992 Presidential election.

1) Load the data set vote92.csv. It contains:

  - clintonvote: an indicator equal to 1 if the voter supports Clinton and 0 otherwise
  - dem: an indicator equal to 1 if the voter is a Democrat
  - female: an indicator equal to 1 if the voter is a woman
  - clintondist: a measure of the voter’s self-assessed ideological distance from Clinton

2) What proportion of respondents report voting for Bill Clinton?

3) Using a logistic regression, regress clintonvote on dem, female, and clintondist.

4) Write a function to predict the probability that a voter supports Clinton based on a logistic regression.

5) Using your function from 4), report the probability that a female Democrat with clintondist = 1 votes for Clinton.

6) Now use a linear regression to predict clintonvote as a function of dem, female, and clintondist. For all voters (rows) in the data, use the fitted linear regression to compute their predicted probabilities of voting for Clinton. Do the same for the logistic regression. Plot the predicted probabilities from the logistic regression (on the x-axis) against those from the linear regression (on the y-axis).
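A logistic regression turns the linear predictor into a probability through the inverse logit, p = 1 / (1 + exp(-(b0 + b1*x1 + ...))). A minimal Python sketch of such a prediction function follows. The function name predict_logit is hypothetical, and the coefficient values are placeholders chosen for easy arithmetic, not estimates from vote92.csv.

```python
import math


def predict_logit(coefs, values):
    """Inverse logit of intercept + sum(slope * covariate value)."""
    intercept, *slopes = coefs
    linear_predictor = intercept + sum(b * v for b, v in zip(slopes, values))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Placeholder coefficients in the order: intercept, dem, female, clintondist.
# They are illustrative only and do NOT come from a fitted model.
placeholder_coefs = (0.0, 1.0, 0.5, -1.5)

# Profile: dem = 1, female = 1, clintondist = 1.
p = predict_logit(placeholder_coefs, [1, 1, 1])
print(p)
```

Unlike the linear regression in 6), this function can never return a value below 0 or above 1, which is one reason the two sets of predicted probabilities can diverge near the extremes.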

Bonus) For this question, we’re going to use the predicted probabilities for all voters from logistic regression, and we’re going to visualize how well they perform using a calibration plot. We will construct the calibration plot “from scratch” (i.e. without using any specialized libraries).

To do this, we will construct 10 bins of data, where each bin corresponds to an interval of width 0.1, starting with the bin [0.0, 0.1). This first bin corresponds to all data points with a predicted probability greater than or equal to 0 AND less than 0.1. The next bin is [0.1, 0.2), and so on.

For each bin, compute (a) the mean predicted probability in that bin and (b) the actual proportion of positives (proportion of data points whose true response variable value is 1) in that bin. For each bin, plot (a) on the x-axis and (b) on the y-axis, creating a plot with 10 points on it. Connect the points with a line. In addition, add a dashed “identity line” (the y = x line) to the plot.

The closeness with which the plotted points trace along the identity line is a rough visualization of how well the predicted probabilities are “calibrated.”
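The binning step described above can be sketched in Python as follows, without the plotting. The function name calibration_points and the toy inputs are hypothetical. Two choices here are assumptions that the problem set does not settle: a predicted probability of exactly 1.0 is folded into the last bin, and empty bins are skipped, so fewer than 10 points may result.

```python
def calibration_points(pred_probs, outcomes, n_bins=10):
    """Return (mean predicted probability, observed share of 1s) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(pred_probs, outcomes):
        # [0.0, 0.1) -> bin 0, [0.1, 0.2) -> bin 1, ...; p == 1.0 goes to the last bin
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))

    points = []
    for members in bins:
        if not members:
            continue  # assumption: empty bins are skipped rather than plotted
        mean_pred = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        points.append((mean_pred, observed))
    return points


# Toy predicted probabilities and true outcomes (made up, not from vote92.csv).
preds = [0.05, 0.08, 0.15, 0.35, 0.35, 0.72, 0.78, 0.95]
truth = [0, 0, 0, 1, 0, 1, 1, 1]

points = calibration_points(preds, truth)
for mean_pred, observed in points:
    print(mean_pred, observed)
```

Each returned pair is one point of the calibration plot: the first element goes on the x-axis and the second on the y-axis, to be compared against the y = x line.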
