
Statistical learning

Department of Economics

Brock University

1 Assignment 2

1.1 Conceptual questions

1. Suppose that we wish to predict whether a given stock will issue a dividend this year (“Yes” or “No”) based on X, last year’s percent profit. We examine a large number of companies and discover that the mean value of X for companies that issued a dividend was 10, while the mean for those that did not was 0. In addition, the variance of X for both sets of companies was 36. Finally, 80% of companies issued dividends. Assuming that X follows a normal distribution, predict the probability that a company will issue a dividend this year given that its percentage profit last year was X = 4. Use equation (1) from your notes on classification.

2. This problem has to do with odds. On average, what fraction of people with odds of 0.37 of defaulting on their credit card payment will in fact default?
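A quick numerical check of question 1, sketched in Python rather than the course's R. It assumes that equation (1) in the notes is Bayes' theorem with normal class densities sharing a common variance, P(Yes | X = x) = π_Yes f_Yes(x) / (π_Yes f_Yes(x) + π_No f_No(x)), where f_k is the N(μ_k, σ²) density. The function names are illustrative only.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a N(mean, var) random variable evaluated at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def posterior_yes(x, pi_yes, mu_yes, mu_no, var):
    """P(Yes | X = x) via Bayes' theorem with normal class densities
    that share a common variance (assumed form of equation (1))."""
    num = pi_yes * normal_pdf(x, mu_yes, var)
    den = num + (1 - pi_yes) * normal_pdf(x, mu_no, var)
    return num / den


# Values from the question: pi_Yes = 0.8, mu_Yes = 10, mu_No = 0, sigma^2 = 36, x = 4.
p = posterior_yes(x=4, pi_yes=0.8, mu_yes=10, mu_no=0, var=36)
print(round(p, 3))  # prints 0.752
```

Because both classes share the same variance, the normalizing constant of the density cancels in the ratio; only the exponential terms and the priors matter.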
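The odds question rests on the definition odds = p / (1 - p), which inverts to p = odds / (1 + odds). A minimal Python sketch of that conversion (the function name is hypothetical):

```python
def odds_to_probability(odds):
    """Invert odds = p / (1 - p) to recover p = odds / (1 + odds)."""
    return odds / (1 + odds)


print(round(odds_to_probability(0.37), 2))  # prints 0.27, i.e. 0.37 / 1.37
```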

1.2 Classification methods

This question should be answered using the Weekly data set, which is part of the ISLR package. These data are similar in nature to the Smarket data, except that they contain 1,089 weekly returns over 21 years, from the beginning of 1990 to the end of 2010.

1. Produce some numerical and graphical summaries of the Weekly data. Do there appear to be any patterns? For the numerical summaries, focus on the means of the returns (today's return and all lags) as well as on the correlations between today's return and the lags. For the graphical summaries, create a plot of today's return versus its first lag and discuss it.

2. Use the full data set to perform a logistic regression with Direction as the response and the five lag variables plus Volume as predictors. Use the summary function to print the results. Do any of the predictors appear to be statistically significant? If so, which ones? Compute the predicted probabilities and report their minimum, maximum, and mean. Discuss these features.

3. Compute the confusion matrix and the overall fraction of incorrect predictions. Explain what the confusion matrix is telling you about the types of mistakes made by the logistic regression.

4. Use the full data set to perform a linear probability model (LPM) regression with Direction as the response and the five lag variables plus Volume as predictors. Use the summary function to print the results. Do any of the predictors appear to be statistically significant? If so, which ones? Compute the predicted probabilities and report their minimum, maximum, and mean. Discuss these features. Are the LPM probabilities sensible? Are they similar to those of the logistic regression? Do you expect the confusion matrix to be similar to that of the logistic regression?

5. Compute the confusion matrix and overall fraction of incorrect predictions for this LPM. Is the matrix similar to the one obtained with the logistic regression?

6. Now fit the logistic regression model using training data from 1990 to 2008, with Lag2 as the only predictor. Compute the confusion matrix and the overall fraction of incorrect predictions for the held-out data (that is, the data from 2009 and 2010).

7. Repeat (6) using LDA.

8. Repeat (6) using KNN with K = 1.

9. Which of these methods (logistic regression, LDA, or KNN) appears to provide the best results on these data? Why?
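Items 3, 5 and 6 all ask for a confusion matrix and an overall error rate. The Python sketch below shows the bookkeeping for probabilities from any of the models (logistic regression or LPM), assuming the usual rule of predicting "Up" when the probability exceeds 0.5. The function name and the five probabilities are hypothetical, not output from the Weekly data; in R the same cross-tabulation is commonly produced with `table(predicted, actual)`.

```python
def confusion_matrix(probs, actual, threshold=0.5):
    """Cross-tabulate predicted vs. actual Direction.

    An observation is predicted "Up" when its probability exceeds `threshold`.
    Returns (counts, error_rate): counts[(predicted, actual)] is a cell count,
    and error_rate is the overall fraction of incorrect predictions.
    """
    labels = ("Down", "Up")
    counts = {(p, a): 0 for p in labels for a in labels}
    for prob, truth in zip(probs, actual):
        pred = "Up" if prob > threshold else "Down"
        counts[(pred, truth)] += 1
    wrong = counts[("Up", "Down")] + counts[("Down", "Up")]
    return counts, wrong / len(actual)


# Hypothetical fitted probabilities and outcomes, for illustration only.
probs = [0.62, 0.48, 0.55, 0.51, 0.40]
actual = ["Up", "Up", "Down", "Up", "Down"]
counts, err = confusion_matrix(probs, actual)
print("pred\\actual  Down  Up")
for pred in ("Down", "Up"):
    print(f"{pred:<12}{counts[(pred, 'Down')]:>5}{counts[(pred, 'Up')]:>4}")
print("fraction incorrect:", err)  # prints fraction incorrect: 0.4
```

The two off-diagonal cells separate the two kinds of mistakes: (Up, Down) counts weeks wrongly called "Up", and (Down, Up) counts "Up" weeks the model missed.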
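For item 8, KNN with K = 1 and a single predictor simply assigns each held-out week the Direction of the training week whose Lag2 value is closest. A minimal Python sketch under that reading; the function name and the three-point data are hypothetical stand-ins for the 1990-2008 training weeks and the 2009-2010 test weeks. Unlike this sketch, which breaks distance ties by taking the first training point, R's `knn()` from the class package breaks ties at random, which is why a seed is usually set before calling it.

```python
def knn1_predict(train_x, train_y, test_x):
    """KNN with K = 1 on a single predictor (e.g. Lag2).

    Each test point receives the label of the training point at the smallest
    absolute distance; ties go to the first such point in the training list.
    """
    preds = []
    for x in test_x:
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        preds.append(train_y[nearest])
    return preds


# Hypothetical Lag2 values and directions, for illustration only.
train_lag2 = [-1.0, 0.5, 2.0]
train_dir = ["Down", "Up", "Up"]
test_lag2 = [-0.8, 0.4, 3.0]
print(knn1_predict(train_lag2, train_dir, test_lag2))  # prints ['Down', 'Up', 'Up']
```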
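Item 4 asks whether the LPM probabilities are sensible. An LPM is ordinary least squares with a 0/1 response (in R, for example, `lm()` applied to a response coded 1 for "Up" and 0 for "Down"), so nothing constrains its fitted values to [0, 1]. The Python sketch below illustrates this with one predictor only (a simplification of the six-predictor model in item 4) and hypothetical data; the function name is illustrative.

```python
def lpm_fit(x, y):
    """Linear probability model with one predictor: OLS of a 0/1 response on x.

    Returns (intercept, slope).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


# Hypothetical data: y = 1 for "Up", 0 for "Down".
x = [0, 1, 2, 3, 4]
y = [0, 0, 1, 1, 1]
b0, b1 = lpm_fit(x, y)
fitted = [b0 + b1 * xi for xi in x]
print(round(max(fitted), 2))  # prints 1.2, a "probability" above 1
```

A logistic regression fitted to the same data would keep every fitted probability strictly between 0 and 1, which is one reason to compare the two sets of probabilities.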

1.3 Cross-validation

In this question you will use the glm() and predict() functions, and a for loop, to compute the leave-one-out cross-validation (LOOCV) error for a simple logistic regression model on the Weekly data set.

1. Fit a logistic regression model that predicts Direction using Lag1 and Lag2.

2. Fit a logistic regression model that predicts Direction using Lag1 and Lag2, using all but the first observation.

3. Use the model from (2) to predict the direction of the first observation. You can do this by predicting that the first observation will go up if P(Direction = "Up" | Lag1, Lag2) > 0.5. Was this observation correctly classified?

4. Write a for loop from i = 1 to i = n, where n is the number of observations in the data set, that performs each of the following steps:

i. Fit a logistic regression model using all but the ith observation to predict Direction using Lag1 and Lag2.

ii. Compute the posterior probability of the market moving up for the ith observation.

iii. Use the posterior probability for the ith observation in order to predict whether or not the market moves up.

iv. Determine whether or not an error was made in predicting the direction for the ith observation. If an error was made, then indicate this as a 1, and otherwise indicate it as a 0.

5. Take the average of the n numbers obtained in (4)iv in order to obtain the LOOCV estimate for the test error. Comment on the results.
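The loop in steps (4) and (5) can be sketched in Python as below, purely to make the mechanics concrete; in R each refit would be a call such as `glm(Direction ~ Lag1 + Lag2, data = Weekly[-i, ], family = binomial)` followed by `predict(fit, Weekly[i, ], type = "response")`. Assumptions: the helper names (`sigmoid`, `solve`, `fit_logistic`, `loocv_error`) are hypothetical; the model is fitted by Newton-Raphson written out by hand, which for the logit link coincides with the iteratively reweighted least squares update that glm() uses; and the data are simulated stand-ins for (Lag1, Lag2, Direction), not the Weekly data.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix [A | b]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iters=25):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    Rows of X start with 1.0 for the intercept; y holds 0/1 with 1 = "Up".
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        grad = [0.0] * p                      # score: X'(y - mu)
        hess = [[0.0] * p for _ in range(p)]  # information: X'WX
        for row, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, row)))
            w = mu * (1.0 - mu)
            for j in range(p):
                grad[j] += (yi - mu) * row[j]
                for k in range(p):
                    hess[j][k] += w * row[j] * row[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def loocv_error(X, y):
    """Steps (4)i-iv and (5): refit without observation i, predict it, average the errors."""
    errors = []
    for i in range(len(y)):
        beta = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])  # (i)
        prob_up = sigmoid(sum(b * v for b, v in zip(beta, X[i])))  # (ii)
        pred = 1 if prob_up > 0.5 else 0                           # (iii)
        errors.append(1 if pred != y[i] else 0)                    # (iv)
    return sum(errors) / len(errors), errors                       # (5)


# Simulated toy data standing in for (Lag1, Lag2, Direction); NOT the Weekly data.
random.seed(1)
X, y = [], []
for _ in range(50):
    lag1, lag2 = random.gauss(0, 2), random.gauss(0, 2)
    X.append([1.0, lag1, lag2])
    y.append(1 if random.random() < sigmoid(0.4 * lag2) else 0)

rate, errs = loocv_error(X, y)
print(f"LOOCV error estimate: {rate:.3f}")
```

Each pass refits the model from scratch, so LOOCV costs n model fits; that is manageable for the roughly one thousand Weekly observations but is the main drawback of the method.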

Notes:

• Have a look at the Course Outline (on Sakai) for more info on how to create tables.

• The report must be typed.

• The report should have a title page, be single-spaced, and use a 12-point font.

• Your computer code and output should be included in the appendix.

• Pay attention to your graphs.

• Descriptive statistics, when applicable, should be reported in a table.

• Regression results should also be presented in a table. The first column of your table should contain the list of independent variables (starting with the constant). The remaining columns should contain the results for the different models. The last few rows of the table should contain the sample size and two measures of goodness of fit.

• When using a test statistic, report the null hypothesis being tested, the formula for the test statistic, and how it was computed (e.g., using a regression and, if so, which regression). Make sure to report a conclusion for that test (e.g., I reject the null because XXXX, and this implies that XXXX).
