
Stat 462 - Individual Project 2

Due May 10, 2020

Regression Analysis

This project is to be completed individually. Submit a PDF only (the Rmd file is not needed, and you may use another word-processing tool if you like).

We will use the Ames Housing dataset (AmesHousing.txt), which has 82 variables and 2930 observations. The 82 variables comprise 23 nominal, 23 ordinal, 14 discrete, and 20 continuous variables, plus two observation identifiers. You may restrict yourself to the 20 continuous variables in this project.

Your goal is to predict the sale price. Exclude the Order, PID, and of course SalePrice variables from your predictors. You may want to combine variables (e.g., summing square feet) and apply the manipulations we have learned about (e.g., transformations for nonlinearity).

You should try at least multiple linear regression (with and without transformations) and weighted least squares (or generalized least squares), but you are welcome to explore more! You need to preprocess the dataset. After preprocessing, use the following code to define your test set; its complement is the training set.

set.seed(2020)

testindices = sample(2930, round(2930/4)) ## train indices are the rest
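The split above might be applied as in the following minimal sketch. The data frame name `ames` is hypothetical, and a placeholder with 2930 rows stands in for the preprocessed data. Note the assumption that the data still has 2930 rows when it is indexed: if preprocessing drops observations, the sampled indices no longer line up with the rows, so one option is to draw the indices on the full data and drop incomplete rows within each part afterwards.

```r
# Reproduce the split defined in the assignment.
set.seed(2020)
n <- 2930
testindices <- sample(n, round(n / 4))
trainindices <- setdiff(seq_len(n), testindices)  # the complement

# Placeholder for the real data (hypothetical name `ames`).
ames <- data.frame(row_id = seq_len(n))

test  <- ames[testindices, , drop = FALSE]
train <- ames[trainindices, , drop = FALSE]

cat("train rows:", nrow(train), " test rows:", nrow(test), "\n")
```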

Write up your results in a professional report, like you would present to a client or internal customer for your analysis. The report should be no more than 4 double-spaced pages long and submitted in PDF format. You should put important tables/figures in the report and put additional tables/figures in the appendix.

It should include an appropriate analysis of the performance of the models you consider, and the reasons for your final choice of model(s). Include any other details from your analysis that you feel are worthy of mention.

The report should have four sections (Introduction, Analysis, Results, Conclusion) and provide sufficient detail that anyone with a reasonable statistics background could understand exactly what you have done and what you concluded. In the Introduction, you should present some background and motivation for analyzing the housing data. In the Analysis section, you should outline the analysis and the necessary methodological details. In the Results section, you should use tables/figures to summarize your results and explain your findings. In the Conclusion, you should connect your analysis and results to your motivation and discuss some possible future work. Do not embed R code in the body of your report (if you are using rmarkdown, use {r echo=FALSE} to suppress the printing of the R code), but instead attach the code in an appendix. The appendix does not count towards the page limit.

Grading criteria (out of 15)

10 points: fulfilling the project requirements. You may want to remove variables or observations with missing values and combine variables (e.g., summing square feet).

5 points: the quality of your report, including clarity of writing, organization, and layout; appropriate use of tables and figures; careful proofreading; and adherence to report guidelines.

Project Requirements

Requirement 1:

Preprocess the dataset (e.g., checking for missing values, combining highly correlated features). Perform some exploratory data analysis, such as summary statistics, boxplots, and correlation plots.
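A minimal sketch of these preprocessing steps, using base R only. The data frame `toy` and its column names are invented stand-ins for the continuous Ames variables (the real file is not read here), and the 0.8 correlation cutoff is an arbitrary illustrative choice, not one the assignment prescribes.

```r
# Invented stand-in data; replace with the continuous columns of the real file.
set.seed(1)
toy <- data.frame(
  SalePrice = rlnorm(50, meanlog = 12, sdlog = 0.4),
  area_a    = rnorm(50, 1500, 300),
  area_b    = rnorm(50, 800, 200),
  lot       = rnorm(50, 10000, 2000)
)
toy$lot[c(3, 17)] <- NA  # inject two missing values for illustration

# Count missing values per column.
na_counts <- colSums(is.na(toy))
print(na_counts)

# One option: keep complete rows only (imputation is another).
clean <- toy[complete.cases(toy), ]

# Combine related measurements into a single feature.
clean$total_area <- clean$area_a + clean$area_b

# Summary statistics and pairwise correlations.
print(summary(clean))
cors <- cor(clean)

# List variable pairs whose absolute correlation exceeds the cutoff.
high <- which(abs(cors) > 0.8 & upper.tri(cors), arr.ind = TRUE)
print(high)
```

Calls such as `boxplot(clean$SalePrice)` or `pairs(clean)` would give the graphical counterparts of these summaries.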

Requirement 2:

Fit the regression model on the training dataset. Perform the appropriate diagnostics for your regression analysis (checking the regression assumptions, influential observations, outliers, and collinearity). You may need to remove some collinear variables and/or outliers.
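The diagnostics named above could be sketched as follows, on simulated data built to show collinearity and non-constant error variance. Everything here is an assumption for illustration: the variable names, the 4/n rule of thumb for Cook's distance, and the feasible weighted least squares step that models the log squared residuals. It is one possible approach, not the required one.

```r
# Simulated data: x2 is nearly collinear with x1, and the error spread grows with x1.
set.seed(2)
n  <- 200
x1 <- runif(n, 1, 10)
x2 <- x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
y  <- 5 + 2 * x1 + x3 + rnorm(n, sd = 0.5 * x1)
sim <- data.frame(y, x1, x2, x3)

ols <- lm(y ~ x1 + x2 + x3, data = sim)

# Influence: flag observations with Cook's distance above 4/n (a rule of thumb).
cd <- cooks.distance(ols)
influential <- which(cd > 4 / n)

# Collinearity: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on the remaining predictors.
predictors <- c("x1", "x2", "x3")
vif <- sapply(predictors, function(v) {
  others <- setdiff(predictors, v)
  r2 <- summary(lm(reformulate(others, response = v), data = sim))$r.squared
  1 / (1 - r2)
})
print(round(vif, 1))

# Feasible WLS after dropping the collinear x2: estimate the variance
# function from the log squared OLS residuals, then weight by its inverse.
aux <- lm(log(resid(ols)^2) ~ x1, data = sim)
w   <- 1 / exp(fitted(aux))
wls <- lm(y ~ x1 + x3, data = sim, weights = w)
print(coef(wls))
```

`plot(ols)` draws the standard residual plots for a visual check of the same assumptions.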

Requirement 3:

After removing the collinear variables and/or outliers, you may fit the following regression models: (i) the full model, (ii) the submodel chosen by AIC, (iii) the submodel chosen by BIC, (iv) the submodel chosen by Lasso, and (v) the submodel chosen by the Elastic Net. Summarize the fit of these models and compare their coefficient estimates.
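One way these five fits might be set up is sketched below on simulated data. The choices are assumptions: backward elimination with `step()`, the `glmnet` package for the penalized fits, `alpha = 0.5` for the Elastic Net, and `lambda.min` as the selected penalty. The penalized part runs only if `glmnet` is installed.

```r
# Simulated training data: only x1 and x2 truly affect y.
set.seed(3)
n <- 150
X <- matrix(rnorm(n * 6), n, 6, dimnames = list(NULL, paste0("x", 1:6)))
y <- 1 + 2 * X[, 1] - 1.5 * X[, 2] + rnorm(n)
train <- data.frame(y = y, X)

# (i) Full model.
full <- lm(y ~ ., data = train)

# (ii) AIC and (iii) BIC submodels by backward elimination;
# k = 2 gives AIC, k = log(n) gives BIC.
fit_aic <- step(full, direction = "backward", k = 2, trace = 0)
fit_bic <- step(full, direction = "backward", k = log(n), trace = 0)
print(names(coef(fit_aic)))
print(names(coef(fit_bic)))

# (iv) Lasso and (v) Elastic Net, with the penalty chosen by cross-validation.
if (requireNamespace("glmnet", quietly = TRUE)) {
  cv_lasso <- glmnet::cv.glmnet(X, y, alpha = 1)
  cv_enet  <- glmnet::cv.glmnet(X, y, alpha = 0.5)
  # Coefficients shrunk exactly to zero mark the variables left out.
  print(as.matrix(coef(cv_lasso, s = "lambda.min")))
  print(as.matrix(coef(cv_enet, s = "lambda.min")))
}
```

Placing the coefficient vectors side by side in one table makes it easy to see which predictors each method keeps.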

Requirement 4:

For each model, calculate the “mean prediction error” on the test dataset.
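The assignment does not define "mean prediction error" precisely. The sketch below assumes it means the mean squared difference between observed and predicted responses on the test set. The helper name `mean_prediction_error` and the toy data are hypothetical.

```r
# Hypothetical helper: mean squared prediction error of `fit` on `newdata`.
mean_prediction_error <- function(fit, newdata, response = "SalePrice") {
  pred <- predict(fit, newdata = newdata)
  mean((newdata[[response]] - pred)^2)
}

# Toy illustration with a simulated response and a random hold-out set.
set.seed(4)
toy <- data.frame(x = rnorm(100))
toy$SalePrice <- 3 + 2 * toy$x + rnorm(100)
test_idx <- sample(100, 25)

fit <- lm(SalePrice ~ x, data = toy[-test_idx, ])
mpe <- mean_prediction_error(fit, toy[test_idx, ])
print(mpe)
```

If a model is fit to a transformed response such as `log(SalePrice)`, the predictions would need to be transformed back before computing the error, so that all models are compared on the same scale.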
