Retention Predictive Modeling Project Part 2

Due: 12/3/2018 at the Data Mining World Championships

In this part of the project you will partition your project data and build the best predictive model possible. Recall that the purpose of this model is to flag, at the time of application to Miami, applicants who are at high risk of not returning to Miami after their first year.

The in-class competition will be decided by whose model identifies the most 1’s (students not retained) in the 2018 data among the 100 highest predicted probabilities produced by that model.
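The competition measure described above can be sketched in a few lines. This is only an illustration of the counting logic, written in Python for brevity; the function name `top_k_hits` and the toy numbers are made up, and the same calculation is easy to reproduce in R.

```python
def top_k_hits(probs, actual, k=100):
    """Count actual 1's (not retained) among the k highest predicted probabilities."""
    # Rank row indices from highest to lowest predicted probability.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    # Sum the 0/1 outcomes of the top k rows.
    return sum(actual[i] for i in ranked[:k])


# Toy example with made-up numbers (not project data), using k = 3 instead of 100.
probs = [0.10, 0.80, 0.35, 0.60, 0.05, 0.90]
actual = [0, 1, 0, 0, 0, 1]
hits = top_k_hits(probs, actual, k=3)
print(hits)
```

A model that ranks more of the true 1’s near the top scores higher on this measure, regardless of how well calibrated its probabilities are.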

Data Partitioning

The first decision you will have to make is how to partition the data. Recall that you are predicting over time, so consider how the year should be handled.

Should the partition be stratified by year (i.e. some of every year in the training data)?

Should you save a holdout year and use that as your validation data?

Use at least 70% of the data for the training dataset and keep 30% or less for the validation or holdout set. Make sure the training dataset is large enough, considering that we will be using cross-validation.
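The two partitioning options above can be compared with a small sketch. It is an illustration only, assuming each record carries a hypothetical `year` field; the function names, the seed, and the toy rows are made up, and the language is Python purely for brevity.

```python
import random
from collections import defaultdict


def stratified_split(rows, train_frac=0.7, seed=1):
    """Option 1: place train_frac of every year in the training set."""
    rng = random.Random(seed)
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["year"]].append(row)
    train, valid = [], []
    for year in sorted(by_year):
        group = by_year[year][:]
        rng.shuffle(group)  # random assignment within the year
        cut = round(train_frac * len(group))
        train.extend(group[:cut])
        valid.extend(group[cut:])
    return train, valid


def holdout_year_split(rows, holdout_year):
    """Option 2: keep one entire year out of training as the validation set."""
    train = [r for r in rows if r["year"] != holdout_year]
    valid = [r for r in rows if r["year"] == holdout_year]
    return train, valid


# Made-up rows: 10 students in each of 4 hypothetical years.
rows = [{"year": y, "id": i} for y in (2014, 2015, 2016, 2017) for i in range(10)]
strat_train, strat_valid = stratified_split(rows)
hold_train, hold_valid = holdout_year_split(rows, holdout_year=2017)
print(len(strat_train), len(strat_valid))
print(len(hold_train), len(hold_valid))
```

With a holdout year, the validation share depends on how large that year is, so check it against the 70/30 guideline before committing to that design.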

Oversampling

You have the option to create an oversampled training data set as we did in the last homework. If you choose this option, please be sure that you have enough data in the training data set. This is not required, but it could improve model performance.

Please leave the validation set in the original proportions so that you do not have to make any adjustments when comparing models.
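One common way to oversample is to resample the event rows with replacement until the classes are balanced, touching only the training set. The sketch below illustrates that idea and may differ from the method used in the homework; the field name `not_retained`, the function name, and the toy data are assumptions, and the language is Python purely for brevity.

```python
import random


def oversample_events(train, target="not_retained", seed=1):
    """Resample event rows with replacement until both classes are the same size.

    Assumes events (target == 1) are the minority class and at least one exists.
    """
    rng = random.Random(seed)
    events = [r for r in train if r[target] == 1]
    non_events = [r for r in train if r[target] == 0]
    n_extra = len(non_events) - len(events)
    extra = [rng.choice(events) for _ in range(n_extra)]
    return non_events + events + extra


# Made-up training set: 8 retained students and 2 not retained.
train = [{"id": i, "not_retained": 0} for i in range(8)]
train += [{"id": 100 + i, "not_retained": 1} for i in range(2)]
balanced = oversample_events(train)
print(len(balanced), sum(r["not_retained"] for r in balanced))
```

Only `train` is passed to the function; the validation rows are never resampled, which keeps them in their original proportions.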

Modeling

Define the “event” as a student leaving Miami, so that each model predicts the probability that someone leaves. All models and results must be constructed in this manner.

I would like for you to build at least the following models:

1. A logistic regression model.

2. A random forest model.

3. A boosted tree model.

4. A neural network model.

ISA 591 students must include at least one model we did not cover in the course.

This is the minimum number of models to attempt (i.e. building only four models will not get you a perfect grade). You can easily add models by using different sets of predictors or settings.

I will be grading you on proper model settings, not on model performance.

1. Did you use the proper type of cross-validation or validation?

2. Did you perform model selection?

3. Did you use recommended settings?

You will be required to provide evidence for why you chose the model you did. It would be best to use multiple measures, such as lift, ROC, and model complexity.
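As a reminder of what two of these measures compute, here is a minimal sketch of lift at a chosen depth and of the area under the ROC curve. It is an illustration only: the function names and toy numbers are made up, the language is Python purely for brevity, and in practice you would use the tools from the course.

```python
def lift_at(probs, actual, frac=0.1):
    """Event rate in the top `frac` of scores divided by the overall event rate."""
    n_top = max(1, round(frac * len(probs)))
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    top_rate = sum(actual[i] for i in ranked[:n_top]) / n_top
    base_rate = sum(actual) / len(actual)
    return top_rate / base_rate


def auc(probs, actual):
    """Area under the ROC curve, computed as the chance that a random event
    outscores a random non-event (ties count as one half)."""
    pos = [p for p, a in zip(probs, actual) if a == 1]
    neg = [p for p, a in zip(probs, actual) if a == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up validation scores and outcomes (1 = not retained).
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
lift_top20 = lift_at(probs, actual, frac=0.2)
area = auc(probs, actual)
print(round(lift_top20, 2), round(area, 3))
```

A lift above 1 at a given depth means the model finds events at a higher rate than a random sample of the same size would.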

On the day of the competition, you will evaluate only the model you chose as the best, using the new 2018 data. We will model only the Domestic students. The data will be in a format similar to the original domestic.csv you downloaded. You can use your code to update any variables as necessary.

Deliverables

1. A business memo, written for anyone in Miami’s administration, including:

a. A summary of the predictive accuracy of your chosen model (think about interpreting the lift of your model vs. a random sample).

b. The variables that need to be included in the model.

c. The most important variable for predicting retention.

d. A cutoff (predicted probability) at which Miami should set its systems to flag a student as at risk of not returning after their first year.

e. Information on how the model will perform in practice. In other words, what percent of students will your model flag as “not retained”?

f. A discussion of variables that you think should be collected and that might improve this model in light of the problem (so no college performance data).

Your memo must have correctly labeled tables and figures that are correctly referred to in the document. Your memo should not tell the “story” of what you did. It should discuss only the outcome, i.e., your final model. I will grade this more stringently in this part of the project.

2. An appendix (this can be more detailed and can be created from R Markdown) describing:

a. How and why the data was partitioned.

b. The type of model used and its settings, so that someone can re-create it.

c. A summary of the model’s performance (lift, ROC, misclassification, etc.) and why you chose this model. Use graphs or constructed tables, not raw R output.

3. Upload your model code to Canvas.

4. Upload your project write-up to Canvas.
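For the memo items on the cutoff and on performance in practice, the link between a cutoff and the share of students flagged can be sketched as follows. This is an illustration only: the function name, the cutoff of 0.6, and the toy numbers are made up, and the language is Python purely for brevity.

```python
def flag_summary(probs, actual, cutoff):
    """Return (percent of students flagged, percent of flagged students who left)."""
    flagged = [a for p, a in zip(probs, actual) if p >= cutoff]
    pct_flagged = 100 * len(flagged) / len(probs)
    # Guard against a cutoff so high that nobody is flagged.
    pct_left = 100 * sum(flagged) / len(flagged) if flagged else 0.0
    return pct_flagged, pct_left


# Made-up validation scores and outcomes (1 = not retained).
probs = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
actual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
pct_flagged, pct_left = flag_summary(probs, actual, cutoff=0.6)
print(pct_flagged, pct_left)
```

Evaluating this on a validation set kept in its original proportions gives a figure the administration can expect to see in practice; lowering the cutoff flags more students but usually with a smaller share of true leavers among them.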
