Introduction to Data Analysis

Final Project [30 points]

The final project is worth 30% of the grade.

The final project uses the machine learning technique of classification to predict an outcome for a banking marketing problem. A bank is planning a telemarketing campaign to increase the number of term deposits it holds. The data records include the output target (whether the customer responded positively: yes or no) and several candidate features. Your task is to use the data analysis techniques you have learned in class to predict which customers are likely to respond positively to the campaign. You will apply any two techniques learned in class, compare the resulting models, and make a recommendation. Based on your recommendation, the bank will then use your chosen model to score unseen data and target customers for the telemarketing campaign.
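To make "score unseen data" concrete, here is a minimal sketch. It assumes scikit-learn's `LogisticRegression` as one of the two chosen techniques and uses synthetic data from `make_classification` in place of the real, encoded bank records. Scoring here means estimating each new customer's probability of a "yes" and ranking customers by it; the call budget `top_k` is a hypothetical parameter, not something the bank has specified.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in (assumption): 1,000 historical customers with known outcomes
# plus 200 "unseen" customers, with roughly 10% positive responses.
X, y = make_classification(n_samples=1200, n_features=6, weights=[0.9, 0.1], random_state=2)
X_hist, y_hist = X[:1000], y[:1000]
X_unseen = X[1000:]

# Train on the historical campaign data.
model = LogisticRegression(max_iter=1000).fit(X_hist, y_hist)

# Score each unseen customer: estimated probability of responding "yes".
p_yes = model.predict_proba(X_unseen)[:, 1]

# Hypothetical call budget: contact only the top_k highest-scoring customers.
top_k = 20
call_list = np.argsort(p_yes)[::-1][:top_k]  # indices, highest score first

print(call_list[:5], p_yes[call_list[:5]].round(3))
```

Ranking by probability, rather than calling everyone the model labels "yes", lets the bank match the target list to however many calls it can actually afford.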

You will create a PowerPoint deck to report your findings and make a recommendation on which model to choose and its likely impact.

Data set and related information:

The dataset is available in the UCI machine learning repository:

https://archive.ics.uci.edu/ml/datasets/Bank+Marketing

Read through the dataset information and the attribute information.

Note that we will only use the more recent version of the dataset. That is, we will only use the bank-additional.zip archive and the files it contains.

Also note that bank-additional-full.csv contains the full dataset of 41,188 records and bank-additional.csv contains a 10% random sample of 4,119 examples. It is recommended that you do most of your work on bank-additional.csv (the 10% random sample) and move to the full dataset only once your models are working and you are trying to improve their accuracy or other aspects.
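A minimal loading sketch with pandas, assuming the extracted CSV files keep the UCI layout: fields separated by semicolons, and the response stored in a column named `y` with values "yes"/"no". The helper name `load_bank_data` is hypothetical, and the three rows below are made up (using only a few of the real column names) so the example runs without the download.

```python
import io

import pandas as pd


def load_bank_data(path_or_buffer):
    """Hypothetical helper: read a bank-additional CSV and add a 0/1 target.

    Assumes ';' as the field separator and a 'y' column holding 'yes'/'no'.
    """
    df = pd.read_csv(path_or_buffer, sep=";")
    # Encode the response as 1 for "yes" and 0 for "no" for use by classifiers.
    df["target"] = (df["y"] == "yes").astype(int)
    return df


# Made-up stand-in rows in the same format; quoted strings are unquoted by pandas.
sample_csv = io.StringIO(
    "age;job;marital;duration;y\n"
    '56;"housemaid";"married";261;"no"\n'
    '41;"technician";"single";1200;"yes"\n'
    '37;"services";"married";226;"no"\n'
)
df = load_bank_data(sample_csv)
print(df.shape)               # (3, 6)
print(df["target"].tolist())  # [0, 1, 0]
```

With the real data you would pass a file path instead, e.g. `load_bank_data("bank-additional/bank-additional.csv")` (the folder name is an assumption about where the archive was extracted), and switch to bank-additional-full.csv later. The UCI attribute notes also point out that `duration` is only known after a call has ended, so consider whether it belongs in a model meant to choose whom to call.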

The following is a checklist of the contents for each slide.

Slide 1 [5 points]

• Name of presenter

• Description of the problem

• How you would apply data analytics to the problem

• The likely impacts of applying data analytics

Slide 2 [5 points]

• The methodology you will use in tackling the problem

Slides 3-8 [5 points]

• Create some basic plots and graphs (histograms, boxplots, scatterplots) of the data

• Also compute summary statistics for the features that you think are important

• Create scatter plots that show the classes in different colors
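For the statistics part, a short pandas sketch on made-up rows (only the column names `age`, `campaign` and `y` follow the bank-additional files). It checks the class balance first, since the response in this dataset is imbalanced toward "no" and that affects how model accuracy should be read, and then compares feature means between the two classes.

```python
import pandas as pd

# Made-up toy rows; replace with the DataFrame loaded from the CSV file.
df = pd.DataFrame({
    "age": [56, 41, 37, 29, 60, 45],
    "campaign": [1, 2, 1, 3, 1, 4],
    "y": ["no", "yes", "no", "no", "yes", "no"],
})

# Class balance: share of customers in each response class.
class_share = df["y"].value_counts(normalize=True)
print(class_share)

# Count, mean, std, min, quartiles and max for each numeric column.
print(df.describe())

# Mean of each numeric feature within each response class.
means_by_class = df.groupby("y")[["age", "campaign"]].mean()
print(means_by_class)
```

Features whose class means differ noticeably are natural candidates for the plots on these slides.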

Slides 9-11 [5 points]

• Describe your first choice for model building

• Justify your choice. How is it meaningful or relevant for the business problem at hand?

• Describe your model

• Report on performance metrics of your model

Slides 12-14 [5 points]

• Describe your second choice for model building

• Justify your choice. How is it meaningful or relevant for the business problem at hand?

• Describe your model

• Report on performance metrics of your model
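As one possible pair of techniques (an assumption, not a requirement of the project), the sketch below fits a logistic regression and a depth-limited decision tree on the same train/test split and reports accuracy, precision and recall for each. Synthetic data from `make_classification` with a roughly 90/10 class split stands in for the encoded bank features; with an imbalanced target, a model can reach high accuracy while finding few of the "yes" customers, which is why precision and recall are reported alongside accuracy.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the encoded bank features (assumption).
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=0)

# Stratify so the test set keeps the same yes/no proportions as the full data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        # zero_division=0 avoids a warning if a model predicts no positives.
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
    }

for name, scores in results.items():
    print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

On the real data, `X` would be the one-hot encoded features and `y` the 0/1 response; evaluating both models on the same split keeps the comparison for the recommendation fair.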

Slide 15 [5 points]

• Make a recommendation on which model should be selected among your two models

• State your conclusion based on this data analytics exercise

• State the possible business outcomes

Some tips you will find useful

1. Converting categorical variables to numeric variables: This Stack Overflow page has some tips on how to do this conversion:

https://stackoverflow.com/questions/32011359/convert-categorical-data-in-pandas-dataframe
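One common approach is one-hot encoding with `pandas.get_dummies`, which turns each category into its own 0/1 column. A minimal sketch on made-up rows (the column names `job` and `marital` follow the bank-additional files; the values are illustrative):

```python
import pandas as pd

# Made-up rows with two categorical columns and one numeric column.
df = pd.DataFrame({
    "age": [56, 41, 37],
    "job": ["housemaid", "technician", "services"],
    "marital": ["married", "single", "married"],
})

# One 0/1 column per category; non-listed columns (age) are kept unchanged.
encoded = pd.get_dummies(df, columns=["job", "marital"], dtype=int)
print(encoded.columns.tolist())
# ['age', 'job_housemaid', 'job_services', 'job_technician',
#  'marital_married', 'marital_single']
```

One-hot encoding avoids implying an order between categories, which simple integer codes would do. Passing `drop_first=True` drops one redundant column per variable, which some linear models prefer. Encode the full dataset before splitting so training and test sets end up with the same columns.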

2. Assessing model performance (in addition to the metrics we have seen in class): ROC curves and AUC

This scikit-learn help page contains some hints on creating ROC curves and computing the area under the curve:

http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html#

You can learn more about the ROC curve and the area under the curve here:

https://en.wikipedia.org/wiki/Receiver_operating_characteristic
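A minimal ROC/AUC sketch with scikit-learn, using synthetic data from `make_classification` and a logistic regression as stand-ins for your encoded data and chosen model. The key point is that the ROC curve is built from continuous scores (here the predicted probability of the positive class), not from hard yes/no predictions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (assumption).
X, y = make_classification(n_samples=1000, n_features=8, weights=[0.85, 0.15], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=1
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Probability of the positive class ("yes") for each test customer.
scores = model.predict_proba(X_test)[:, 1]

# One (false positive rate, true positive rate) point per decision threshold.
fpr, tpr, thresholds = roc_curve(y_test, scores)

# Two equivalent ways to get the area under the curve.
auc_direct = roc_auc_score(y_test, scores)
auc_from_curve = auc(fpr, tpr)
print(f"AUC = {auc_direct:.3f}")
```

Plotting `tpr` against `fpr` gives the ROC curve itself. An AUC of 0.5 corresponds to random guessing and 1.0 to perfect ranking, which makes AUC a convenient single number for comparing your two models.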

3. Plotting

You might find this page helpful in getting started with plotting:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.hist.html

You might also find these pandas pages helpful:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.boxplot.html

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.scatter.html
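The three plot types linked above can be sketched as follows, on made-up rows that reuse a few bank-additional column names. The scatter plot calls `plot.scatter` once per class on a shared axis, so each class gets its own color and a legend entry. The `Agg` backend line is only there so the script runs without a display; drop it in a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Made-up toy rows; replace with the DataFrame loaded from the CSV file.
df = pd.DataFrame({
    "age": [56, 41, 37, 29, 60, 45, 33, 52],
    "campaign": [1, 2, 1, 3, 1, 4, 2, 1],
    "y": ["no", "yes", "no", "no", "yes", "no", "yes", "no"],
})

# Histograms: one panel per listed numeric column.
hist_axes = df.hist(column=["age", "campaign"], bins=5)

# Boxplot of age, split by response class.
box_ax = df.boxplot(column="age", by="y")

# Scatter plot with the classes in different colors, drawn on one shared axis.
fig, ax = plt.subplots()
for label, color in [("no", "tab:blue"), ("yes", "tab:orange")]:
    subset = df[df["y"] == label]
    subset.plot.scatter(x="age", y="campaign", color=color, label=label, ax=ax)
ax.legend(title="y")

# Call plt.show() interactively, or fig.savefig(...) to write the figure to a file.
```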

Final Term Project Rubric

Slides Exemplary
