MS6221辅导、Modeling讲解、Java，Python/c++程序语言辅导讲解SPSS|讲解留学生Processing

MS6221 Predictive Modeling in Marketing
Individual Take-Home Final Project
Guidance
1. Please upload your pdf submission to Cavnas before May. 10th 11pm.
2. Please have your name, student number, and CityU email on top of the first page.
3. If you have any general question, please post it on Canvas discussion before May 8th 11pm.
4. If you have any personal concern, please also email me as early as possible.
Project Question
A large company that uses catalogs as part of its targeting strategy plans to send out a spring tabloid
featuring women’s clothes and shoes. Management needs to decide who among all potential customers in the
house file should receive a spring tabloid.
In order to make a mailing decision, the company has extracted a sub-sample of previous customers from
its database. The data are in file catalog_data.csv. A data description is given below. The data contains
information on whether the customer bought from the spring tabloid in the year before, and information on
some attributes of the customer at the time right before the spring tabloid was sent out last year.
As always, familiarize yourself with the data first before you start your statistical analysis.
0. Report Writing (30 points)
You are going to prepare a consulting report to the management, most of whom only expect to see marketing
anlaysis with figures and tables without any coding.
• In your report, please follow the question roadmap below.
• Please limit your report in 10 pages.
• There is no need to show us your code.
• You can use any software to prepare for the report. If you are really good in Excel, I don’t mind.
1
Variable Description
customer_no Customer id, can be linked to address
buytabw 1 = bought, 0 = did not buy from catalog
tabordrs Total orders from tabloids
divsords Total Orders with shoe division
divwords Total Orders with women’s division
spgtabord Total spring tabloid orders
tabordrs_year Orders from tabloids in the last year
divsords_year Orders with shoe division in the last year
divwords_year Orders with women’s division in the last year
tabordrs_quarter Orders from tabloids in the last quarter
divsords_quarter Orders with shoe division in the last quarter
divwords_quarter Orders with women’s division in the last quarter
moslsdvs Months since last shoe order
moslsdvw Months since last women’s order
moslstab Months since last tab order
orders Total orders
age Age of the customer
family_income Family Income of the customer
married 1 = married of the customer
fulltime_work Fulltime work status of the customer
family_size Family size of the customer
1. Estimation (20 points)
Randomly split the data into an estimation sample and a validation sample:
import numpy as np
import pandas as pd
import random
np.random.seed(2020)
catalog_DF = pd.read_csv('./catalog_data.csv')
L = catalog_DF.shape[0]
train_index = random.sample(range(0,L),10000)
train_index.sort()
DF_estimation = catalog_DF.loc[train_index,:]
DF_prediction = catalog_DF.drop(index=train_index)
(1) Using information from the estimation sample only, estimate a logistic regression model of the purchase
decision (buytabw), using all customer attributes in the data file (except customer_no) as independent
variables.
(2) Try the linear probability model, a.k.a., regression. The purchase decision (buytabw) is a binary
outcome. Using “regression” should restrict the outcome to [0, 1]. You can simply change all predicted
values below 0 to zero, and all predicted value above 1 to 1.
(3) Try recommend one machine learning method. Do it similar as the regerssion, change all predicted
values below 0 to zero, and all predicted value above 1 to 1.
For all analysis below, you should compare results for the logistic regression, linear probability model, and
your chosen method.
2
2. Predicted purchase probability in the validation sample (10 points)
Predict the purchase probability for all customers in the validation sample. Verify that the predicted purchase
probability variable was created and that it has reasonable values.
From now on, you should only work with observations in the validation sample.
3. Plot predicted purchase probabilities (10 points)
Try present useful plots of the predicted purchase probabilities, separately for customers who made a purchase
after receiving the catalog and those who did not respond:
Do your plots indicate that the model has some power to predict who is likely to purchase in the validation
sample?
4. Scoring and segmentation (10 points)
Score the customers and segment the customers into ten deciles, where score = 1 corresponds to the
customers with the lowest predicted purchase probabilities and score = 10 corresponds to the customers
with the highest predicted purchase probabilities. Employ the createBins function for this task.
# define function createBins --------------------------------------------
# Inputs: x, data
# N, the number of bins (groups) to create
def createBins(x,N):
cut = [i/N for i in range(1,N)]
df = pd.DataFrame(x)
cut_points = df.quantile(cut).T.values
cut_points = np.unique(cut_points)
cut_points = np.insert(cut_points,0,values=float('-inf'))
cut_points = np.append(cut_points,values=float('inf'))
labels = ['{0}'.format(i) for i in range(1,len(cut_points))]
bins = pd.cut(x, cut_points, labels=labels)
bins = pd.DataFrame(bins).astype(int)
return bins
Now create a summary data set, score_DF, that contains some key summary statistics separately for each
segment (score). Include these summary statistics:
• Number of observations in segment
• Number of buyers in segment
• Mean predicted purchase probability
• Mean observed purchase rate (based on buytabw)
5. Lift and gains (10 points)
Create a table indicating the lift, cumulative lift, and cumulative gains from the predictive model. Plot the
lift, cumulative lift, and cumulative gains chart.
3
Interpret and discuss the lifts and gains: Is the predictive model useful for targeting purposes?
6. Profitability analysis (20 points)
From now on work again with the customer-level data in catalog_DF. Use the following data:
• Based on past data, the average dollar margin per customer is $ 26.90
• The cost of printing and mailing one tabloid is $ 1.40
Using the predicted purchase probability, calculate expected profits. Try provide useful figures of the expected
profits variable and discuss.
• Calculate the fraction of customers who are expected to be profitable, i.e. have positive expected profits.
• Now rank customers according to their expected profitability. Then calculate realized profits, based on
the observed purchase decision of each customer.
• Calculate the cumulative sum of realized profits for a targeting strategy where customers are targeted
in descending order of expected profits.
• Plot the cumulative realized profits on the y-axis versus the percent of customers mailed on the x-axis.
Discuss your findings.
7. Recommended targeting strategy (20 points)
What mailing strategy do you recommend? Compare the actual profitability from your proposed strategy to
1. The expected profitability based on your model,
2. A mass mailing strategy where each customer receives a catalog.
What is the percent improvement in profits from your recommended strategy relative to a mass mailing
strategy?
4

MS6221辅导、Modeling讲解、Java，Python/c++程序语言辅导 讲解SPSS|讲解留学生Processing

MS6221辅导、Modeling讲解、Java，Python/c++程序语言辅导讲解SPSS|讲解留学生Processing