
MS6221 Predictive Modeling in Marketing

Individual Take-Home Final Project

Guidance

1. Please upload your PDF submission to Canvas before May 10th, 11pm.

2. Please put your name, student number, and CityU email at the top of the first page.

3. If you have any general questions, please post them on the Canvas discussion board before May 8th, 11pm.

4. If you have any personal concerns, please also email me as early as possible.

Project Question

A large company that uses catalogs as part of its targeting strategy plans to send out a spring tabloid featuring women’s clothes and shoes. Management needs to decide who among all potential customers in the house file should receive a spring tabloid.

In order to make a mailing decision, the company has extracted a sub-sample of previous customers from its database. The data are in the file catalog_data.csv, and a data description is given below. The data contain information on whether each customer bought from last year’s spring tabloid, together with customer attributes measured just before that tabloid was sent out.

As always, familiarize yourself with the data before you start your statistical analysis.

0. Report Writing (30 points)

You are going to prepare a consulting report for management, most of whom only expect to see marketing analysis with figures and tables, without any code.

• In your report, please follow the question roadmap below.

• Please limit your report to 10 pages.

• There is no need to show your code.

• You can use any software to prepare the report. If you are really good at Excel, I don’t mind.


Variable            Description
customer_no         Customer ID; can be linked to an address
buytabw             1 = bought, 0 = did not buy from the catalog
tabordrs            Total orders from tabloids
divsords            Total orders with the shoe division
divwords            Total orders with the women’s division
spgtabord           Total spring tabloid orders
tabordrs_year       Orders from tabloids in the last year
divsords_year       Orders with the shoe division in the last year
divwords_year       Orders with the women’s division in the last year
tabordrs_quarter    Orders from tabloids in the last quarter
divsords_quarter    Orders with the shoe division in the last quarter
divwords_quarter    Orders with the women’s division in the last quarter
moslsdvs            Months since last shoe order
moslsdvw            Months since last women’s order
moslstab            Months since last tabloid order
orders              Total orders
age                 Age of the customer
family_income       Family income of the customer
married             1 = customer is married
fulltime_work       Full-time work status of the customer
family_size         Family size of the customer

1. Estimation (20 points)

Randomly split the data into an estimation sample and a validation sample:

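import random

import numpy as np
import pandas as pd

# Seed both generators: random.sample() draws from Python's random module,
# which np.random.seed() does not affect.
random.seed(2020)
np.random.seed(2020)

catalog_DF = pd.read_csv('./catalog_data.csv')
L = catalog_DF.shape[0]

# Draw 10,000 row positions for the estimation sample
train_index = random.sample(range(0, L), 10000)
train_index.sort()

DF_estimation = catalog_DF.loc[train_index, :]       # estimation sample
DF_prediction = catalog_DF.drop(index=train_index)   # validation sample (all remaining rows)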

(1) Using information from the estimation sample only, estimate a logistic regression model of the purchase decision (buytabw), using all customer attributes in the data file (except customer_no) as independent variables.
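For readers who want to see what the estimation does, here is a minimal sketch in Python (the language of the splitting code above) that fits the logistic model by maximum likelihood with Newton-Raphson iterations in plain NumPy. In practice a library routine such as statsmodels' `Logit` does the same job and also reports standard errors. The sketch assumes the predictors are already in a numeric matrix `X` and the outcome in a 0/1 vector `y`; the function names `fit_logit` and `predict_logit` and the simulated data are hypothetical stand-ins, not part of the assignment.

```python
import numpy as np


def fit_logit(X, y, n_iter=25, tol=1e-8):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    X: (n, k) array of predictors (an intercept column is added here).
    y: (n,) array of 0/1 outcomes.
    Returns the (k + 1,) coefficient vector, intercept first.
    """
    Xd = np.column_stack([np.ones(len(X)), X])   # prepend intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))      # current fitted probabilities
        grad = Xd.T @ (y - p)                     # score (gradient of log-likelihood)
        W = p * (1.0 - p)                         # observation weights
        info = Xd.T @ (Xd * W[:, None])           # information matrix (negative Hessian)
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:            # stop once updates are negligible
            break
    return beta


def predict_logit(beta, X):
    """Predicted purchase probabilities for rows of X."""
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))


# Usage on simulated data standing in for the estimation sample
rng = np.random.default_rng(2020)
X_sim = rng.normal(size=(5000, 2))
true_beta = np.array([-2.0, 1.0, -0.5])
p_sim = 1.0 / (1.0 + np.exp(-(true_beta[0] + X_sim @ true_beta[1:])))
y_sim = rng.binomial(1, p_sim)
beta_hat = fit_logit(X_sim, y_sim)
```

At the maximum-likelihood solution the score vector is zero, so the fitted probabilities reproduce the sample purchase rate exactly; this is a handy check on any logistic fit.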

(2) Try the linear probability model, i.e., a linear regression of buytabw on the same attributes. Because the purchase decision (buytabw) is a binary outcome, predictions should be restricted to [0, 1], which a linear regression does not guarantee. You can simply set all predicted values below 0 to 0 and all predicted values above 1 to 1.
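A minimal NumPy sketch of the linear probability model: ordinary least squares with an intercept, followed by the clipping rule just described (`np.clip`). The function names `fit_lpm` and `predict_lpm` and the simulated data are hypothetical; with the real data, `X` would hold the same attributes used in the logistic regression.

```python
import numpy as np


def fit_lpm(X, y):
    """Ordinary least squares for the linear probability model.

    Returns coefficients with the intercept first.
    """
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta


def predict_lpm(beta, X):
    """Linear predictions clipped to [0, 1]."""
    Xd = np.column_stack([np.ones(len(X)), X])
    raw = Xd @ beta
    return np.clip(raw, 0.0, 1.0)   # values < 0 become 0, values > 1 become 1


# Usage on simulated stand-in data
rng = np.random.default_rng(0)
X_sim = rng.normal(size=(2000, 3))
y_sim = (rng.random(2000) < 0.1 + 0.05 * (X_sim[:, 0] > 0)).astype(float)
beta_lpm = fit_lpm(X_sim, y_sim)
p_lpm = predict_lpm(beta_lpm, X_sim)
```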

(3) Recommend and apply one machine learning method of your choice. As with the linear probability model, set all predicted values below 0 to 0 and all predicted values above 1 to 1.
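Which method to recommend is left to you. As one illustrative option (not a prescribed choice), the sketch below implements k-nearest-neighbors classification in NumPy: a customer's purchase probability is the share of buyers among the k most similar estimation-sample customers, after standardizing predictors with estimation-sample means and standard deviations. Because these shares already lie in [0, 1], the clipping step leaves them unchanged. The default k = 50, the chunk size, and the function name `knn_predict_proba` are assumptions.

```python
import numpy as np


def knn_predict_proba(X_train, y_train, X_new, k=50, chunk=1000):
    """Share of buyers among the k nearest training customers.

    Predictors are standardized with training means/SDs so that
    variables on large scales (e.g. income) do not dominate distances.
    New rows are processed in chunks to keep the distance matrix small.
    """
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0                             # guard against constant columns
    A = (X_train - mu) / sd
    out = np.empty(len(X_new))
    for start in range(0, len(X_new), chunk):
        B = (X_new[start:start + chunk] - mu) / sd
        # squared Euclidean distances, shape (rows in chunk, n_train)
        d2 = (B ** 2).sum(1)[:, None] - 2 * B @ A.T + (A ** 2).sum(1)[None, :]
        nn = np.argpartition(d2, k - 1, axis=1)[:, :k]   # indices of the k nearest
        out[start:start + chunk] = y_train[nn].mean(axis=1)
    return np.clip(out, 0.0, 1.0)                 # no-op for k-NN, kept for parity


# Usage: two well-separated clusters, buyers in the first one only
rng = np.random.default_rng(1)
X_tr = np.vstack([rng.normal(0, 0.1, (100, 2)), rng.normal(5, 0.1, (100, 2))])
y_tr = np.r_[np.ones(100), np.zeros(100)]
p_knn = knn_predict_proba(X_tr, y_tr, np.array([[0.0, 0.0], [5.0, 5.0]]), k=10)
```

The choice of k trades off noise (small k) against over-smoothing (large k); tuning it on part of the estimation sample is one reasonable approach.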

For all analyses below, compare the results for the logistic regression, the linear probability model, and your chosen method.


2. Predicted purchase probability in the validation sample (10 points)

Predict the purchase probability for all customers in the validation sample. Verify that the predicted purchase probability variable was created and that it has reasonable values.
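A few quick checks cover "reasonable values": the column has no missing entries, every prediction lies in [0, 1], and the average predicted probability is close to the observed purchase rate in the validation sample. The sketch below assumes the predictions are stored in a column of the validation data such as `p_logit`; that column name and the helper `check_predictions` are hypothetical.

```python
import pandas as pd


def check_predictions(df, prob_col, y_col='buytabw'):
    """Summarize a predicted-probability column for sanity checking."""
    p = df[prob_col]
    return {
        'n_missing': int(p.isna().sum()),       # should be 0
        'min': float(p.min()),                  # should be >= 0
        'max': float(p.max()),                  # should be <= 1
        'mean_predicted': float(p.mean()),      # compare with observed_rate
        'observed_rate': float(df[y_col].mean()),
    }


# Usage with a tiny made-up frame
demo = pd.DataFrame({'buytabw': [0, 1, 0, 0], 'p_logit': [0.1, 0.6, 0.2, 0.1]})
summary = check_predictions(demo, 'p_logit')
```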

From now on, you should only work with observations in the validation sample.

3. Plot predicted purchase probabilities (10 points)

Present useful plots of the predicted purchase probabilities, separately for customers who made a purchase after receiving the catalog and for those who did not respond.

Do your plots indicate that the model has some power to predict who is likely to purchase in the validation sample?
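Overlaid histograms of the predicted probabilities for buyers and non-buyers are one natural choice. The sketch below computes the numbers behind such a plot with NumPy, using shared bin edges so the two groups are directly comparable, and also returns each group's mean predicted probability; the densities can then be drawn with any plotting tool (matplotlib, Excel, and so on). The function name `prob_histograms`, the 20-bin default, and the simulated data are assumptions.

```python
import numpy as np


def prob_histograms(p, y, n_bins=20):
    """Density histograms of predicted probabilities by observed outcome.

    p: predicted purchase probabilities; y: observed 0/1 purchases.
    Returns shared bin edges, one density array per group,
    and the mean predicted probability of buyers and non-buyers.
    """
    p = np.asarray(p, dtype=float)
    y = np.asarray(y)
    edges = np.linspace(0.0, p.max(), n_bins + 1)   # same bins for both groups
    dens_buy, _ = np.histogram(p[y == 1], bins=edges, density=True)
    dens_non, _ = np.histogram(p[y == 0], bins=edges, density=True)
    return edges, dens_buy, dens_non, p[y == 1].mean(), p[y == 0].mean()


# Usage with simulated probabilities and outcomes
rng = np.random.default_rng(3)
p_demo = rng.uniform(0, 0.5, 1000)
y_demo = (rng.random(1000) < p_demo).astype(int)
edges, d_buy, d_non, m_buy, m_non = prob_histograms(p_demo, y_demo)
```

If the model has predictive power, the buyers' distribution sits visibly to the right of the non-buyers' distribution, and their mean predicted probability is higher.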

4. Scoring and segmentation (10 points)

Score the customers by segmenting them into ten deciles, where score = 1 corresponds to the customers with the lowest predicted purchase probabilities and score = 10 corresponds to the customers with the highest predicted purchase probabilities. Use the createBins function below for this task.

# define function createBins --------------------------------------------
# Inputs: x, the data to bin (a 1-D array or pandas Series)
#         N, the number of bins (groups) to create
# Output: a one-column DataFrame of integer bin labels 1..N
#         (fewer than N bins if tied values make some cut points coincide)
def createBins(x, N):
    cut = [i/N for i in range(1, N)]              # interior quantiles 1/N, ..., (N-1)/N
    df = pd.DataFrame(x)
    cut_points = df.quantile(cut).T.values
    cut_points = np.unique(cut_points)            # drop duplicate cut points caused by ties
    cut_points = np.insert(cut_points, 0, values=float('-inf'))
    cut_points = np.append(cut_points, values=float('inf'))
    labels = ['{0}'.format(i) for i in range(1, len(cut_points))]
    bins = pd.cut(x, cut_points, labels=labels)
    bins = pd.DataFrame(bins).astype(int)
    return bins

Now create a summary data set, score_DF, that contains some key summary statistics separately for each segment (score). Include these summary statistics:

• Number of observations in segment

• Number of buyers in segment

• Mean predicted purchase probability

• Mean observed purchase rate (based on buytabw)
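Once every validation customer has a score, the four statistics above come from a single pandas `groupby` with named aggregation. The sketch assumes a customer-level frame with columns `score`, `p` (predicted probability) and `buytabw`; the column name `p`, the output column names, and the function name `make_score_DF` are hypothetical.

```python
import pandas as pd


def make_score_DF(df, score_col='score', prob_col='p', y_col='buytabw'):
    """Per-segment summary: size, buyers, mean predicted and observed rates."""
    score_DF = (
        df.groupby(score_col)
          .agg(n_obs=(y_col, 'size'),               # number of observations
               n_buyers=(y_col, 'sum'),             # number of buyers
               mean_pred_prob=(prob_col, 'mean'),   # mean predicted probability
               mean_obs_rate=(y_col, 'mean'))       # mean observed purchase rate
          .reset_index()
    )
    return score_DF


# Usage with a tiny made-up frame
demo = pd.DataFrame({
    'score':   [1, 1, 2, 2, 2],
    'p':       [0.01, 0.03, 0.20, 0.30, 0.10],
    'buytabw': [0, 0, 1, 0, 1],
})
score_DF = make_score_DF(demo)
```

Comparing `mean_pred_prob` with `mean_obs_rate` segment by segment is also a simple calibration check.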

5. Lift and gains (10 points)

Create a table showing the lift, cumulative lift, and cumulative gains from the predictive model. Plot the lift, cumulative lift, and cumulative gains charts.


Interpret and discuss the lifts and gains: Is the predictive model useful for targeting purposes?
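The three quantities are commonly defined as follows, with segments ordered from the highest score (10) down. Lift is a segment's observed purchase rate divided by the overall purchase rate, times 100, so 100 means "no better than random mailing". Cumulative lift is the same ratio computed for a segment together with every higher-scoring segment. Cumulative gains is the percentage of all buyers captured by those segments. The sketch below assumes a per-segment summary with columns `score`, `n_obs` and `n_buyers`; these column names and the function name `lift_gains_table` are hypothetical.

```python
import pandas as pd


def lift_gains_table(score_DF):
    """Lift, cumulative lift and cumulative gains by segment.

    score_DF needs columns score, n_obs and n_buyers; rows are ordered
    from the highest score (best prospects) to the lowest.
    """
    t = score_DF.sort_values('score', ascending=False).reset_index(drop=True)
    overall_rate = t['n_buyers'].sum() / t['n_obs'].sum()
    rate = t['n_buyers'] / t['n_obs']
    cum_rate = t['n_buyers'].cumsum() / t['n_obs'].cumsum()
    t['lift'] = 100 * rate / overall_rate
    t['cum_lift'] = 100 * cum_rate / overall_rate
    t['cum_gains'] = 100 * t['n_buyers'].cumsum() / t['n_buyers'].sum()
    return t


# Usage with four made-up segments of 100 customers each
demo = pd.DataFrame({'score': [1, 2, 3, 4],
                     'n_obs': [100, 100, 100, 100],
                     'n_buyers': [2, 4, 6, 8]})
table = lift_gains_table(demo)
```

In this toy example, mailing only the top segment reaches 40% of all buyers with 25% of the mailings, a cumulative lift of 160.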

6. Profitability analysis (20 points)

From now on work again with the customer-level data in catalog_DF. Use the following data:

• Based on past data, the average dollar margin per customer is $26.90.

• The cost of printing and mailing one tabloid is $1.40.

Using the predicted purchase probability, calculate each customer’s expected profit. Provide useful figures of the expected profit variable and discuss them.
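Reading the $26.90 as the margin earned when a mailed customer buys, the expected profit of mailing customer i with predicted probability p_i is 26.90 × p_i − 1.40. Mailing is therefore expected to pay off only when p_i exceeds the breakeven probability 1.40 / 26.90 ≈ 0.052. The sketch below encodes that rule; the names `expected_profit`, `MARGIN` and `MAIL_COST` and the example probabilities are hypothetical.

```python
import numpy as np

MARGIN = 26.90     # average dollar margin per purchasing customer (from the assignment)
MAIL_COST = 1.40   # printing and mailing cost per tabloid (from the assignment)


def expected_profit(p, margin=MARGIN, cost=MAIL_COST):
    """Expected profit of mailing each customer: p * margin - cost."""
    return np.asarray(p, dtype=float) * margin - cost


breakeven = MAIL_COST / MARGIN            # about 0.052
p_demo = np.array([0.01, 0.05, 0.06, 0.30])
ep = expected_profit(p_demo)
share_profitable = np.mean(ep > 0)        # fraction with positive expected profit
```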

• Calculate the fraction of customers who are expected to be profitable, i.e., who have positive expected profits.

• Now rank customers according to their expected profitability. Then calculate realized profits, based on the observed purchase decision of each customer.

• Calculate the cumulative sum of realized profits for a targeting strategy where customers are targeted in descending order of expected profits.

• Plot the cumulative realized profits on the y-axis versus the percent of customers mailed on the x-axis.

Discuss your findings.
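The ranking and accumulation steps above can be sketched in a few lines of NumPy. Realized profit for a mailed customer is 26.90 × buytabw − 1.40, customers are sorted by expected profit from highest to lowest, and the running sum gives the curve to plot against the percent mailed. The function name `cumulative_realized_profit` and the example values are hypothetical.

```python
import numpy as np


def cumulative_realized_profit(expected, bought, margin=26.90, cost=1.40):
    """Cumulative realized profit when mailing in descending expected profit.

    expected: expected profit per customer; bought: observed 0/1 purchases.
    Returns (percent of customers mailed, cumulative realized profit).
    """
    expected = np.asarray(expected, dtype=float)
    bought = np.asarray(bought, dtype=float)
    order = np.argsort(-expected, kind='stable')   # best prospects first
    realized = margin * bought[order] - cost       # profit actually earned per mailed customer
    cum_profit = np.cumsum(realized)
    pct_mailed = 100.0 * np.arange(1, len(order) + 1) / len(order)
    return pct_mailed, cum_profit


# Usage with four made-up customers
pct, cum = cumulative_realized_profit([5.0, -1.0, 2.0, -0.5], [1, 0, 0, 1])
best_pct = pct[np.argmax(cum)]   # mailing depth with the highest realized profit
```

The peak of the curve marks the mailing depth that would have maximized realized profit, which is useful to compare with the depth implied by the positive-expected-profit rule.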

7. Recommended targeting strategy (20 points)

What mailing strategy do you recommend? Compare the actual profitability of your proposed strategy to

1. the expected profitability based on your model, and

2. a mass mailing strategy where each customer receives a catalog.

What is the percent improvement in profits from your recommended strategy relative to a mass mailing strategy?
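One way to set up the comparison, assuming the recommended strategy is "mail every customer with positive expected profit": compute realized profit for that group, realized profit if everyone is mailed, and the model's expected profit for the targeted group. The percent improvement is 100 × (targeted − mass) / mass, which is only meaningful when mass-mailing profit is positive. The targeting rule and the function name `compare_strategies` are assumptions for illustration.

```python
import numpy as np


def compare_strategies(expected, bought, margin=26.90, cost=1.40):
    """Realized profit of targeting vs. mass mailing, and % improvement.

    Targeting rule (an assumption): mail every customer whose
    expected profit is positive.
    """
    expected = np.asarray(expected, dtype=float)
    bought = np.asarray(bought, dtype=float)
    realized = margin * bought - cost
    mass = realized.sum()                              # everyone receives a tabloid
    targeted = realized[expected > 0].sum()            # only positive expected profit
    expected_targeted = expected[expected > 0].sum()   # model's forecast for that group
    improvement = 100.0 * (targeted - mass) / mass     # meaningful only if mass > 0
    return {'targeted': targeted, 'mass': mass,
            'expected_targeted': expected_targeted,
            'pct_improvement': improvement}


# Usage with five made-up customers
res = compare_strategies([5.0, -1.0, 2.0, -0.5, -0.8], [1, 0, 1, 0, 0])
```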
