Assignment #1
Due: Monday, Feb. 20, 11:59pm, 2023
Introduction
The assignment will be done in groups of up to 6 students. Please submit 1 single report, no
more than 20 pages, double spaced. Report should be concise. Additional tables and
figures can be included in the appendix but cannot be guaranteed that it will be read.
The assignment will use both Tableau and AET. The data file is available on Moodle. It is
divided into 6 main parts – exploratory and descriptive analysist, market basket analysis,
assortment planning, RFM & Clustering, Predictions, and Price Elasticity. Read the
assignment in advance. The time required on each part is not equal.
The assignment assumes basic knowledge of Tableau. There are a number of video tutorials
online to help you get started. You will also likely need to learn more advance functions like
how to join tables in Tableau. One tutorial here https://youtu.be/Xbg2WabUIKI .
Answer the following questions.
Part 1 – Exploratory data analysis
1. [10 points] Perform descriptive analytics of the orders. Specifically, plot the
distributions for number of orders, quantity and unit price over time. Note that you
will need to create a new variable for unit price which is the Sales divided by
Quantity. Describe what you see. Are there any purchasing patterns over time like
holidays?
2. [15 points] For a particular product, is there variance in unit price? Which product
has the biggest variance in unit price? How does the unit price vary over time? Can
you show a few examples of products which varying a lot over time? Which
products vary little over time? Which subcategory has the largest variance? Can you
explain why some of the products vary in unit price? Why do you think it happens?
Part 2 – market basket analysis
3. [25 points] Perform market basket analysis at the sub-category level. You do not
need to compute the support, lift and confidence. Rather, a simple quantity count is
sufficient. What sub categories are purchased the most together?
4. [25 points] Now, conduct a market basket analysis at the product level for the top 10
selling products. What do you find? Are these high or low priced products?
Part 3 – Assortment Planning
5. [25 points] For the bakery example done in class, what is the estimated demand for
Fudge, Family Size, Tiny Tina? What was the actual substitution effect with Fudge,
Family Size, Yummy Cake? Show your work.
Part 4 – RFM & Clustering
1. [5 points] Generate the RFM variables using the superstore data. Remember, you
will need to group by customer id first. Then, assign a score of 1 for the “best”
customer for the top 20 percentile of customers for that dimension. Show your and
procedure how you did this.
2. [20 points] Once you have created a worksheet with proper variables and
visualization, you can export the worksheet data using the “Crosstab to Excel” in the
“Worksheet” menu item. Make sure that the rows and columns in the exported files
are in the correct. Open the file in Excel to check that it is what you want. Then,
export the file to csv, and import into the AET clustering module.
a. Make plots for R&F, F&M, and M&R. Perform clustering on these plots. The
AET shows What do you choose for the number of clusters? How and why
did you come up with those values?
b. Now, we will cluster on all 3 RFM dimensions. Use Step 3 in the AET. How
many clusters did you choose? Why?
c. Now, repeat part (b) with more variables from Tableau. Which variables do
you add and why?
Part 5 - Prediction
3. [5 points] Now we will predict customers who are likely to return items. In Tableau,
join the order table with the returns table. Do a cross tabulation or “group by”
Customer ID, and count how many returns each customer does. Check that your
table join includes customers who had no returns. Now perform a “slice-and-dice”
by visualizing the number of returns by each group of R, F, M like in lecture for the
campaign response rates.
4. [20 points] Now export your worksheet data like in part 4, into excel, then into csv.
Upload the csv file into the Decision Tree module of the AET. Note that for decision
tree, the value can only be 0 or 1. You will need to convert your returns count into 0
if they have never returned an item, and 1 if they have returned 1 or more. In the
Decision Tree module, select your new returns binary variable as the “target”, and
one more more for the features. Plot the decision tree1. What do you see? Explain
the outcomes. What other variables can you add?
1 For explanation of the results and plots, see “Results Explained” below. Note that the AET reports the
“value” as a proportion, not a count as mentioned in the website:
https://www.w3schools.com/python/python_ml_decision_tree.asp
5. [25 points] Now we will try to use the Regression Module to predict the returns. You
can use the total returns count. What independent variables do you use? Which
ones are statistically significant? What other variables can you add?
Part 6 – Price Elasticity
6. [25 points] For this part, we will estimate the price elasticity of each sub-category
from the superstore dataset. Prepare your data by creating separate csv files for
each sub-categories, and each row should be aggregated by product. Product the
average unit price for each product, and total units (quantity) sold. Remember to
also produce the log of these values. You will need to do regression analysis. There
are several options to do so, and you can choose which you are most familiar with:
a. Tableau. You can use the Analytics and trend line function. A quick tutorial
https://youtu.be/lc76P-C205o
b. AET. Import the csv into the Regression Module of the AET and compute the
regression on the log values.
c. R. If you are familiar with R, you can use
What are the price elasticities for each of the sub-categories? Can you explain what
you see? Does it make sense?