STA 141A

Fall 2019

Homework 4

Due: December 5 (Thursday), 11:59 pm

Submit the assignment electronically through Canvas. The electronic submission must be a compressed folder (with extension .zip, .7z, etc.) containing two files: (i) your answers (.pdf file); (ii) the R code used (.R file). Alternatively, the assignment can be submitted as an R Markdown file (.Rmd).

Honor Code: “The codes and results derived by using these codes constitute my own work. I have consulted the following resources regarding this assignment:” (ADD: names of persons or web resources, if any, excluding the instructor, TAs, and materials posted on Canvas.)

Problem Statement: The goal is to compare k-means clustering and hierarchical clustering on a real-data clustering problem.

1. The data-set customers_data.csv contains eight variables measured on 440 instances:

(a) CHANNEL: customer’s channel, either Horeca (Hotel/Restaurant/Café) or Retail;

(b) REGION: customer’s region, one of Lisbon, Oporto or Other;

(c) FRESH: annual spending (in US dollars) on fresh products;

(d) MILK: annual spending (in US dollars) on milk products;

(e) GROCERY: annual spending (in US dollars) on grocery products;

(f) FROZEN: annual spending (in US dollars) on frozen products;

(g) DETERGENTS_PAPER: annual spending (in US dollars) on detergents and paper products;

(h) DELICATESSEN: annual spending (in US dollars) on delicatessen products.

• Import customers_data.csv as a data frame with a header and produce a summary of its variables. (5 points)

• Extract the variables FRESH and FROZEN and store them in a separate data frame called customers_2. (5 points)

• From this new data frame, provide a scatter-plot matrix of FRESH and FROZEN via the function ggpairs from the package GGally. (5 points)
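The three steps above can be sketched as follows. Because customers_data.csv is not reproduced here, the block first writes a simulated stand-in file to a temporary directory; the simulated values, the integer coding of CHANNEL and REGION, and the names customers_sim and csv_path are assumptions made only so the sketch runs on its own. With the real file, only the read.csv call and the lines after it would be needed.

```r
set.seed(141)
n <- 440

# Simulated stand-in for customers_data.csv (NOT the real data; values and
# the integer coding of CHANNEL/REGION are assumptions for illustration).
customers_sim <- data.frame(
  CHANNEL          = sample(1:2, n, replace = TRUE),
  REGION           = sample(1:3, n, replace = TRUE),
  FRESH            = round(rlnorm(n, meanlog = 9, sdlog = 1)),
  MILK             = round(rlnorm(n, meanlog = 8, sdlog = 1)),
  GROCERY          = round(rlnorm(n, meanlog = 8, sdlog = 1)),
  FROZEN           = round(rlnorm(n, meanlog = 7, sdlog = 1)),
  DETERGENTS_PAPER = round(rlnorm(n, meanlog = 7, sdlog = 1)),
  DELICATESSEN     = round(rlnorm(n, meanlog = 7, sdlog = 1))
)
csv_path <- file.path(tempdir(), "customers_data.csv")
write.csv(customers_sim, csv_path, row.names = FALSE)

# Import the file as a data frame with a header row and summarise it.
customers <- read.csv(csv_path, header = TRUE)
print(summary(customers))

# Keep only FRESH and FROZEN in a separate data frame.
customers_2 <- customers[, c("FRESH", "FROZEN")]

# Scatter-plot matrix; drawn only in an interactive session and only if
# GGally is installed (drop the guard inside an R Markdown document).
if (interactive() && requireNamespace("GGally", quietly = TRUE)) {
  print(GGally::ggpairs(customers_2))
}
```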

2. Estimate a k-means clustering partition on FRESH and FROZEN. For each k = 1, ..., 10, run the following procedure 100 times: (5 points)

• randomly draw 80% of the observations in customers_2 and use them as the training data-set. The remaining observations constitute the test data-set; (5 points)

• run the function kmeans on the training data-set with k centers, 20 random starts and 100 maximum iterations; (5 points)

• use the estimated centers to allocate each observation of the test data-set to a group and derive the corresponding vector of assignments; (10 points)

• calculate the within-group deviance in the test data-set. (10 points)

Then, for each k, average the within-group deviance over the 100 runs. (5 points)
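A minimal sketch of this resampling procedure is given below, under stated assumptions. The helper names assign_to_centers, within_deviance and cv_deviance are hypothetical, and the data frame toy is simulated so that the block runs without the real file. Test observations are allocated to the nearest training center in squared Euclidean distance, and the within-group deviance is taken to be the sum of squared deviations of the test observations from their own test-set group means; measuring deviations from the training centers instead is another possible reading of the task.

```r
# Allocate each row of x to the nearest center (squared Euclidean distance).
assign_to_centers <- function(x, centers) {
  x <- as.matrix(x)
  d2 <- sapply(seq_len(nrow(centers)), function(j) {
    rowSums(sweep(x, 2, centers[j, ], "-")^2)
  })
  d2 <- matrix(d2, nrow = nrow(x))  # keep a matrix even when k = 1
  apply(d2, 1, which.min)
}

# Sum of squared deviations from the group means, summed over groups.
within_deviance <- function(x, groups) {
  x <- as.matrix(x)
  sum(sapply(split(seq_len(nrow(x)), groups), function(idx) {
    xi <- x[idx, , drop = FALSE]
    sum(sweep(xi, 2, colMeans(xi), "-")^2)
  }))
}

# For each k, repeat: split 80/20, fit kmeans on the training part,
# allocate the test part, record its within-group deviance; then average.
cv_deviance <- function(data, k_values = 1:10, n_runs = 100, train_frac = 0.8) {
  n <- nrow(data)
  sapply(k_values, function(k) {
    dev <- numeric(n_runs)
    for (r in seq_len(n_runs)) {
      train_idx <- sample(n, size = round(train_frac * n))
      train <- data[train_idx, , drop = FALSE]
      test  <- data[-train_idx, , drop = FALSE]
      fit <- kmeans(train, centers = k, nstart = 20, iter.max = 100)
      groups <- assign_to_centers(test, fit$centers)
      dev[r] <- within_deviance(test, groups)
    }
    mean(dev)
  })
}

# Simulated two-variable data (NOT the real customers_2).
set.seed(141)
toy <- data.frame(
  FRESH  = c(rnorm(60, 0), rnorm(60, 6), rnorm(60, 12)),
  FROZEN = c(rnorm(60, 0), rnorm(60, 6), rnorm(60, 0))
)

# n_runs is reduced to 20 here only to keep the demonstration quick;
# the default of 100 matches the procedure described above.
avg_dev <- cv_deviance(toy, k_values = 1:10, n_runs = 20)
print(round(avg_dev, 1))
```

On the real data the call would be cv_deviance(customers_2), which uses the default of 100 runs per value of k.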

Finally:

• plot this average against the number of clusters and choose the optimal number of clusters using the elbow criterion; (5 points)

• re-apply kmeans with the selected number of clusters, 20 random starts and 100 maximum iterations, and derive the estimated cluster memberships; (5 points)

• provide a scatter-plot matrix of FRESH and FROZEN conditional on the estimated cluster memberships via the function ggpairs from the package GGally, and comment on the shape of FRESH and FROZEN across groups. (5+5 points)
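The final steps might look like the sketch below. It is self-contained, so the data frame toy is simulated and avg_dev is only a stand-in computed from the in-sample tot.withinss of kmeans; in the assignment it would be the average test-set deviance from the resampling procedure. The value k_selected = 3 is an assumption that suits the three simulated groups, not a recommendation for the real data, where the choice should be read off the elbow plot.

```r
# Simulated two-variable data (NOT the real customers_2).
set.seed(141)
toy <- data.frame(
  FRESH  = c(rnorm(60, 0), rnorm(60, 6), rnorm(60, 12)),
  FROZEN = c(rnorm(60, 0), rnorm(60, 6), rnorm(60, 0))
)

k_values <- 1:10

# Stand-in for the averaged test-set deviance (assumption: in-sample
# tot.withinss is used here only so that the block runs on its own).
avg_dev <- sapply(k_values, function(k) {
  kmeans(toy, centers = k, nstart = 20, iter.max = 100)$tot.withinss
})

# Elbow plot: look for the k after which the curve flattens.
if (interactive()) {
  plot(k_values, avg_dev, type = "b",
       xlab = "Number of clusters k",
       ylab = "Average within-group deviance")
}

# Hypothetical choice read off the plot for the simulated data.
k_selected <- 3

# Refit on all observations and keep the memberships as a factor.
fit <- kmeans(toy, centers = k_selected, nstart = 20, iter.max = 100)
clustered <- cbind(toy, cluster = factor(fit$cluster))

# Scatter-plot matrix of the two variables, coloured by cluster.
if (interactive() && requireNamespace("GGally", quietly = TRUE)) {
  print(GGally::ggpairs(clustered, columns = 1:2,
                        mapping = ggplot2::aes(colour = cluster)))
}
```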

3. • Estimate a hierarchical partition on FRESH and FROZEN using complete linkage and the number of clusters selected above. (5 points)

• Derive the estimated cluster memberships. (5 points)

• Provide a scatter-plot matrix of FRESH and FROZEN conditional on the estimated cluster memberships via the function ggpairs from the package GGally. (5 points)

• Comment on the shape of FRESH and FROZEN across the estimated groups and compare this outcome to the k-means one. (5+5 points)
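The hierarchical counterpart can be sketched with hclust and cutree from base R. As before, toy is simulated data and k_selected = 3 is a hypothetical value standing in for the number of clusters chosen with the elbow criterion. The sketch also assumes Euclidean distances on the unscaled variables, which the task does not specify. The cross-tabulation at the end is one simple way to compare the two partitions; cluster labels are arbitrary, so agreement shows up as one dominant cell per row rather than as a diagonal.

```r
# Simulated two-variable data (NOT the real customers_2).
set.seed(141)
toy <- data.frame(
  FRESH  = c(rnorm(60, 0), rnorm(60, 6), rnorm(60, 12)),
  FROZEN = c(rnorm(60, 0), rnorm(60, 6), rnorm(60, 0))
)
k_selected <- 3  # hypothetical number of clusters

# Complete-linkage hierarchical clustering on Euclidean distances.
hc <- hclust(dist(toy, method = "euclidean"), method = "complete")

# Cut the dendrogram into k_selected groups to obtain the memberships.
hc_groups <- cutree(hc, k = k_selected)
clustered_hc <- cbind(toy, cluster = factor(hc_groups))

# k-means partition with the same number of clusters, for comparison.
km <- kmeans(toy, centers = k_selected, nstart = 20, iter.max = 100)
comparison <- table(kmeans = km$cluster, hclust = hc_groups)
print(comparison)

# Scatter-plot matrix coloured by the hierarchical memberships.
if (interactive() && requireNamespace("GGally", quietly = TRUE)) {
  print(GGally::ggpairs(clustered_hc, columns = 1:2,
                        mapping = ggplot2::aes(colour = cluster)))
}
```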
