辅导 Homework 1 Part I: Basic Concepts调试R语言程序

Homework 1

Part I: Basic Concepts

1. There are two vectors, x1 and x2

What is the distance between x1 and x2 ? (Please show the steps of the calculations)

(1) if the distance measure is based on L2 norm (a.k.a Euclidean norm)

(2) if the distance measure is based on L1 norm

(3) if the distance measure is based on L∞ norm (a.k.ainfinity norm)

In class, we studied the customer segmentation example and tried to find the most valuable customers who have good income but low spend. There are two feature components in this application.

Assume that we use a clustering method similar to k-mean, and this method could use any type of vector norms as distance measure. Then, does the L∞ norm-based distance measure make sense for this application?

2. We define a scalar valued function of a vector variable

Here,x is a column vector, xT is the transpose of x, and A is asymmetric matrix

To simplify this question, let's assume x has only two elements and

The derivative of f with respect to x is a vector defined by

Show that dx/df = 2Ax

Hint: calculate f(x), 2Ax, da/df and dβ/df

K-means clustering (10 points)

3. Briefly describe the two key steps in one iteration of the k-means algorithm. (1 point)

4. What is the distance measure used ink-means (implemented in sk-learn)? (1 point)

5. The k-means algorithm can always converge in a finite number of iterations. Why? (1 point)

6. The clustering result of k-means could be random. Why? (1 point)

7. The minimum value of the loss function is zero for any dataset. What is the clustering result when the loss function is zero? - assuming that the dataset has millions of different samples. (1 point)

Note: for questions 3,4,5,6,7, you only need to write a few words (bullet points) for each one.

8. find the optimal centers by following the steps below when cluster labels are given. (5 points)

The loss function is as defined in the lecture notes.

First, we calculate , where k could be 1, 2, 3, … , K.

Then, we set and we obtain the optimal center

What are (A), (B), (C), (D), and (E) in the above equations ?

Note: (E) is a variable, not a word or sentence

Part 2: Programming

Complete the tasks in the files:

H1P2T1_kmeans.ipynb

If you want to get some bonus points, try this task:

H1P2T2_kmeans_compression.ipynb

Grading: the number of points

	Undergraduate Student	Graduate Student
Basic Concepts	10	10
K-means clustering	10	10
H1P2T1	30	30
H1P2T2	5 (bonus)	5 (bonus)
Total number of points	50 + 5	50 + 5

The following rules are used for every homework assignment.

Each homework assignment is an individual assignment, NOT a group assignment.

For part 1: You may use MS-word to write the answers, convert the file to PDF, and upload it to Blackboard. You may write the answers on a piece of paper, take a photo using your cellphone, and upload the picture to Blackboard. Make sure that your handwriting is human/TA-readable, otherwise you may lose points.

For part 2: complete the ipynb files. Do NOT convert ipynb files topdfor py or another format.

Upload your files to Blackboard and do not miss any files.

Before you submit homework files, make sure you run each and every code cell of your program files. If a code cell is supposed to generate some output (e.g., figure or text) and nothing shows up below the cell because you forget to run the cell, you will lose the points of that cell.

Do NOT use ChatGPT and Copilot.

联系我们