讲解COGS 118A、辅导Python，Java编程

Homework Assignment 7
COGS 118A: Supervised Machine Learning Algorithms
Due: May 20th, 2022, 11:59pm (Pacific Time).
Instructions: Your math can be handwritten. In that case use a scanner app to
take photos of the pages and turn them into a PDF. Adobe, Evernote, Microscoft,
and many others offer free scanner apps. Alternatively, your math may be done in
LaTeX or in a Jupyter Notebook using LaTeX markdown and from there turned into
PDF. All of these different sources must be stitched together into a single PDF file.
PDF Merge tool (e.g. smallpdf, comebinepdf) can do this among many other tools.
Make sure to show the steps of your solution, not just the final answer.
Late Policy: You will receive 75 percent credit MAX for any late assignments. Late
submissions will not be accepted after 1 week.
Grade: out of 63 points
1 (12 points) Conceptual Questions
1. In the Figure 1, we use a black line to mark the decision boundary of a linear
SVM classifier parameterized by weight vector w and bias b. If we start to
increase the margin of the classifier, what would happen to w?
7SCR?
Computing the margin wi th
Figure 1: A linear SVM classifier with data points.
A. The magnitude of w does not change.
B. w will have smaller magnitude.
1
C. w will have greater magnitude.
D. None of above.
2. Which one below best describes what support vectors are:
A. The decision boundary.
B. The positive and the negative planes.
C. The training samples that determine the positive and the negative planes.
D. The test samples that determine the positive and the negative planes.
3. Consider a dataset S = {(xi, yi), i = 1, 2, ..., n} where x = [x1, x2]? and y ∈
{?1,+1}. Here we try to learn a linear SVM classifier parameterized by weight
vector w and bias b to fit the dataset S. Which one is the appropriate loss
function for the linear SVM classifier from the options below:
A. L(w, b) = 2√
ww
C ×∑ni=1max (0, 1? yi × (w?xi + b))
B. L(w, b) = 1
2
ww + C ×∑ni=1max (0, 1? yi × (w?xi + b))
C. L(w, b) = 2√
ww
+ C ×∑ni=1max (0, 1? yi × (w?xi + b))
D. L(w, b) = 1
2
ww C ×∑ni=1max (0, 1? yi × (w?xi + b))
2 (16 points) K Nearest Neighbors
Consider a training dataset Straining = {(xi, yi), i = 1, 2, ..., 8} where each data point
(x, y) has a feature vector x = [x1, x2]
and the corresponding label y ∈ {?1,+1}.
The points with the corresponding labels in the dataset are shown in the figure below.
In the figure above, the index i for each training example xi is given in bold near
the point. You are asked to predict the label of a point xpred = [1, 0]
shown in the
figure as a triangle ▲ using k nearest neighbors (k-NN) method under Euclidean
distance.
1. (4 points) List the x vectors of all the data points in Straining and their corre-
sponding labels.
2. (4 points) Determine the predicted label for xpred using the k-NN with different
k.
(a) k = 1.
(b) k = 5.
3. (8 points) For KNNs and SVMs:
(a) Describe the differences between in the decision boundaries of each algo-
rithm.
(b) When might an SVM be more appropriate than a KNN? Visa Versa?
3 (15 points) Support Vector Machine
Consider a dataset {(xi, yi), i = 1, 2, ..., 10} where x = [x1, x2]? and y ∈ {?1,+1},
which is shown in the Figure 2 below. Suppose we have trained a support vector
machine (SVM) classifier on the dataset, which has the decision boundary :
w?x+ b = 0. The SVM classifier is optimized as following:
Find: argmin
w
1
2
||w||22
Subject to, ?i :w?xi + b ≥ +1, if yi = +1,
wxi + b ≤ ?1, if yi = ?1.
We define ?+ : w
x + b = +1 as the positive plane and ?? : w?x + b = ?1 as the
negative plane. After training the SVM classifier using the gradient descent method
for several iterations to reach an intermediate state, we have obtained the positive
plane and the negative plane, which are indicated as solid lines in the Figure 2
below.
Figure 2: The positive plane, the negative plane and the data points.
1. (2 points) Draw the decision boundary of the SVM classifier in Figure 2.
2. (8 points) Calculate the w and b from the positive plane and negative plane.
Hint: The positive plane passes through x4 = (2, 2), x8 = (3, 4). The negative
plane passes through x3 = (2, 0), x7 = (3, 2).
3. (5 points) Calculate for size of the margin based on your w.
Hint: Margin can be obtained from 2||w||2 .
4 (10 points) SVM: Gradient
Given a training dataset Straining = {(xi, yi)}, i = 1, . . . , n}, we wish to optimize the
loss L(w, b) of a linear SVM classifier:
L(w, b) = 1
2
||w||22 + C
n∑
i=1
(
1? yi(wTxi + b)
)
+
(1)
where (z)+ = max(0, z) is called the rectifier function and C is a scalar constant.
The optimal weight vector w? and the bias b? used to build the SVM classifier are
defined as follows:
w, b = argmin
w,b
L(w, b) (2)
In this problem, we attempt to obtain the optimal parameters w? and b? by using a
standard gradient descent algorithm.
Hint: To find the derivative of L(w, b), please consider two cases:
(a) 1? yi(wTxi + b) ≥ 0, (b) 1? yi(wTxi + b) < 0
1. find the derivative:
5 (10 points) SVM: Margin
As shawn in the Figure 3, we have the decision boundary (marked as a black line)
defined as Eq. 3 given w, b:
wTx+ b = 0 (3)
In parallel to the decision boundary, we have the positive plane (marked as a red line)
defined as Eq. 4 and the negative plane (marked as a blue line) defined as Eq. 5:
wTx+ b = +1 (4)
wTx+ b = ?1 (5)
We pick an arbitrary point x? on the negative plane, and draw a purple line that
passes x? and is perpendicular to the negative plane. The intersection between this
purple line and the positive plane can be denoted as x+. Thus, we have the following
Eq. 6 that indicates the relation between x+ and x?:
x+ = x + λw (6)
where λ ∈ R is an undetermined scalar. The margin M is defined as the distance
between the positive and the negative planes, which can be calculated from Eq. 7:
M = ||x+ ? x?||2 =
√
< λw, λw > (7)
7SCR?
Computing the margin width
Figure 3: The decision boundary, the positive plane and the negative plane.
Please derive the following according to Eq. 7:
M =
2√
< w,w >
Hint: You can firstly represent λ in the form of w by using Eq. 4, 5, 6.