Homework 3
May 2022
1 Multi-class classification
Solve a simple classification problem using four different methods. The
Glass Identification dataset (glass.txt) has 11 columns: the first column
is the ID of the sample (to be ignored), the 2nd to 10th columns
are features, and the last column is the type of glass. The Glass Identification
dataset includes six types of glass.
In general, there are two types of classification algorithms: binary classification
algorithms and multi-class classification algorithms. Binary classification
is the case where we have to classify objects into two groups;
generally, these two groups consist of “positive” and “negative” samples,
respectively. In multi-class classification, on the other hand, we have to
classify objects into more than two classes. In this case, given the attributes
of a glass sample, the multi-class classification task is to determine
its type.
Specifically, the dataset has 214 samples. You can use 90% of the samples as
the training set and the remaining 10% as the test set.
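The loading and 90%/10% split described above can be sketched in Python with NumPy. The comma delimiter and the exact column layout of glass.txt are assumptions based on the description (ID, nine features, type):

```python
import numpy as np

def load_and_split(path="glass.txt", train_frac=0.9, seed=0):
    """Load the Glass dataset and split it 90%/10%.

    Assumed layout (per the description): column 1 is the sample ID,
    columns 2-10 are the nine features, column 11 is the glass type.
    The comma delimiter is an assumption about glass.txt.
    """
    data = np.loadtxt(path, delimiter=",")
    X = data[:, 1:10]                   # nine features, ID column dropped
    y = data[:, 10].astype(int)         # glass type
    idx = np.random.default_rng(seed).permutation(len(X))
    n_train = int(train_frac * len(X))  # 90% of 214 samples -> 192 train
    return (X[idx[:n_train]], y[idx[:n_train]],
            X[idx[n_train:]], y[idx[n_train:]])
```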
Here are four methods you need to implement:
(1) Use logistic regression models to solve the multi-class classification
problem. A multi-class classification problem can be split into several
binary classification problems. To understand how this works, let
us consider an example. A classification problem is to classify various glasses
into six types: float processed building windows, not float processed building
windows, vehicle windows, containers, tableware, and headlamps. This
is a multi-class classification problem. You can use logistic regression
models to solve it as follows.
The above multi-class classification problem can be decomposed into
six binary classification problems:
Problem 1 : Float processed building windows vs [Not float processed building
windows, Vehicle windows, Containers, Tableware, Headlamps]
Problem 2 : Not float processed building windows vs [Float processed building
windows, Vehicle windows, Containers, Tableware, Headlamps]
...
Problem 6 : Headlamps vs [Float processed building windows, Not float
processed building windows, Vehicle windows, Containers, Tableware]
To make a prediction, apply all six classifiers to an unseen sample
Xi and output the label whose classifier reports
the highest confidence score. The logistic regression model (yi ∈ {0, 1}) for
binary classification is on page 4 of lecture 1.
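The one-vs-rest scheme above can be sketched in NumPy as follows (a minimal version, assuming gradient descent on the binary cross-entropy; a bias term can be added by appending a constant-1 feature, and the learning rate and iteration count are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_ovr_logistic(X, y, classes, lr=0.5, n_iter=2000):
    """One-vs-rest sketch: one binary logistic regressor (yi in {0,1})
    per class, trained by gradient descent on the binary cross-entropy."""
    n, d = X.shape
    W = np.zeros((len(classes), d))     # one weight vector per class
    for k, c in enumerate(classes):
        t = (y == c).astype(float)      # class c vs the rest
        w = np.zeros(d)
        for _ in range(n_iter):
            p = sigmoid(X @ w)
            w -= lr * X.T @ (p - t) / n # gradient of the cross-entropy
        W[k] = w
    return W

def predict_ovr(X, W, classes):
    """Label of the classifier with the highest confidence score."""
    scores = sigmoid(X @ W.T)           # (n_samples, n_classes)
    return np.asarray(classes)[np.argmax(scores, axis=1)]
```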
(2) Find a linear model with a weight matrix W to solve the multi-class
classification problem. For an input sample Xi ∈ R^9, we have yi = W^T Xi ∈ R^6.
Then the predicted class is c = arg max_{j∈{1,...,6}} pj, where p = softmax(yi) ∈ R^6.
To this end, you need to compute the cross-entropy loss and use gradient
descent to find a weight matrix W.
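Method (2) can be sketched as follows (softmax plus cross-entropy trained by gradient descent; the one-hot label encoding, learning rate, and iteration count are implementation choices, not part of the assignment):

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)      # stabilize the exponentials
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def train_softmax(X, Y, lr=0.5, n_iter=2000):
    """Gradient descent on the cross-entropy loss for yi = W^T Xi.

    X is (n, d) and Y is a one-hot (n, K) label matrix; for this
    assignment d = 9 and K = 6.  The gradient of the cross-entropy
    composed with the softmax is X^T (P - Y) / n.
    """
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        P = softmax(X @ W)                    # predicted class probabilities
        W -= lr * X.T @ (P - Y) / len(X)
    return W

def predict(X, W):
    return np.argmax(softmax(X @ W), axis=1)  # c = argmax_j p_j
```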
(3) Use Linear Discriminant Analysis (LDA) models to solve the
multi-class classification problem. Similarly, the multi-class classification
problem can be decomposed into several binary classification problems,
each of which can be solved with the binary LDA model
on page 26 of lecture 1.
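As one possible reading of the binary LDA step, here is a Fisher-discriminant sketch for a single subproblem (the exact formulation on the lecture slide may differ; the midpoint threshold and the small ridge term are assumptions made here):

```python
import numpy as np

def fit_binary_lda(X0, X1):
    """Fisher LDA sketch for one binary subproblem.

    Direction w ∝ Sw^{-1} (m1 - m0) with within-class scatter Sw;
    thresholding at the projected midpoint of the class means is a
    common choice (it implicitly assumes roughly equal class priors).
    """
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S0 = (X0 - m0).T @ (X0 - m0)
    S1 = (X1 - m1).T @ (X1 - m1)
    Sw = S0 + S1 + 1e-6 * np.eye(X0.shape[1])  # small ridge for stability
    w = np.linalg.solve(Sw, m1 - m0)
    threshold = w @ (m0 + m1) / 2.0
    return w, threshold        # predict class 1 when w @ x > threshold
```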
(4) Use logistic regression models trained with the logistic loss to solve the
multi-class classification problem. Similarly, the multi-class classification
problem can be decomposed into several binary classification problems.
The logistic loss (yi ∈ {+1, −1}) for binary classification is on page
12 of lecture 1.
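A sketch of one binary subproblem trained on the logistic loss with labels in {+1, −1} (the learning rate and iteration count are arbitrary choices; in the one-vs-rest scheme, six such classifiers are trained and the largest score wins):

```python
import numpy as np

def train_logistic_loss(X, y, lr=0.5, n_iter=2000):
    """Gradient descent on L(w) = mean(log(1 + exp(-yi w^T xi))),
    with yi in {+1, -1}.  Per-sample gradient: -yi xi / (1 + e^{mi}),
    where mi = yi w^T xi is the margin."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        margins = y * (X @ w)
        coef = -y / (1.0 + np.exp(margins))
        w -= lr * X.T @ coef / len(X)
    return w   # score w @ x; one-vs-rest picks the class with the largest score
```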
Please provide the following results for the four methods:
(1) Use Principal Component Analysis (PCA) to project the weight
matrix W ∈ R^{9×6} onto a lower-dimensional space while preserving as much of
the variance in the weight matrix as possible. In other words, W′ = PW,
where P ∈ R^{2×9} or P ∈ R^{3×9}, and W′ ∈ R^{2×6} or W′ ∈ R^{3×6} (you
may choose either the 2-dimensional or the 3-dimensional projection). In this way, you
can draw the six columns of the matrix W′. Besides, you also need to project the
training samples into the lower-dimensional space based on the same matrix P,
e.g., Xi′ = P Xi, where Xi ∈ R^9 and Xi′ ∈ R^2 or Xi′ ∈ R^3. Specifically, the
projected training samples and the six columns of the matrix W′ should be
visualized together. You can use existing Python or Matlab PCA functions,
e.g., the visualization code in Python (plot_pca.py).
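One way to obtain P and apply it to both W and the training samples is an SVD-based PCA sketch like the one below (plot_pca.py or an existing library PCA can replace the SVD step; note that W′ = PW applies P directly, without re-centering W):

```python
import numpy as np

def pca_project(W, X_train, n_components=2):
    """PCA of the six columns of W (each a point in R^9) via SVD.

    P stacks the top principal directions, so P is (2, 9) (or (3, 9));
    the same P projects the training samples, matching W' = P W and
    Xi' = P Xi from the assignment.
    """
    cols = W.T                                 # six 9-d points
    centered = cols - cols.mean(axis=0)        # center before the SVD
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    P = Vt[:n_components]                      # (n_components, 9)
    return P, P @ W, X_train @ P.T             # P, W', projected samples
```

The six columns of W′ and the projected training samples can then be drawn in the same 2-d (or 3-d) scatter plot, colored by class.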
(2) Please report the classification accuracy of the four methods on the test
set.
(3) Discuss the differences and similarities of these four methods in terms of
problem formulation and loss function. The discussion is open-ended;
you are not limited to the two perspectives above.