Coursework I:
Learning Learning, From Scratch
COMP0169 Team
The total number of points for this coursework is 100.
We will be using two datasets that we will provide to you: Iris and MNIST. For each dataset you will be provided with two subsets: a training subset (XX_train_samples.npy and XX_train_labels.npy, with XX the name of the dataset) and a validation subset (XX_val_samples.npy and XX_val_labels.npy). We withhold the test subsets of these two datasets as well as a hidden training set Hidden.
We will test the correctness of your code on Hidden. An anonymised leaderboard over the test subsets and Hidden will be published on Moodle.
The Iris dataset contains the following features, in order: sepal length, sepal width, petal length, petal width.
The class names are: Iris Setosa for label 0, Iris Versicolour for label 1, and Iris Virginica for label 2.
Each exercise must be implemented from scratch. Libraries are allowed unless otherwise specified. We encourage you to test the correctness of your results against library implementations.
1 Line Fitting (10 points)
a) Implement the normal equation solver function nsolve, which takes as input the matrix X and the target vector y and returns the optimised weights w. (5 points)
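A minimal sketch of what nsolve might look like in NumPy (using np.linalg.solve instead of an explicit inverse is our choice, not a requirement):

    import numpy as np

    def nsolve(X, y):
        # Normal equation: w = (X^T X)^{-1} X^T y.
        # np.linalg.solve avoids forming the inverse explicitly.
        return np.linalg.solve(X.T @ X, X.T @ y)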
b) Implement lineFit(X, y), which should fit a linear function to the input data. Test your implementation on the following task: predict, with a linear fit, the petal length (cm) of the Iris dataset using the three remaining variables as inputs (sepal length (cm), sepal width (cm), and petal width (cm)). Report the L2 loss on the validation set and plot a graph showing the correlation between y and your prediction on the validation set. (2 points)
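A possible sketch of lineFit, reusing nsolve and assuming an intercept term is wanted (the coursework does not prescribe the parameterisation):

    def lineFit(X, y):
        # Prepend a column of ones so the fit includes an intercept.
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        return nsolve(Xb, y)

For the Iris task above, X would hold the sepal length, sepal width, and petal width columns, and y the petal length column.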
c) Implement polyFit(X, y), which should fit a 2nd-degree polynomial to the input data. Test your implementation on the following task: predict with the polynomial the petal width (cm) of the Iris dataset using the three remaining variables as inputs (sepal length (cm), sepal width (cm), and petal length (cm)). The 2nd-degree polynomial should consider all possible pairwise terms, i.e. $w_1 x^2 + w_2 xy + w_3 y^2 + w_4 x + w_5 y + w_6$ in the case of two input variables x and y. Report the L2 loss on the validation set and plot a graph showing the correlation between y and your prediction on the validation set. (3 points)
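One way polyFit could expand the inputs before reusing the linear solver (a sketch; the ordering of the terms is arbitrary):

    def polyFit(X, y):
        # Build the quadratic design matrix: constant, linear, squared,
        # and all pairwise cross terms, then solve as a linear problem.
        n, d = X.shape
        cols = [np.ones(n)]                          # constant term
        cols += [X[:, i] for i in range(d)]          # linear terms
        cols += [X[:, i] * X[:, j]                   # squares and cross terms
                 for i in range(d) for j in range(i, d)]
        return nsolve(np.column_stack(cols), y)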
2 Clustering (14 points)
a) Implement a function pca(X, ndims) that performs PCA over the input data X and returns both the mean vector $\bar{X}$ and the ndims top components. The top components are the eigenvectors associated with the largest eigenvalues of the covariance matrix. Try your function on the MNIST dataset, which is composed of 10 digit classes. Display the top 10 components fitted on the training dataset as images, and check that you can perfectly reconstruct an input digit from the validation set when using all components. (7 points)
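A minimal pca sketch via the eigendecomposition of the covariance matrix (np.linalg.eigh returns eigenvalues in ascending order, hence the reordering):

    def pca(X, ndims):
        # Centre the data, then keep the eigenvectors belonging to the
        # ndims largest eigenvalues of the covariance matrix.
        mean = X.mean(axis=0)
        eigvals, eigvecs = np.linalg.eigh(np.cov(X - mean, rowvar=False))
        order = np.argsort(eigvals)[::-1][:ndims]
        return mean, eigvecs[:, order]

A sample x can then be reconstructed as mean + ((x - mean) @ W) @ W.T, where W holds the returned components as columns.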
b) Perform independent research on the k-means clustering algorithm. Implement a function kmeans performing k-means on input data X. Propose the interface for that function (i.e., what are its inputs and outputs?) and explain in three sentences why you chose it; a sketch of one possible choice follows. Apply your k-means implementation to the MNIST training set with k = 10 clusters and display the centroids as images. (5 points)
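For orientation, a sketch of Lloyd's algorithm; the interface (X, k) -> (centroids, assignments) is one reasonable choice, not one the coursework mandates:

    def kmeans(X, k, n_iters=100, seed=0):
        # Alternate between assigning each point to its nearest centroid
        # and recomputing each centroid as the mean of its cluster.
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
        for _ in range(n_iters):
            # Squared distances via ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2.
            d2 = ((X**2).sum(1)[:, None] - 2.0 * X @ centroids.T
                  + (centroids**2).sum(1)[None, :])
            assign = d2.argmin(axis=1)
            new = np.array([X[assign == j].mean(axis=0) if np.any(assign == j)
                            else centroids[j] for j in range(k)])
            if np.allclose(new, centroids):    # converged
                break
            centroids = new
        return centroids, assign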
c) Describe the k-means algorithm, highlighting similarities to and differences from KNN. Compare the reconstruction loss on the validation set for both k-means and PCA. Write no more than a third of a page. (2 points)
3 Linear Classification (26 points)
a) Implement the normal-equation-based binary linear classifier lclass(examplesA, examplesB, testExample), where the first two arguments are the sets of samples from class A and class B respectively and the third is the test sample. The function should return 0 if the test sample belongs to class A and 1 otherwise. For simplicity, it should both train and test in one function call. (5 points)
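One possible lclass sketch, regressing indicator targets with the nsolve from Exercise 1 and thresholding at 0.5 (the threshold is our choice):

    def lclass(examplesA, examplesB, testExample):
        # Stack both classes, regress targets 0 (class A) and 1 (class B)
        # with the normal equation, then threshold the test prediction.
        X = np.vstack([examplesA, examplesB])
        Xb = np.hstack([np.ones((X.shape[0], 1)), X])
        y = np.concatenate([np.zeros(len(examplesA)), np.ones(len(examplesB))])
        w = nsolve(Xb, y)
        pred = np.concatenate([[1.0], np.atleast_1d(testExample)]) @ w
        return 0 if pred < 0.5 else 1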
b) Test this on all the samples in Iris (Setosa vs. non-Setosa, etc.) and propose a simple analysis (text, figure, table) of the results you find, no longer than a third of a page. (6 points)
c) Perform independent research on how to do multi-class classification. Implement lmclass(examples, class, testExample), which performs multi-class classification of the examples examples according to the vector of labels class of the same size, and tests it on testExample by returning the vector of probabilities of belonging to class i. (10 points)
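One valid scheme your research may turn up is one-vs-rest; a sketch (the argument class is renamed labels because class is a reserved word in Python, and the clipped, normalised scores are only pseudo-probabilities):

    def lmclass(examples, labels, testExample):
        # Fit one normal-equation regressor per class against a {0, 1}
        # indicator target, then normalise the scores to sum to one.
        classes = np.unique(labels)
        Xb = np.hstack([np.ones((examples.shape[0], 1)), examples])
        xt = np.concatenate([[1.0], np.atleast_1d(testExample)])
        scores = np.array([xt @ nsolve(Xb, (labels == c).astype(float))
                           for c in classes])
        scores = np.clip(scores, 0.0, None)    # assumes at least one positive score
        return scores / scores.sum()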
d) Present your findings from applying multi-class classification to the Iris dataset with 3 classes. You can include figures and tables if needed. Write no longer than a third of a page. (5 points)
4 Non-linear Classification (25 points)
a) Implement classification based on logistic regression using gradient descent (GD) by implementing the gradient function deLogistic(preds, X, Y) and optimising with GD. preds are the predictions of the model, X is the data, and Y are the labels. (5 points)
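A sketch of the gradient and the GD loop, assuming a binary cross-entropy loss and preds = sigmoid(X @ w), in which case the derivatives collapse to the residual form below:

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def deLogistic(preds, X, Y):
        # Gradient of the mean binary cross-entropy w.r.t. the weights w,
        # given preds = sigmoid(X @ w): the sigmoid and log-loss
        # derivatives combine into X^T (preds - Y) / n.
        return X.T @ (preds - Y) / len(Y)

    w = np.zeros(X.shape[1])
    for _ in range(1000):                      # illustrative iteration count
        preds = sigmoid(X @ w)
        w -= 0.1 * deLogistic(preds, X, Y)     # 0.1 is an arbitrary step size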
b) Implement classification based on the hinge loss using GD by implementing the gradient function deHinge(preds, W, x, y) and optimising with GD. preds are the predictions of the model, W describes the model parameters, x is the data, and y represents the labels. (5 points)
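A sketch under the assumption of labels y in {-1, +1}, preds = x @ W, and no regulariser (adding one would give the usual soft-margin objective):

    def deHinge(preds, W, x, y):
        # Sub-gradient of the mean hinge loss max(0, 1 - y * preds) w.r.t. W;
        # only margin-violating samples contribute.
        active = (y * preds < 1).astype(float)
        return -(x.T @ (active * y)) / len(y)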
c) Implement the kernel SVM function ksvm(kernel, x, y, xtest). The function takes as input a kernel, training data, and a set of test points. The function returns the set of support vectors along with the predicted labels. You are allowed to use the SciPy optimisation library to solve the quadratic problem of the SVM. (10 points)
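For orientation, a dual-form sketch built on scipy.optimize.minimize (SLSQP); the RBF kernel, its gamma, and the box constant C are illustrative assumptions, and the bias estimate assumes at least one margin support vector:

    import numpy as np
    from scipy.optimize import minimize

    def rbf(a, b, gamma=0.05):
        # Example kernel; any positive semi-definite kernel(a, b) works.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def ksvm(kernel, x, y, xtest, C=1.0):
        # Soft-margin dual: minimise 0.5 a^T (yy^T * K) a - sum(a)
        # subject to 0 <= a <= C and sum(a * y) = 0, labels y in {-1, +1}.
        n = len(x)
        K = kernel(x, x)
        Q = (y[:, None] * y[None, :]) * K
        res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(),
                       np.zeros(n),
                       jac=lambda a: Q @ a - 1.0,
                       bounds=[(0.0, C)] * n,
                       constraints=[{"type": "eq", "fun": lambda a: a @ y}],
                       method="SLSQP")
        a = res.x
        sv = a > 1e-6                              # support vectors
        margin = sv & (a < C - 1e-6)               # margin support vectors
        b = np.mean(y[margin] - (a * y) @ K[:, margin])
        return x[sv], np.sign((a * y) @ kernel(x, xtest) + b)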
5 Neural Network (25 points)
a) Devise a three-layer neural network with n hidden units and sigmoid activations for classification. Explain in one sentence how many parameters it has. (2 points)
b) Provide the equation for the gradient, using the chain rule, for the network in point a). (8 points)
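For one possible such network (hidden layer $h = \sigma(z_1)$ with $z_1 = W_1 x + b_1$, output $\hat{y} = \sigma(z_2)$ with $z_2 = W_2 h + b_2$, and loss $L$), the chain rule gives, writing $\sigma'(z) = \sigma(z)(1 - \sigma(z))$:

    \delta_2 = \frac{\partial L}{\partial \hat{y}}\,\sigma'(z_2), \quad
    \frac{\partial L}{\partial W_2} = \delta_2\, h^{\top}, \quad
    \frac{\partial L}{\partial b_2} = \delta_2,

    \delta_1 = \left(W_2^{\top}\delta_2\right)\odot\sigma'(z_1), \quad
    \frac{\partial L}{\partial W_1} = \delta_1\, x^{\top}, \quad
    \frac{\partial L}{\partial b_1} = \delta_1.

Your architecture from a) may differ; the pattern of multiplying the upstream delta by each layer's Jacobian is what matters.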
c) Implement the binary classifier nnclass(examplesA, examplesB, testExample), trained with your own implementation of (stochastic) GD and your gradient function, using the network above. (10 points)
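A full-batch sketch (stochastic mini-batches would subsample X each step); the hidden width n, learning rate, and epoch count are illustrative only:

    def nnclass(examplesA, examplesB, testExample, n=16, lr=0.1, epochs=500):
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        X = np.vstack([examplesA, examplesB])
        y = np.concatenate([np.zeros(len(examplesA)), np.ones(len(examplesB))])
        rng = np.random.default_rng(0)
        W1 = rng.normal(0.0, 0.1, (X.shape[1], n)); b1 = np.zeros(n)
        W2 = rng.normal(0.0, 0.1, n);               b2 = 0.0
        for _ in range(epochs):
            h = sigmoid(X @ W1 + b1)                # hidden activations
            p = sigmoid(h @ W2 + b2)                # output probabilities
            d2 = (p - y) / len(y)                   # cross-entropy output delta
            d1 = np.outer(d2, W2) * h * (1.0 - h)   # back-propagated delta
            W2 -= lr * h.T @ d2;  b2 -= lr * d2.sum()
            W1 -= lr * X.T @ d1;  b1 -= lr * d1.sum(axis=0)
        ht = sigmoid(np.atleast_2d(testExample) @ W1 + b1)
        return int(sigmoid(ht @ W2 + b2)[0] >= 0.5)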
d) Analyse how changes of n affect the accuracy, in no more than a third of a page. A table and/or plot is welcome. Pay attention to table formatting and labels. (5 points)