CSCI433/933: Machine Learning Algorithms and Applications

Assignment Problem Set III (Individual)

Part A: General knowledge questions - 15 Marks

1. In what way are regression and neural network models similar?

2. Why is a deep neural network better than a shallow network?

3. What is the difference between a convolutional neural network (CNN) and a recurrent neural network (RNN)?

4. Explain the principles of the backpropagation algorithm.

5. Explain the concepts of over-fitting and under-fitting.

6. Explain the effects of ridge and LASSO regularization as used in regression models.

Part B: Design and Programming - 50 + 10 bonus Marks

Aims and Objectives

This assignment evaluates basic familiarity with the fundamental concepts and implementation of deep neural networks and statistical machine learning. On completion of this assignment, you should be able to demonstrate basic mastery of:

• the concepts of deep autoencoders, feature extraction, SVMs, and convolutional autoencoders;

• the implementation of machine learning algorithms using TensorFlow, Keras, and scikit-learn.

Introduction

An autoencoder is a popular unsupervised learning technique for learning data representations. Specifically, the neural network architecture imposes a bottleneck that forces a compressed knowledge representation of the original input. To build an autoencoder, we need three components: an encoder to compress the data, a decoder to decompress the data, and a loss function to measure the reconstruction error. Autoencoder models differ in their encoder/decoder architectures. This assignment focuses on two popular models: the deep fully-connected autoencoder and the deep convolutional autoencoder.
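As a rough illustration of the three components named above, the following toy sketch compresses a 4-dimensional input to a 2-dimensional code and reconstructs it. The weights `W_ENC` and `W_DEC` are hand-picked, hypothetical values rather than trained parameters, and the helper names are assumptions made for this example only.

```python
# Toy autoencoder: encoder, decoder, and reconstruction loss in plain Python.
# All weights below are hypothetical, hand-picked values (nothing is trained).

def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def relu(vector):
    """Element-wise ReLU nonlinearity."""
    return [max(0.0, value) for value in vector]


def encode(x, w_enc):
    """Encoder: map the input to a lower-dimensional code (the bottleneck)."""
    return relu(matvec(w_enc, x))


def decode(code, w_dec):
    """Decoder: map the code back to the input space."""
    return matvec(w_dec, code)


def mse(x, x_hat):
    """Loss: mean squared reconstruction error."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


W_ENC = [[0.5, 0.5, 0.0, 0.0],   # 4 inputs -> 2 code units
         [0.0, 0.0, 0.5, 0.5]]
W_DEC = [[1.0, 0.0],             # 2 code units -> 4 outputs
         [1.0, 0.0],
         [0.0, 1.0],
         [0.0, 1.0]]

x = [0.2, 0.4, 0.9, 0.7]
code = encode(x, W_ENC)          # averages each pair of inputs
x_hat = decode(code, W_DEC)      # copies each average back to both positions
loss = mse(x, x_hat)
print(code, x_hat, loss)
```

Because the code has only two units, the reconstruction cannot be exact unless the paired inputs are equal; training would adjust the weights to reduce this loss over a dataset.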

What needs to be done

1. Implement a multi-layer fully-connected autoencoder using TensorFlow and Keras (15 + 5 bonus Marks):

• Load Fashion-MNIST using Keras. Fashion-MNIST is a dataset of article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each image is a 28×28 grayscale image associated with a label from 10 classes. Convert each image to float32 and normalize the pixel values to [0, 1]. Hint: x_train, x_test = x_train.astype('float32') / 255.0, x_test.astype('float32') / 255.0.

• Use TensorFlow and Keras to implement a six-layer fully-connected autoencoder for the Fashion-MNIST dataset. The encoder and the decoder each consist of three layers. Each input image is flattened into a 784-dimensional vector. The three encoder layers have output dimensionalities of 128, 64, and 32. The three decoder layers have output dimensionalities of 64, 128, and 784. A ReLU nonlinearity is applied after each of the six layers.

• Train the network using mean squared error as the loss function and Adam as the optimizer. Set the batch size to 256 and train the network for 30 epochs. Hint: the training data should be randomly shuffled at each epoch.

• Print out the training error and testing error for each epoch. Randomly choose two test

images, display and compare the original images and reconstructed images.

• Bonus (5 marks): Try to increase the depth (more layers) and width (higher dimensional

hidden layers) of the autoencoder and monitor the change of training and testing losses.
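The steps listed for Task 1 could be organised roughly as follows. This is a minimal sketch, not a reference solution: it assumes TensorFlow 2.x with its bundled Keras API, and the function names `load_fashion_mnist`, `build_autoencoder`, and `train` are hypothetical. The TensorFlow imports sit inside the functions so that the layer sizes can be inspected without loading the library.

```python
# Sketch of the six-layer fully-connected autoencoder (assumes TensorFlow 2.x).

ENCODER_UNITS = (128, 64, 32)   # output sizes of the three encoder layers
DECODER_UNITS = (64, 128, 784)  # output sizes of the three decoder layers


def load_fashion_mnist():
    """Load Fashion-MNIST, scale pixels to [0, 1], flatten to 784-d vectors."""
    from tensorflow import keras

    (x_train, _), (x_test, _) = keras.datasets.fashion_mnist.load_data()
    x_train = x_train.astype("float32") / 255.0
    x_test = x_test.astype("float32") / 255.0
    return x_train.reshape(-1, 784), x_test.reshape(-1, 784)


def build_autoencoder(input_dim=784):
    """Return (autoencoder, encoder); the encoder shares the trained weights."""
    from tensorflow import keras
    from tensorflow.keras import layers

    inputs = keras.Input(shape=(input_dim,))
    h = inputs
    for units in ENCODER_UNITS:
        h = layers.Dense(units, activation="relu")(h)
    code = h  # 32-dimensional bottleneck (output of the third encoder ReLU)
    for units in DECODER_UNITS:
        h = layers.Dense(units, activation="relu")(h)

    autoencoder = keras.Model(inputs, h)
    encoder = keras.Model(inputs, code)
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder


def train(autoencoder, x_train, x_test, epochs=30, batch_size=256):
    """Fit input -> input; return per-epoch training and test losses."""
    history = autoencoder.fit(
        x_train, x_train,
        epochs=epochs,
        batch_size=batch_size,
        shuffle=True,                      # reshuffle the data every epoch
        validation_data=(x_test, x_test),  # test error after each epoch
        verbose=2,
    )
    return history.history["loss"], history.history["val_loss"]
```

Keras reports `loss` as a running average over the batches of each epoch, so for an exact per-epoch training error one could additionally call `autoencoder.evaluate(x_train, x_train)` after training. `shuffle=True` is the Keras default for array inputs and is spelled out here only to match the hint.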

2. Train an SVM classifier on the image representations extracted from the above autoencoder (15 marks):

• Once the above fully-connected autoencoder is trained, extract for each image the 32-dimensional hidden vector (the output of the ReLU after the third encoder layer) as the image representation.

• Train a linear SVM classifier on the Fashion-MNIST training images using the 32-dimensional features. Tune the hyper-parameter C using cross-validation. Print out the training accuracy. scikit-learn is recommended.

• Test the trained SVM on the Fashion-MNIST test images. Print out the testing accuracy.

• Try a kernel-based SVM and compare the performance with linear SVM.
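One possible shape for the SVM step, using scikit-learn as recommended above. The sketch assumes that `train_feats` and `test_feats` are arrays of the 32-dimensional codes (for example, obtained by running the trained encoder over the images) and that the label arrays match them row by row. The grid `C_GRID`, the fold count, and the function names are illustrative assumptions, not prescribed values.

```python
# Sketch: tune C for a linear SVM by cross-validation, then compare with an RBF SVM.
# C_GRID and the number of folds are hypothetical choices.

C_GRID = (0.01, 0.1, 1.0, 10.0)


def tune_linear_svm(train_feats, train_labels, c_grid=C_GRID, folds=3):
    """Grid-search C with k-fold cross-validation; return the refit model and best C."""
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import LinearSVC

    search = GridSearchCV(LinearSVC(), {"C": list(c_grid)}, cv=folds)
    search.fit(train_feats, train_labels)
    # refit=True (the default) retrains the best model on all training data.
    return search.best_estimator_, search.best_params_["C"]


def fit_kernel_svm(train_feats, train_labels, c=1.0):
    """Fit an RBF-kernel SVM for comparison with the linear model."""
    from sklearn.svm import SVC

    return SVC(kernel="rbf", C=c).fit(train_feats, train_labels)


def report_accuracy(model, feats, labels, name):
    """Print and return the mean accuracy of a fitted classifier."""
    accuracy = model.score(feats, labels)
    print(f"{name} accuracy: {accuracy:.4f}")
    return accuracy
```

A typical call sequence would be `model, best_c = tune_linear_svm(train_feats, train_labels)` followed by `report_accuracy(model, test_feats, test_labels, "linear SVM test")`. Kernel SVMs scale poorly with the number of samples, so fitting on all 60,000 training features may be slow; fitting on a subsample first is one way to gauge the cost.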

3. Train a deep convolutional autoencoder using TensorFlow and Keras (10 + 5 bonus Marks):

• Load and pre-process Fashion-MNIST as for the fully-connected autoencoder.

• Train a six-layer convolutional autoencoder. Unlike the fully-connected autoencoder, each encoder and decoder layer is now a convolutional layer. Again, the autoencoder consists of three encoder layers and three decoder layers. Each input image to the autoencoder is now 3D (28 × 28 × 1). For each convolutional layer, use a 3 × 3 kernel with stride = 1 and padding='same'. After the first and second encoder convolutional layers, use (2, 2) max pooling to downsample the feature maps; after the first and second decoder convolutional layers, use a (2, 2) UpSampling2D operation to upsample them. Choose a suitable number of filters (output channels) for each convolutional layer, e.g. 16 or 24. Each convolutional layer uses the ReLU activation function.

• Train the network using mean squared error as the loss function and Adam as the optimizer. Set the batch size to 256 and train the network for 15 epochs. Hint: the training data should be randomly shuffled at each epoch.

• Print out the training error and testing error for each epoch. Randomly choose two test

images, display and compare the original images and reconstructed images.

• Bonus (5 marks): Implement a denoising convolutional autoencoder using the same architecture as above.
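A possible layout for the convolutional autoencoder in Task 3 is sketched below, assuming TensorFlow 2.x Keras. The filter counts in `ENCODER_FILTERS` and `DECODER_FILTERS` are one arbitrary choice in the spirit of the suggested 16 and 24, and all function names are hypothetical. Inputs are assumed to be reshaped to (N, 28, 28, 1).

```python
# Sketch of a six-layer convolutional autoencoder (assumed filter counts).

ENCODER_FILTERS = (16, 24, 24)  # hypothetical choice
DECODER_FILTERS = (24, 16, 1)   # last layer returns to one channel


def reconstructed_side(side=28, stages=2):
    """Trace the spatial size: 'same' convolutions keep it, each (2, 2)
    pooling halves it, and each (2, 2) upsampling doubles it."""
    for _ in range(stages):
        side //= 2
    for _ in range(stages):
        side *= 2
    return side


def build_conv_autoencoder():
    """Build and compile the model; TensorFlow is imported only when called."""
    from tensorflow import keras
    from tensorflow.keras import layers

    def conv(filters):
        return layers.Conv2D(filters, (3, 3), strides=1,
                             padding="same", activation="relu")

    inputs = keras.Input(shape=(28, 28, 1))
    h = conv(ENCODER_FILTERS[0])(inputs)
    h = layers.MaxPooling2D((2, 2))(h)       # 28 -> 14
    h = conv(ENCODER_FILTERS[1])(h)
    h = layers.MaxPooling2D((2, 2))(h)       # 14 -> 7
    h = conv(ENCODER_FILTERS[2])(h)          # third encoder layer, no pooling
    h = conv(DECODER_FILTERS[0])(h)
    h = layers.UpSampling2D((2, 2))(h)       # 7 -> 14
    h = conv(DECODER_FILTERS[1])(h)
    h = layers.UpSampling2D((2, 2))(h)       # 14 -> 28
    outputs = conv(DECODER_FILTERS[2])(h)    # third decoder layer

    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model
```

Training would mirror the fully-connected case, e.g. `model.fit(x_train, x_train, batch_size=256, epochs=15, shuffle=True, validation_data=(x_test, x_test))`. For the denoising variant, one common approach is to feed a noise-corrupted copy of the images as the input while keeping the clean images as the target.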

4. Write a report of no more than three (3) pages to illustrate the experiments as well as your

conclusions (10 Marks). By default, after reading your report, others should be able to reproduce

your experiments.

Part C: Numerical/Analytical - 35 Marks

A company manufactures personal protective equipment (PPE) for hospital and personal use. The company has two manufacturing facilities (factories) where the PPE is made before being transported to a warehouse and packed for export. After a few consignments were delivered, customers complained about defective equipment. The PPE is made from several components, any of which could contribute to a malfunction. It is desired to design a classifier that can identify which of the two factories produced a given defective PPE. Let the components of the PPE be available and measurable/testable. They can be modelled as $d$ conditionally independent binary-valued features, $\mathbf{x} = (x_1, \dots, x_d)^t$, where the components $x_i$ are either 0 or 1. The two factories are modelled as classes $\omega_1$ and $\omega_2$. Suppose the company has information about the reliability of the factories regarding each component. This information can be modelled as

$$p_i = \Pr[x_i = 1 \mid \omega_1], \qquad q_i = \Pr[x_i = 1 \mid \omega_2],$$

where $p_i$ and $q_i$ are respectively the probabilities of each factory making a non-defective component. Each feature thus gives us a yes/no answer about the pattern. Furthermore, if $p_i > q_i$, we expect the $i$th feature to give a "yes" answer more frequently when the state of nature is $\omega_1$ than when it is $\omega_2$.

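Now, the assumption of conditional independence allows the class-conditional probabilities to be written as products over the components of $\mathbf{x}$. For example, for the class $\omega_1$ we can write

$$P(\mathbf{x} \mid \omega_1) = \prod_{i=1}^{d} p_i^{x_i} (1 - p_i)^{1 - x_i}. \tag{1}$$

The likelihood ratio is written as $\dfrac{P(\mathbf{x} \mid \omega_1)}{P(\mathbf{x} \mid \omega_2)}$. The discriminant function for each class can be written equivalently (i.e. they will produce the same result) as any one of the following three equations:

$$g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i) P(\omega_i)}{\sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j) P(\omega_j)} \tag{2}$$

$$g_i(\mathbf{x}) = p(\mathbf{x} \mid \omega_i) P(\omega_i)$$

$$g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i) \quad \text{(since the natural logarithm is a monotonically increasing function)}$$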

where $\ln$ is the natural logarithm and $c$ is the number of classes. For this binary classification problem we can combine the two discriminant functions and write

$$g(\mathbf{x}) \equiv g_1(\mathbf{x}) - g_2(\mathbf{x}) \tag{3}$$

and then decide class $\omega_1$ if $g(\mathbf{x}) > 0$; class $\omega_2$ if $g(\mathbf{x}) \le 0$.
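The decision rule above can be checked numerically. The sketch below evaluates Equation (1) directly and forms g(x) from the log form of the discriminant functions. The probabilities `P_EXAMPLE` and `Q_EXAMPLE` are made-up two-feature values chosen only for illustration, and the function names are assumptions.

```python
import math

# Hypothetical two-feature example (not the values used in the questions).
P_EXAMPLE = [0.9, 0.6]   # p_i = Pr[x_i = 1 | omega_1]
Q_EXAMPLE = [0.3, 0.4]   # q_i = Pr[x_i = 1 | omega_2]


def likelihood(x, probs):
    """Equation (1): product over i of probs[i]**x_i * (1 - probs[i])**(1 - x_i)."""
    result = 1.0
    for x_i, prob in zip(x, probs):
        result *= prob if x_i == 1 else 1.0 - prob
    return result


def discriminant(x, p, q, prior_1=0.5, prior_2=0.5):
    """Equation (3) with the log form: g(x) = g_1(x) - g_2(x)."""
    g_1 = math.log(likelihood(x, p)) + math.log(prior_1)
    g_2 = math.log(likelihood(x, q)) + math.log(prior_2)
    return g_1 - g_2


def decide(x, p, q, prior_1=0.5, prior_2=0.5):
    """Return 'omega_1' if g(x) > 0, otherwise 'omega_2'."""
    return "omega_1" if discriminant(x, p, q, prior_1, prior_2) > 0 else "omega_2"


# Enumerate every binary feature vector for d = 2.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    g_value = discriminant(x, P_EXAMPLE, Q_EXAMPLE)
    print(x, round(g_value, 4), decide(x, P_EXAMPLE, Q_EXAMPLE))
```

With equal priors the prior terms cancel, so the sign of g(x) depends only on the likelihood ratio.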

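1. Using Equations (1), (2) and (3), derive a discriminant function of the form

$$g(\mathbf{x}) = \sum_{i=1}^{d} w_i x_i + w_0,$$

where

$$w_i = \ln \frac{p_i (1 - q_i)}{q_i (1 - p_i)}, \qquad i = 1, \dots, d,$$

$$w_0 = \sum_{i=1}^{d} \ln \frac{1 - p_i}{1 - q_i} + \ln \frac{P(\omega_1)}{P(\omega_2)},$$

and $w_i$ is a weight; $w_0$ is the bias.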

2. What is the relevance of the weight in determining a classification?

3. Assuming there are only three components in making a PPE, and $P(\omega_1) = P(\omega_2) = 0.5$, $p_i = 0.8$, $q_i = 0.5$ for $i = 1, 2, 3$, compute the values of the weights $w_i$ and threshold $w_0$.

4. Using the values of $w_i$ and $w_0$ obtained in the last question, determine the factory ($\omega_1$ or $\omega_2$) from which a PPE with feature vector $(1, 0, 1)$ was manufactured.

Part D: Submission instructions

Submission:

1. Prepare your response to Part A in a PDF file of no more than 2 pages.

2. Prepare three executable Python files, i.e. deep_autoencoder.py, svm.py, conv_autoencoder.py. All the implementations should be based on Python 3.

3. Prepare a PDF file containing your 3-page report.

4. Prepare your response to Part C in a PDF file of no more than 3 pages.

5. These files should be compressed as a ZIP (or RAR) file and submitted on Moodle on or before the due date and time.