CSI 5340 Homework Exercise 3
Note that to complete the homework, you must submit (1) your Python code for the question and (2) a report (in PDF format) including the required experimental results, descriptions, and other related information.
Failing to submit either the code or the report will result in a mark of zero.
In this exercise, you need to design a CNN model for the MNIST dataset (http://yann.lecun.com/exdb/mnist/) and train that model using adversarial training (i.e., the “mini-max formulation” introduced in class; see the lecture slides for details). More specifically, you need to do the following:
Design a CNN model. Describe your model structure in your report.
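For concreteness, here is a minimal sketch of the kind of model that would suffice, assuming PyTorch; the framework choice, the layer sizes, and the name SmallCNN are illustrative assumptions of this sketch, not requirements of the assignment:

```python
import torch.nn as nn

class SmallCNN(nn.Module):
    """A small CNN for 28x28 grayscale MNIST images (illustrative only)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # 28x28 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 28x28 -> 14x14
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # 14x14 -> 14x14
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 14x14 -> 7x7
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),  # raw logits; pair with CrossEntropyLoss
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```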
Write code to implement both the targeted and the non-targeted PGD attack. Describe
the steps of your implementation of these two algorithms in detail in your report.
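As a rough guide, both variants can be handled by a single routine. Below is a sketch assuming PyTorch, images scaled to [0, 1], and the ℓ∞ perturbation set; the random start and the helper name pgd_attack are assumptions of this sketch, not prescriptions:

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps, alpha, steps, targeted=False, y_target=None):
    """l_inf PGD attack (sketch). Assumes inputs scaled to [0, 1].
    Non-targeted: ascend the loss on the true label y.
    Targeted: descend the loss on the chosen target label y_target."""
    # random start inside the eps-ball (a common choice, not the only one)
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y_target if targeted else y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            direction = -1.0 if targeted else 1.0  # toward target vs. away from truth
            x_adv = x_adv + direction * alpha * grad.sign()
            # project back onto the l_inf ball around x, then to the valid pixel range
            x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()
```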
Train the model to get your first classifier using adversarial training with the non-targeted 20-step PGD
attack. The perturbation radius ε is set to 0.3 and the step size α is set to 0.02 during the training.
Train the model to get the second classifier using adversarial training with the non-targeted 1-step PGD
attack. The perturbation radius ε is still 0.3 and the step size α is set to 0.5 during the training.
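These two settings differ only in the attack hyperparameters, so the outer minimization can share one loop. Here is a sketch reusing the pgd_attack helper above; the function name, the epoch count, and training only on the perturbed batch are assumptions of this sketch:

```python
import torch.nn.functional as F

def adversarial_training(model, loader, optimizer, eps, alpha, steps, epochs=10):
    """Mini-max adversarial training (sketch): inner maximization via PGD,
    outer minimization via a gradient step on the perturbed batch."""
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            # inner max: craft adversarial examples against the current model
            x_adv = pgd_attack(model, x, y, eps, alpha, steps)
            # outer min: ordinary supervised update on the perturbed batch
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x_adv), y)
            loss.backward()
            optimizer.step()
```

Under these assumptions, the first classifier would correspond to steps=20, alpha=0.02 and the second to steps=1, alpha=0.5, with eps=0.3 in both cases.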
Train the model to get the third classifier using adversarial training with the targeted 20-step PGD attack.
The perturbation radius ε is set to 0.3 and the step size α is set to 0.02 during the training. When
using the targeted PGD attack during training, the target class label is chosen as the most likely
label predicted by the current classifier, excluding the true label; that is, the label corresponding to
the largest element in the final output vector of your neural network, excluding the value for the true
label.
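One possible way to realize this selection rule, assuming the network's final output is a vector of logits of shape (batch, 10); the helper name is hypothetical:

```python
import torch

def most_likely_wrong_label(model, x, y):
    """Per example, return the class whose logit is largest after the
    true label y is masked out (sketch)."""
    with torch.no_grad():
        logits = model(x)                                          # (batch, num_classes)
        masked = logits.scatter(1, y.unsqueeze(1), float('-inf'))  # hide the true class
        return masked.argmax(dim=1)
```

Passing these targets to pgd_attack with targeted=True inside the training loop sketched earlier would give this third setting.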
In your report, you need to document the training behaviour for all three training settings.
That is, you need to observe how the training loss and training accuracy (on both the perturbed and
unperturbed training data) evolve with each iteration of adversarial training. Plot the training loss
and accuracy curves and exhibit them in a reader-friendly fashion.
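If helpful, a minimal way to plot such curves, assuming matplotlib and a history dict populated during training; the key names loss, acc_adv, and acc_clean are hypothetical:

```python
import matplotlib.pyplot as plt

def plot_curves(history, title):
    """Plot per-iteration training loss and accuracies from a history dict (sketch)."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(history['loss'])
    ax1.set(xlabel='iteration', ylabel='training loss', title=title)
    ax2.plot(history['acc_adv'], label='perturbed')
    ax2.plot(history['acc_clean'], label='unperturbed')
    ax2.set(xlabel='iteration', ylabel='training accuracy')
    ax2.legend()
    fig.tight_layout()
    plt.show()
```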
Evaluate your classifiers on the testing set using both the targeted and non-targeted 40-step PGD
attacks with the following settings of the perturbation radius: ε ∈ {0, 0.1, 0.2, 0.3, 0.45} (note that ε = 0
is equivalent to testing your model on the unperturbed data). The step size α is set to 0.01. When
performing the targeted attack, the target label is still chosen as the most likely class label (excluding the
true label) predicted by the classifier. Include the testing accuracy in your report.
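A possible evaluation loop over these radii, reusing the pgd_attack and most_likely_wrong_label sketches above (ε = 0 simply skips the attack):

```python
import torch

def robust_accuracy(model, loader, eps, alpha=0.01, steps=40, targeted=False):
    """Test-set accuracy under 40-step PGD at radius eps (sketch)."""
    model.eval()
    correct, total = 0, 0
    for x, y in loader:
        if eps > 0:
            y_target = most_likely_wrong_label(model, x, y) if targeted else None
            x = pgd_attack(model, x, y, eps, alpha, steps,
                           targeted=targeted, y_target=y_target)
        with torch.no_grad():
            correct += (model(x).argmax(dim=1) == y).sum().item()
        total += y.size(0)
    return correct / total
```

Calling this for every ε in {0, 0.1, 0.2, 0.3, 0.45}, once with targeted=False and once with targeted=True, yields the table of testing accuracies requested above.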
For each evaluation setting mentioned above, generate some adversarial examples (2 or 3 images for
each setting) and exhibit them in your report.
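For these exhibits, something as simple as the following would do, assuming matplotlib and single-image tensors of shape (1, 28, 28):

```python
import matplotlib.pyplot as plt

def show_pair(x, x_adv, pred_clean, pred_adv):
    """Display a clean MNIST image next to its adversarial counterpart (sketch)."""
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(4, 2))
    ax1.imshow(x.squeeze().cpu(), cmap='gray')
    ax1.set_title(f'clean: {pred_clean}')
    ax2.imshow(x_adv.squeeze().cpu(), cmap='gray')
    ax2.set_title(f'adv: {pred_adv}')
    for ax in (ax1, ax2):
        ax.axis('off')
    plt.show()
```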
In all settings, the perturbation set is the ℓ∞ norm ball introduced in class, i.e., {δ : ‖δ‖∞ ≤ ε}.
Exhibit and organize your experimental results in a reader-friendly fashion. Comment on any observations
from your experiments and remark on the lessons learned.