
Programming Assignment of COMP7250

Welcome to the assignment of COMP7250!

This assignment consists of two parts:

Part 1: Simple Classification with Neural Network

Part 2: Adversarial Examples for Neural Networks.

Through this assignment, you are expected to gain a basic understanding of:

How to process data;

How to build a simple network;

The mechanisms of forward and backward propagation;

How to generate adversarial examples with FGSM and PGD methods.

Note: although powerful frameworks such as PyTorch are available today for building deep learning pipelines, in this assignment you are expected to use Python and fundamental packages such as NumPy to implement the functions below and understand how they work.

TASK 1. Please implement a utility function: one_hot_labels.

In the implementation of the cross-entropy loss, it is much more convenient if numerical labels are transformed into one-hot labels.

For example, numerical label 5 -> one-hot label [0,0,0,0,0,1,0,0,0,0].
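As an illustration of the transformation described in TASK 1, here is a minimal sketch using NumPy integer-array indexing. The name one_hot_sketch and the num_classes parameter are hypothetical and are not part of the assignment skeleton; this is one possible approach, not the required solution.

```python
import numpy as np

def one_hot_sketch(labels, num_classes=10):
    """Turn integer labels of shape (N,) into an (N, num_classes) 0/1 matrix."""
    labels = np.asarray(labels, dtype=int).reshape(-1)
    encoded = np.zeros((labels.size, num_classes))
    # Row i receives a single 1 in column labels[i].
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

example = one_hot_sketch(np.array([5, 0]))
```

Indexing with a pair of index arrays sets one entry per row without an explicit Python loop.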

import random
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

def one_hot_labels(labels):
    one_hot_labels = np.zeros((labels.size, 10))
    ### YOUR CODE HERE

    ### END YOUR CODE
    return one_hot_labels

To begin with, the MNIST dataset is loaded from the following files:

images_train.npy: 60k images with normalized features; the dimension of the image features is 784.
images_test.npy
labels_train.npy: corresponding numerical labels (0-9).
labels_test.npy

Note that the original training images and labels are split into a set of 50k examples for training and 10k examples for validation.

TASK 2. Shuffle the data

The order of the examples can influence the learning process of the model, so shuffled data are preferred. Please complete prepare_data by adding a shuffle operation. Images and labels must be shuffled with the same permutation so that each image keeps its label.
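A minimal sketch of shuffling two arrays consistently, assuming both have the examples along the first axis. The helper name shuffle_together and the seed argument are hypothetical and only serve the illustration.

```python
import numpy as np

def shuffle_together(data, labels, seed=0):
    """Apply one random permutation to both arrays along axis 0."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(data.shape[0])
    return data[perm], labels[perm]

# Toy data: row i is [2i, 2i+1] and its label is i, so pairing is easy to verify.
data = np.arange(12).reshape(6, 2)
labels = np.arange(6)
shuffled_data, shuffled_labels = shuffle_together(data, labels)
```

Shuffling each array with an independent call would break the pairing between images and labels, which is why a single permutation is drawn and reused.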

TASK 3. Please implement the softmax function as well as the sigmoid function: softmax(x) and sigmoid(x).

The $i$-th element of softmax is calculated via:

$$\mathrm{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}} = \frac{e^{x_i - c}}{\sum_j e^{x_j - c}}$$

The last equality holds since adding a constant won't change the softmax result. Note that you may encounter an overflow when softmax computes the exponential, so please use the 'max' trick (choose $c = \max_j x_j$) to avoid this problem.

The sigmoid is calculated by:

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}} = \frac{e^{x}}{1 + e^{x}}$$

For numerical stability, please use the 1st expression for positive inputs and the 2nd expression for negative inputs.
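The two stability tricks above can be sketched as follows. The function names softmax_sketch and sigmoid_sketch are hypothetical, and the inputs are assumed to be 2-D arrays with one example per row.

```python
import numpy as np

def softmax_sketch(x):
    """Row-wise softmax; subtracting the row maximum avoids overflow in exp."""
    shifted = x - np.max(x, axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=1, keepdims=True)

def sigmoid_sketch(x):
    """Element-wise sigmoid that never exponentiates a large positive number."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    # Non-negative inputs: 1 / (1 + exp(-x)), where exp(-x) <= 1.
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # Negative inputs: exp(x) / (1 + exp(x)), where exp(x) < 1.
    exp_neg = np.exp(x[~pos])
    out[~pos] = exp_neg / (1.0 + exp_neg)
    return out

probs = softmax_sketch(np.array([[1000.0, 1000.0], [0.0, np.log(3.0)]]))
acts = sigmoid_sketch(np.array([[-1000.0, 0.0, 1000.0]]))
```

A naive softmax would evaluate exp(1000) for the first row and overflow; after the shift, the largest exponent is always 0.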

# Function for reading data & labels from files
def readData(images_file, labels_file):
    x = np.load(images_file)
    y = np.load(labels_file)
    return x, y

def prepare_data():
    trainData, trainLabels = readData('images_train.npy', 'labels_train.npy')
    trainLabels = one_hot_labels(trainLabels)
    ### YOUR CODE HERE

    ### END YOUR CODE
    valData = trainData[0:10000,:]
    valLabels = trainLabels[0:10000,:]
    trainData = trainData[10000:,:]
    trainLabels = trainLabels[10000:,:]
    testData, testLabels = readData('images_test.npy', 'labels_test.npy')
    testLabels = one_hot_labels(testLabels)
    return trainData, trainLabels, valData, valLabels, testData, testLabels

# load data for train, validation, and test.
trainData, trainLabels, devData, devLabels, testData, testLabels = prepare_data()

def softmax(x):
    """
    x is of shape: batch_size * #class
    """
    ### YOUR CODE HERE

    ### END YOUR CODE
    return s

def sigmoid(x):
    """
    x is of shape: batch_size * dim_hidden
    """
    ### YOUR CODE HERE

    ### END YOUR CODE
    return s

TASK 4. Please try to build a learning model according to the following instructions.

1. Complete the forward propagation function.

It is time to implement the forward propagation for a 2-layer neural network. Here, sigmoid is used as the activation function, and softmax is used as the link function. Formally,

hidden layer: $z_1 = x W_1 + b_1$, $h = \mathrm{sigmoid}(z_1)$

output layer: $z_2 = h W_2 + b_2$, $\hat{y} = \mathrm{softmax}(z_2)$

loss: $CE(y, \hat{y}) = -\sum_{k=1}^{K} y_k \log \hat{y}_k$

Therein, $z_1$ and $z_2$ are the outputs (before the activation function) of the hidden layer and the output layer; $W_1$, $b_1$, $W_2$ and $b_2$ are the learnable parameters. Concretely, $W_1$ and $b_1$ denote the weight matrix and the bias vector of the hidden layer of the network. Similarly, $W_2$ and $b_2$ are the weights and biases of the output layer. Note that, in computing the loss value, we should use np.log(y+1e-16) instead of np.log(y) to avoid a NaN (Not a Number) error in the case of log(0).

2. Complete the calculation of gradients.

Based on the forward_pass function, you can implement the corresponding function for backward propagation.

To help you better understand the backward propagation process, please calculate the gradients by hand first and then complete the code. (The gradients include the gradients of $z_2$, $W_2$, $b_2$, $h$, $z_1$, $W_1$, $b_1$.) Please present the calculation process in the cell below with markdown.

Gradient Calculation Cell:

Please present the calculation process in this cell.
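For the 2-layer network of TASK 4 (sigmoid hidden layer, softmax output, cross-entropy loss), the forward pass and the hand-derived gradients can be sketched as below. This is a sketch under stated assumptions, not the reference solution: the loss is averaged over the batch, the free functions forward and backward and the params list are hypothetical stand-ins for the methods of Model, and the tanh form of sigmoid is used only to keep the example short. A finite-difference check on one weight entry confirms the derivation.

```python
import numpy as np

def sigmoid(z):
    # Identity sigmoid(z) = 0.5 * (1 + tanh(z / 2)); stable for large |z|.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def forward(params, x, y):
    W1, b1, W2, b2 = params
    z1 = x @ W1 + b1                  # hidden pre-activation
    h = sigmoid(z1)                   # hidden activation
    z2 = h @ W2 + b2                  # output pre-activation
    y_hat = softmax(z2)
    # Assumption: mean cross-entropy over the batch.
    loss = -np.sum(y * np.log(y_hat + 1e-16)) / x.shape[0]
    return h, y_hat, loss

def backward(params, x, y, h, y_hat):
    W1, b1, W2, b2 = params
    batch = x.shape[0]
    dz2 = (y_hat - y) / batch         # softmax + cross-entropy combined
    dW2 = h.T @ dz2
    db2 = dz2.sum(axis=0, keepdims=True)
    dh = dz2 @ W2.T
    dz1 = dh * h * (1.0 - h)          # sigmoid'(z1) = h * (1 - h)
    dW1 = x.T @ dz1
    db1 = dz1.sum(axis=0, keepdims=True)
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 5))
y = np.eye(3)[rng.integers(0, 3, size=4)]
params = [rng.normal(size=(5, 6)), np.zeros((1, 6)),
          rng.normal(size=(6, 3)), np.zeros((1, 3))]
h, y_hat, loss = forward(params, x, y)
dW1, db1, dW2, db2 = backward(params, x, y, h, y_hat)

# Central finite difference on W1[0, 0] to check the analytic gradient.
step = 1e-6
params[0][0, 0] += step
loss_plus = forward(params, x, y)[2]
params[0][0, 0] -= 2 * step
loss_minus = forward(params, x, y)[2]
params[0][0, 0] += step               # restore the original value
numeric_dW1_00 = (loss_plus - loss_minus) / (2 * step)
```

If the loss is summed rather than averaged, the factor 1 / batch moves out of dz2 and has to be applied elsewhere, so the convention should match the scaling used in optimize_params.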

class Model(object):
    def __init__(self, input_dim, num_hidden, output_dim, batchsize,
                 learning_rate, reg_coef):
        '''
        Args:
            input_dim: int. The dimension of input features;
            num_hidden: int. The dimension of hidden units;
            output_dim: int. The dimension of output features;
            batchsize: int. The batchsize of the data;
            learning_rate: float. The learning rate of the model;
            reg_coef: float. The coefficient of the regularization.
        '''
        self.W1 = np.random.standard_normal((input_dim, num_hidden))
        self.b1 = np.zeros((1, num_hidden), dtype=float)
        self.W2 = np.random.standard_normal((num_hidden, output_dim))
        self.b2 = np.zeros((1, output_dim))
        self.reg_coef = reg_coef
        self.bsz = batchsize
        self.lr = learning_rate

    def forward_pass(self, data, labels):
        """
        return hidden layer, output(softmax) layer and loss
        data is of shape: batch_size * dim_x
        labels is of shape: batch_size * #class
        """
        # YOUR CODE HERE

        # END YOUR CODE
        return h, y_hat, loss

    def optimize_params(self, h, y_hat, data, labels):
        # YOUR CODE HERE

        # END YOUR CODE
        if self.reg_coef > 0:
            grad_W2 += self.reg_coef*self.W2
            grad_W1 += self.reg_coef*self.W1
        grad_W2 /= self.bsz
        grad_W1 /= self.bsz
        self.W1 -= self.lr * grad_W1
        self.b1 -= self.lr * grad_b1
        self.W2 -= self.lr * grad_W2
        self.b2 -= self.lr * grad_b2

def calc_accuracy(y, labels):
    """
    return accuracy of y given (true) labels.
    """
    pred = np.zeros_like(y)
    pred[np.arange(y.shape[0]), np.argmax(y, axis=1)] = 1
    res = np.abs((pred - labels)).sum(axis=1)
    acc = res[res == 0].shape[0] / res.shape[0]
    return acc

TASK 5. Please complete the training procedure: nn_train(trainData, trainLabels, devData, devLabels, **argv).

As a convention, the weight parameters W1 and W2 should be randomly initialized with standard Gaussian variables, and the bias parameters b1 and b2 should be initialized to 0. You can run nn_train with different values of the hyper-parameter reg_strength to evaluate the impact of regularization on network performance (e.g. reg_strength=0.5).

After training, we can plot the training and validation loss/accuracy curves to assess the model and the learning procedure.
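The inner loop of the training procedure amounts to one forward pass and one parameter update per mini-batch. A minimal sketch of that control flow is shown below; CountingModel is a hypothetical stand-in that only mimics the forward_pass and optimize_params interface, and run_epoch is likewise a name chosen for the illustration.

```python
import numpy as np

class CountingModel:
    """Hypothetical stub with the same two methods as Model; it only counts updates."""
    def __init__(self):
        self.updates = 0

    def forward_pass(self, data, labels):
        # A real model would return the hidden layer, predictions and loss.
        return data, labels, 0.0

    def optimize_params(self, h, y_hat, data, labels):
        self.updates += 1

def run_epoch(model, data, labels, batch_size):
    """Visit the data once in consecutive mini-batches; incomplete tail batches are skipped."""
    num_iter = data.shape[0] // batch_size
    for j in range(num_iter):
        batch_data = data[j * batch_size:(j + 1) * batch_size]
        batch_labels = labels[j * batch_size:(j + 1) * batch_size]
        h, y_hat, _ = model.forward_pass(batch_data, batch_labels)
        model.optimize_params(h, y_hat, batch_data, batch_labels)
    return num_iter

stub = CountingModel()
num_batches = run_epoch(stub, np.zeros((50, 4)), np.zeros((50, 3)), batch_size=10)
```

With 50 examples and a batch size of 10, the parameters are updated five times per epoch.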

def nn_train(trainData, trainLabels, devData, devLabels,
             num_hidden=300, learning_rate=5, batch_size=1000, num_epochs=30,
             reg_strength=0):
    (m, n) = trainData.shape
    params = {}
    N = m
    D = n
    K = trainLabels.shape[1]
    H = num_hidden
    B = batch_size
    model = Model(input_dim=n, num_hidden=H, output_dim=K,
                  batchsize=B, learning_rate=learning_rate,
                  reg_coef=reg_strength)
    num_iter = int(N / B)
    tr_loss, tr_metric, dev_loss, dev_metric = [], [], [], []
    for i in tqdm(range(num_epochs)):
        for j in range(num_iter):
            batch_data = trainData[j * B: (j + 1) * B]
            batch_labels = trainLabels[j * B: (j + 1) * B]
            ### YOUR CODE HERE

            ### END YOUR CODE
        # Record loss and accuracy on the full training and validation sets after each epoch.
        _, _y, _cost = model.forward_pass(trainData, trainLabels)
        tr_loss.append(_cost)
        tr_metric.append(calc_accuracy(_y, trainLabels))
        _, _y, _cost = model.forward_pass(devData, devLabels)
        dev_loss.append(_cost)
        dev_metric.append(calc_accuracy(_y, devLabels))
    return model, tr_loss, tr_metric, dev_loss, dev_metric

num_epochs = 30
model, tr_loss, tr_metric, dev_loss, dev_metric = nn_train(
    trainData, trainLabels, devData, devLabels, num_epochs=num_epochs)
xs = np.arange(num_epochs)
fig, axes = plt.subplots(1, 2, sharex=True, sharey=False, figsize=(12, 4))
ax0, ax1 = axes.ravel()
ax0.plot(xs, tr_loss, label='train loss')
ax0.plot(xs, dev_loss, label='dev loss')
ax0.legend()
ax0.set_xlabel('# epoch')
ax0.set_ylabel('CE loss')
ax1.plot(xs, tr_metric, label='train acc')
ax1.plot(xs, dev_metric, label='dev acc')
ax1.legend()
ax1.set_xlabel('# epoch')
ax1.set_ylabel('Accuracy')
plt.show()

Finally, we should evaluate the model performance on test data.

def nn_test(data, labels):
    h, output, cost = model.forward_pass(data, labels)
    accuracy = (np.argmax(output,axis=1) ==
                np.argmax(labels,axis=1)).sum() * 1. / labels.shape[0]
    return accuracy

accuracy = nn_test(testData, testLabels)
print('Test accuracy: {0}'.format(accuracy))

Part 2: Adversarial Examples for Neural Networks

Many classifiers, including neural networks, are highly susceptible to what are called adversarial examples: small perturbations of the input that cause a classifier to misclassify, but are imperceptible to humans. For example, making a small change to an image of a stop sign might cause an object detector in an autonomous vehicle to classify it as a yield sign, which could lead to an accident.

In this part, we will see how to construct adversarial examples for neural networks, and you are given a 3-hidden-layer perceptron trained on the MNIST dataset for this purpose.

Since we are interested in constructing adversarial examples rather than in the original classification task, we do not need to worry too much about the design of the neural network and the processing of the data (which are already given). The parameters of the perceptron can be loaded from fc*.weight.npy and fc*.bias.npy. The test dataset can be loaded from X_test.npy and Y_test.npy. Each image of MNIST is 28×28 pixels in size, and is generally represented as a flat vector of 784 numbers. The dataset also includes a label for each example, a number indicating the actual digit (0 - 9) handwritten in that image.

Enjoy practicing generating adversarial examples and have fun!

from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt

First, we need to define some functions for later computation.

TASK 1. Please implement the following functions:

relu: The relu function is calculated as:

$$\mathrm{relu}(x) = \max(0, x)$$

relu_grad: The relu_grad function computes the gradient of the relu function as:

$$\mathrm{relu\_grad}(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$$

def relu(x):
    '''
    Input
        x: a vector in ndarray format
    Output
        relu_x: a vector in ndarray format,
                representing the ReLu activation of x.
    '''
    ### YOUR CODE HERE

    ### END YOUR CODE
    return relu_x

def relu_grad(x):
    '''
    Input
        x: a vector in ndarray format
    Output
        relu_grad_x: a vector in ndarray format,
                     representing the gradient of ReLu activation.
    '''
    ### YOUR CODE HERE

    ### END YOUR CODE
    return relu_grad_x

def cross_entropy(y, y_hat):
    '''
    Input
        y: an int representing the class label
        y_hat: a vector in ndarray format showing the predicted
               probability of each class.
    Output
        the cross entropy loss.
    '''
    etp = one_hot_labels(y)*np.log(y_hat+1e-16)
    loss = -np.sum(etp)
    return loss

Next, we define the structure and some utility functions of our multi-layer perceptron.

The neural net is a fully-connected multi-layer perceptron with three hidden layers. The hidden layers contain 2048, 512 and 512 hidden nodes respectively. We use ReLU as the activation function at each hidden node. The last intermediate layer's output is passed through a softmax function, and the loss is measured as the cross-entropy between the resulting probability vector and the true label.
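The ReLU function and its gradient from TASK 1 of this part can be sketched in a few lines of NumPy. The names relu_sketch and relu_grad_sketch are hypothetical, and setting the gradient to 0 at exactly x = 0 is a convention assumed here (the derivative is not defined at that point).

```python
import numpy as np

def relu_sketch(x):
    """Element-wise max(0, x)."""
    return np.maximum(x, 0.0)

def relu_grad_sketch(x):
    """Element-wise derivative of ReLU: 1 where x > 0, else 0."""
    return (x > 0).astype(float)

z = np.array([[-2.0, 0.0, 3.0]])
activated = relu_sketch(z)
slopes = relu_grad_sketch(z)
```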

TASK 2. Please implement the forward propagation and the gradient calculation.

The following notation is used:

x: the input image vector with dimension 1x784.
y: the true class label of x.
zi: the value of the i-th intermediate layer before activation, with dimension 1x2048, 1x512, 1x512 and 1x10 for i = 1, 2, 3, 4.
hi: the value of the i-th intermediate layer after activation, with dimension 1x2048, 1x512 and 1x512 for i = 1, 2, 3.
p: the predicted class probability vector after the softmax function, with dimension 1x10.
Wi: the weights between the (i - 1)-th and the i-th intermediate layer. For simplicity, we use h0 as an alias for x. Each Wi has dimension l_{i-1} x l_i, where l_i is the number of nodes in the i-th layer. For example, W1 has dimension 784x2048.
bi: the bias between the (i - 1)-th and the i-th intermediate layer. The dimension is 1 x l_i.

The forward propagation rules are as follows.

$$z_i = h_{i-1} W_i + b_i, \quad i = 1, 2, 3, 4$$

$$h_i = \mathrm{relu}(z_i), \quad i = 1, 2, 3$$

$$p = \mathrm{softmax}(z_4)$$

The gradient calculation rules are as follows.

Let L denote the cross-entropy loss of an image-label pair (x, y). We are interested in the gradient of L w.r.t. x, and we move x in the direction of (the sign of) the gradient to increase L. If L becomes large, the new image x_adv will likely be misclassified.

We use the chain rule for gradient computation. Again, let h0 be the alias of x. We have:

$$\frac{\partial L}{\partial x} = \frac{\partial L}{\partial z_4} \frac{\partial z_4}{\partial h_3} \frac{\partial h_3}{\partial z_3} \frac{\partial z_3}{\partial h_2} \frac{\partial h_2}{\partial z_2} \frac{\partial z_2}{\partial h_1} \frac{\partial h_1}{\partial z_1} \frac{\partial z_1}{\partial h_0}$$

The intermediate terms can be computed as follows.

$$\frac{\partial L}{\partial z_4} = p - \mathrm{onehot}(y), \qquad \frac{\partial z_i}{\partial h_{i-1}} = W_i^{\top}, \qquad \frac{\partial h_i}{\partial z_i} = \mathrm{diag}(\mathrm{relu\_grad}(z_i))$$

TASK 3. Please generate the FGSM & PGD adversarial examples based on the gradient.

We begin by deriving a simple way of constructing an adversarial example around an input (x, y).

Suppose we denote our neural network by a function f: X -> {0,...,9}.

Suppose we want to find a small perturbation δ of x such that the neural network f assigns a label different from y to x+δ. To find such a δ, we want to increase the cross-entropy loss of the network f at (x, y); in other words, we want to take a small step in a direction along which the cross-entropy loss increases, thus causing a misclassification. We can write this as a gradient ascent update, and to ensure that we only take a small step, we can just use the sign of each coordinate of the gradient. The final algorithm is this:

$$x_{adv} = x + \epsilon \cdot \mathrm{sign}(\nabla_x L(x, y))$$

where L is the cross-entropy loss. This method is known as the Fast Gradient Sign Method (FGSM).
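The input gradient and the FGSM update can be sketched on a small random network. Everything here is an assumption made for illustration: the layer sizes (6, 8, 8, 8, 3) are much smaller than the 784-2048-512-512-10 network of the assignment, the weights are random rather than loaded from the fc*.npy files, and the function names loss_and_input_grad and fgsm are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def loss_and_input_grad(x, y, weights, biases):
    """Cross-entropy loss of (x, y) and its gradient w.r.t. the row vector x."""
    h, zs = x, []
    for W, b in zip(weights[:-1], biases[:-1]):
        z = h @ W + b
        zs.append(z)                  # keep pre-activations for the backward pass
        h = relu(z)
    p = softmax(h @ weights[-1] + biases[-1])
    loss = -np.log(p[0, y] + 1e-16)
    # dL/dz_last = p - onehot(y); then alternate W^T and relu' back to the input.
    grad = p.copy()
    grad[0, y] -= 1.0
    grad = grad @ weights[-1].T
    for W, z in zip(reversed(weights[:-1]), reversed(zs)):
        grad = (grad * (z > 0)) @ W.T
    return loss, grad

def fgsm(x, y, weights, biases, eps):
    """One signed-gradient ascent step of size eps (no further constraint)."""
    _, grad = loss_and_input_grad(x, y, weights, biases)
    return x + eps * np.sign(grad)

rng = np.random.default_rng(0)
sizes = [6, 8, 8, 8, 3]
weights = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros((1, n)) for n in sizes[1:]]
x = rng.uniform(size=(1, 6))
y = 1
loss, grad = loss_and_input_grad(x, y, weights, biases)
x_adv = fgsm(x, y, weights, biases, eps=1e-3)
loss_adv, _ = loss_and_input_grad(x_adv, y, weights, biases)
```

Comparing grad against a central finite difference on a single input coordinate is a cheap way to catch a wrong transpose or a missing relu_grad factor before running the attack on real images.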

From the perspective of image processing, digital images often use only 8 bits per pixel, so they discard all information below 1/255 of the dynamic range. Because the precision of the features is limited, it is not rational for the classifier to respond differently to an input than to an adversarial input if every element of the perturbation is smaller than the precision of the features. Thus, it is expected that both the original input and the adversarial input are assigned to the same class if the perturbation δ satisfies $\|\delta\|_\infty < \epsilon$. In practice, the constraint is applied as:

Please apply this constraint in the algorithm.
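One common way to enforce a max-norm budget is to clip every coordinate of the adversarial input back into the interval [x - eps, x + eps]. The sketch below assumes that this is what the constraint means here; the helper name clip_linf is hypothetical.

```python
import numpy as np

def clip_linf(x_adv, x, eps):
    """Keep every coordinate of x_adv within eps of the clean input x."""
    return np.clip(x_adv, x - eps, x + eps)

x = np.array([[0.2, 0.5, 0.9]])
x_adv = x + np.array([[0.3, -0.01, -0.2]])   # two coordinates exceed the budget
x_clipped = clip_linf(x_adv, x, eps=0.1)
```

Coordinates that are already within the budget (the middle one here) are left unchanged.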

As shown above, FGSM is a single-step scheme. Next, we would like to introduce a multi-step scheme, Projected Gradient Descent (PGD).

Different from FGSM, PGD takes several steps to produce more powerful adversarial examples:

Specifically, first initialize $x_{adv}$ with x.

Then, for each step, the following operations are conducted:

Compute the loss $L(x_{adv}, y)$;

Compute the gradient with respect to the input $x_{adv}$: $\nabla_{x_{adv}} L(x_{adv}, y)$;

Normalize the gradient above with its $L_2$-norm;

Generate the intermediate adversary:

Normalize the perturbation $\delta$ with its $L_2$-norm:

Get the adversarial example:

Apply the constraint on the adversary.
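The PGD steps listed above can be sketched as follows. This is a sketch under several assumptions that the handout does not confirm: the perturbation is projected onto the L2 ball of radius eps (it is rescaled only when its norm exceeds eps), the final constraint is taken to be clipping pixels to [0, 1], and the gradient is supplied by a hypothetical callable grad_fn instead of the gradient method of the classifier. The toy loss 0.5 * ||x - target||^2 is used only so that the example runs on its own.

```python
import numpy as np

def pgd_l2(x, grad_fn, eps, alpha, steps):
    """Multi-step gradient ascent with an L2 budget eps around x."""
    x_adv = x.copy()
    for _ in range(steps):
        g = grad_fn(x_adv)
        g = g / (np.linalg.norm(g) + 1e-12)      # normalize the gradient (L2)
        x_adv = x_adv + alpha * g                # intermediate adversary
        delta = x_adv - x
        norm = np.linalg.norm(delta)
        if norm > eps:                           # project back onto the L2 ball
            delta = delta * (eps / norm)
        x_adv = np.clip(x + delta, 0.0, 1.0)     # assumed valid pixel range
    return x_adv

# Toy loss 0.5 * ||x - target||^2, whose gradient is x - target.
target = np.zeros((1, 4))

def toy_loss(z):
    return 0.5 * np.sum((z - target) ** 2)

def toy_grad(z):
    return z - target

x = np.full((1, 4), 0.5)
x_adv = pgd_l2(x, toy_grad, eps=0.3, alpha=0.1, steps=10)
```

For the assignment network, grad_fn would be replaced by a call that returns the gradient of the cross-entropy loss with respect to the input for a fixed label.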

class MultiLayerPerceptron():
    '''
    This class defines the multi-layer perceptron we will be using
    as the attack target.
    '''
    def __init__(self):
        pass

    def load_params(self, params):
        '''
        This method loads the weights and biases of a trained model.
        '''
        self.W1 = params["fc1.weight"]
        self.b1 = params["fc1.bias"]
        self.W2 = params["fc2.weight"]
        self.b2 = params["fc2.bias"]
        self.W3 = params["fc3.weight"]
        self.b3 = params["fc3.bias"]
        self.W4 = params["fc4.weight"]
        self.b4 = params["fc4.bias"]

    def set_attack_budget(self, eps):
        '''
        This method sets the maximum L_infty norm of the adversarial
        perturbation.
        '''
        self.eps = eps

    def forward(self, x):
        '''
        This method finds the predicted probability vector of an input
        image x.
        Input
            x: a single image vector in ndarray format
        Output
            self.p: a vector in ndarray format representing the predicted class
                    probability of x.
        Intermediate results are stored as class attributes.
        You might need them for gradient computation.
        '''
        W1, W2, W3, W4 = self.W1, self.W2, self.W3, self.W4
        b1, b2, b3, b4 = self.b1, self.b2, self.b3, self.b4
        self.z1 = np.matmul(x,W1)+b1
        #######################################
        ### YOUR CODE HERE

        ### END YOUR CODE
        #######################################
        return self.p

    def predict(self, x):
        '''
        This method takes a single image vector x and returns the
        predicted class label of it.
        '''
        res = self.forward(x)
        return np.argmax(res)

    def gradient(self,x,y):
        '''
        This method finds the gradient of the cross-entropy loss
        of an image-label pair (x,y) w.r.t. the image x.
        Input
            x: the input image vector in ndarray format
            y: the true label of x
        Output
            grad: a vector in ndarray format representing
                  the gradient of the cross-entropy loss of (x,y)
                  w.r.t. the image x.
        '''
        #######################################
        ### YOUR CODE HERE

        ### END YOUR CODE
        #######################################
        return grad

    def attack(self, x, y, attack_mode, epsilon, alpha=2./255.):
        '''
        This method generates the adversarial example of an
        image-label pair (x,y).
        Input
            x: an image vector in ndarray format, representing
               the image to be corrupted.
            y: the true label of the image x.
            attack_mode: str. Choice of the mode of attack. The mode can be
                         selected from ['no_constraint', 'L-inf', 'L2'];
            epsilon: float. Hyperparameter for generating perturbations;
            pgd_steps: int. Number of steps for running PGD algorithm;
            alpha: float. Hyperparameter for generating adversarial examples in
                   each step of PGD;
        Output
            x_adv: a vector in ndarray format, representing
                   the adversarial example created from image x.
        '''
        #######################################
        if attack_mode == 'L-inf':
            ### YOUR CODE HERE

            ### END YOUR CODE
        elif attack_mode == 'L2':
            ### YOUR CODE HERE

            ### END YOUR CODE
        elif attack_mode == 'no_constraint':
            ### YOUR CODE HERE

            ### END YOUR CODE
        else:
            raise ValueError("Unrecognized attack mode, please choose from ['no_constraint', 'L-inf', 'L2'].")
        #######################################
        return x_adv

Now, let's load the test data and the pre-trained model.

X_test = np.load("./X_test.npy")
Y_test = np.load("./Y_test.npy")
params = {}
param_names = ["fc1.weight", "fc1.bias",
               "fc2.weight", "fc2.bias",
               "fc3.weight", "fc3.bias",
               "fc4.weight", "fc4.bias"]
for name in param_names:
    params[name] = np.load("./"+name+'.npy')
clf = MultiLayerPerceptron()
clf.load_params(params)

Check if the image data are loaded correctly. Let's visualize the first image in the data set.

x, y = X_test[0], Y_test[0]
print("This is an image of Number", y)
pixels = x.reshape((28,28))
plt.imshow(pixels,cmap="gray")

Check if the model is loaded correctly. The test accuracy should be 97.6%.

nTest = 1000
Y_pred = np.zeros(nTest)
for i in tqdm(range(nTest)):
    x, y = X_test[i][np.newaxis, :], Y_test[i]
    Y_pred[i] = clf.predict(x)
acc = np.sum(Y_pred == Y_test[:nTest])*1.0/nTest
print("Test accuracy is", acc)

TASK 4. Please generate adversarial examples and check the image.

Generate an FGSM adversarial example with eps=0.1 (without constraint).

### Output pixels: an FGSM adversarial example with eps=0.1 (without constraint).
### YOUR CODE HERE

### END YOUR CODE
plt.imshow(pixels,cmap="gray")

Generate a PGD adversarial example with eps=8/255 (with L_2 constraint).

### Output pixels: a PGD adversarial example with eps=8/255 (L_2 constraint).
### YOUR CODE HERE

### END YOUR CODE
plt.imshow(pixels,cmap="gray")

TASK 5. Try the adversarial attack and test the accuracy using PGD adversarial examples.

You can get the test accuracy on adversarial examples after running this cell. (Please use 1000 test samples.)

### Output acc: the adversarial accuracy.
### YOUR CODE HERE

### END YOUR CODE
print("Test accuracy of adversarial examples is", acc)