
ENGR 3H Midterm

Author(s): Tim Matchen
It’s neural network time! Due Sunday, November 3, at 11:59 PM
Project Overview
You’re going to be responsible for writing a number of functions, each implementing a different component of your neural network. The functions/scripts you are responsible for are:
(a) nn_initialize.m – initializes the neural network’s matrices;
(b) nn_forwardprop.m – computes the forward propagation step for the neural network;
(c) nn_backprop.m – computes the gradients from the backward propagation step for the neural network training;
(d) nn_update.m – uses the gradient values from the backward propagation step to update the weights of the neural network;
(e) nn_evaluate.m – evaluates the accuracy of the neural network’s predictions; and
(f) nn_main.m – script that initializes, trains, and validates the neural network, then saves the result.
Each of these functions/scripts will be detailed below.
Code Specifications
nn_initialize.m
This function will initialize the weight matrices W_i and biases b_i for each layer
of our neural network. The function should take two inputs: a row vector
specifying the number of neurons in each layer and a scalar value specifying
the number of input parameters to the system. The function should return a
cell array containing all of the initialized matrices and vectors. Note that since
our network will only have a single output, the final value in the vector should
always be 1.
For example, the row vector [2 3 5] and input size 6 should produce arrays of the following sizes:
(a) W_1: 2 × 6, b_1: 2 × 1
(b) W_2: 3 × 2, b_2: 3 × 1
(c) W_3: 5 × 3, b_3: 5 × 1.
The bias vectors should be initialized with zeros. To initialize the weight matrices, we’re going to use Xavier initialization. This is a method of ensuring that
we pick weights that aren’t too big or too small to start with. To do this, each
element of the matrix should be selected randomly from a normal distribution
(use normrnd in MATLAB) with mean value µ = 0 and a standard deviation
given by:
σ = sqrt(2 / (n_in + n_out)),    (1)
where n_in and n_out are the input and output dimensions of the matrix.
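A minimal sketch of how nn_initialize.m might look (the argument names layer_sizes and n_inputs are placeholders, not required by the spec):

function params = nn_initialize(layer_sizes, n_inputs)
% NN_INITIALIZE  Build weight matrices and bias vectors for each layer.
%   layer_sizes : row vector of neuron counts per layer (last entry is 1)
%   n_inputs    : number of input parameters to the network
%   params      : cell array, params{i,1} = W_i and params{i,2} = b_i
n_layers = length(layer_sizes);
params = cell(n_layers, 2);
prev = n_inputs;                        % input dimension of the first layer
for i = 1:n_layers
    n_out = layer_sizes(i);
    sigma = sqrt(2 / (prev + n_out));   % Xavier standard deviation, eq. (1)
    params{i,1} = normrnd(0, sigma, n_out, prev);   % W_i
    params{i,2} = zeros(n_out, 1);                  % b_i initialized to zeros
    prev = n_out;                       % this layer's output feeds the next
end
end

Calling nn_initialize([2 3 5], 6) would then reproduce the sizes listed in the example above.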
nn_forwardprop.m
This function will take as inputs a cell array of weights W_i and biases b_i and an array of n inputs X (given as a k × n matrix, where k is the number of input parameters). Using these, it should compute the forward propagation
input parameters). Using these, it should compute the forward propagation
step of the neural network (note: this is also how your neural network will make
predictions once trained). The process for this is as follows:
(a) Compute the value of W_1 X + b_1;
(b) Find the hyperbolic tangent (tanh) of the result of the previous step, yielding the output Y_1 (this is the activation step);
(c) Repeat the process for each additional layer, using the output of the previous layer as the input. For example, next we would calculate W_2 Y_1 + b_2, etc.
This function should return a cell array that contains the values of the function
before applying the activation and after applying the activation. For example,
if your network has 5 layers, the returned cell array should be a 5 × 2 array.
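One possible shape for nn_forwardprop.m, assuming the cell array layout from the nn_initialize sketch above (adding the bias to every column relies on implicit expansion, available in MATLAB R2016b and later):

function cache = nn_forwardprop(params, X)
% NN_FORWARDPROP  Forward pass through every layer of the network.
%   params : cell array of weights and biases from nn_initialize
%   X      : k-by-n matrix of inputs (k parameters, n samples)
%   cache  : n_layers-by-2 cell array; cache{i,1} = Z_i (pre-activation),
%            cache{i,2} = A_i (post-activation)
n_layers = size(params, 1);
cache = cell(n_layers, 2);
A = X;                                   % the input feeds the first layer
for i = 1:n_layers
    Z = params{i,1} * A + params{i,2};   % W_i * A_(i-1) + b_i, bias added to every column
    A = tanh(Z);                         % activation step
    cache{i,1} = Z;
    cache{i,2} = A;
end
end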
nn_backprop.m
This function will take as inputs the following:
(a) A cell array of weights W and biases B;
(b) A cell array of layer outputs Z and post-activation values A;
(c) A vector Y containing the output values you are training on;
(d) A matrix X of the input values you are training on.
The function should return a cell array of gradient values that is the same size
as the cell array of weights and biases. To compute the gradient for each layer,
follow these steps. We will compute the gradients iteratively, starting at the last
layer (the output layer), and working toward the first layer (the input layer).
First, we need to compute the derivative of the cost with respect to the layer’s
activation values A. For simplicity, we’re going to use mean squared error as
our loss metric:
L = (A_last - Y)^2,    (2)
where A_last denotes the output of the final layer’s activation function. The derivative with respect to A (we’ll refer to it as dA) is then:
dA = 2(A_last - Y).    (3)
Note this is only the value of dA for the output layer; the other layers will have
a different expression which we’ll get to in a moment. Using dA and A, we can
compute the derivative with respect to Z; using the chain rule and the derivative
of tanh, we find that this is equal to:
dZ = dA .* (1 - A.^2).    (4)
Note the periods denoting elementwise operations for both the multiplication
and the squaring of A. With dZ computed, we can now address the value of
dA for the other layers; this is computed via the previous layer’s values. It is
calculated as:
dA_(k-1) = W_k^T * dZ_k.    (5)
With these values calculated, we have straightforward expressions for the derivatives of W and b. These are given by:
dW_k = dZ_k * A_(k-1)^T / length(Y)    (6)
and
dB_k = sum(dZ_k, 2) / length(Y).    (7)
If we are at the first layer (k = 1), replace the previous layer’s activation in (6)
with the input X.
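Putting equations (3) through (7) together, nn_backprop.m might be sketched as follows (the loop runs from the output layer back toward the input layer; variable names and the cache layout match the earlier sketches and are assumptions, not requirements):

function grads = nn_backprop(params, cache, Y, X)
% NN_BACKPROP  Gradients of the MSE loss for every weight and bias.
%   params : cell array of weights and biases
%   cache  : cell array of pre-activation (Z) and post-activation (A) values
%   Y      : 1-by-n vector of training outputs
%   X      : k-by-n matrix of training inputs
%   grads  : cell array the same size as params; grads{k,1} = dW_k, grads{k,2} = dB_k
n_layers = size(params, 1);
n = length(Y);
grads = cell(n_layers, 2);
dA = 2 * (cache{n_layers,2} - Y);        % eq. (3): dA at the output layer
for k = n_layers:-1:1
    dZ = dA .* (1 - cache{k,2}.^2);      % eq. (4): chain rule through tanh
    if k > 1
        A_prev = cache{k-1,2};           % previous layer's activation
    else
        A_prev = X;                      % first layer uses the raw input
    end
    grads{k,1} = dZ * A_prev' / n;       % eq. (6): dW_k
    grads{k,2} = sum(dZ, 2) / n;         % eq. (7): dB_k
    dA = params{k,1}' * dZ;              % eq. (5): dA for the next layer down
end
end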
nn_update.m
This function should take three inputs: the cell array containing the current
values for the weights and biases, the cell array containing the gradients, and
the learning rate. It should return a new cell array with the values updated via
the learning rate, for example:
W_new = W_old - α * dW_old.    (8)
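A minimal sketch of nn_update.m under these assumptions (linear indexing walks over every weight matrix and bias vector in the cell array; the minus sign makes each step a gradient-descent step, as in eq. (8)):

function params = nn_update(params, grads, alpha)
% NN_UPDATE  One gradient-descent step on every weight and bias.
%   params : cell array of current weights and biases
%   grads  : cell array of gradients from nn_backprop (same size)
%   alpha  : learning rate
for i = 1:numel(params)
    params{i} = params{i} - alpha * grads{i};   % eq. (8)
end
end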
nn_evaluate.m
This function should take as inputs two row vectors, representing the predictions
and the actual outputs. It should return a number representing the accuracy
of the model. For training, we are using mean squared error, as defined above.
Your function should return the MSE (note that it should be a single, averaged
value).
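Since the metric is just the squared error averaged over all samples, a sketch of nn_evaluate.m can be very short:

function mse = nn_evaluate(predictions, actual)
% NN_EVALUATE  Mean squared error between predictions and true outputs.
%   predictions, actual : 1-by-n row vectors
mse = mean((predictions - actual).^2);
end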
nn_main.m
This script will run through everything we just did, then save the resulting
cell array with the trained parameters. Your script should set a learning rate,
the number of epochs (iterations) to train for, and carry out the training loop.
Additionally, you should normalize your training data. After loading in the
data, find the mean µ_X and standard deviation σ_X of each input variable.
Then, subtract the mean and divide by the standard deviation:
X_norm = (X - µ_X) / σ_X.    (9)
This causes the input to be centered at 0 and relatively evenly distributed to
either side of 0, which is beneficial for training. Note that when you get data that
you haven’t trained on, you will need to carry out the identical normalization,
so you should also store the values of µ_X and σ_X.
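A minimal sketch of what nn_main.m could look like. The data file name, variable names, layer sizes, learning rate, and epoch count below are all placeholder assumptions; substitute the values appropriate for your data set.

% nn_main.m -- initialize, train, and save the network (sketch)
load('training_data.mat');          % assumed to provide X (k-by-n) and Y (1-by-n)

% Normalize each input variable, eq. (9); implicit expansion applies the
% per-row mean and standard deviation to every column of X
mu_X    = mean(X, 2);
sigma_X = std(X, 0, 2);
X_norm  = (X - mu_X) ./ sigma_X;

alpha    = 0.01;                    % learning rate (placeholder)
n_epochs = 1000;                    % training iterations (placeholder)
layers   = [5 3 1];                 % layer sizes; the last entry must be 1

params = nn_initialize(layers, size(X, 1));
for epoch = 1:n_epochs
    cache  = nn_forwardprop(params, X_norm);
    grads  = nn_backprop(params, cache, Y, X_norm);
    params = nn_update(params, grads, alpha);
end

% Report the training MSE, then save the trained parameters together with
% the normalization constants so new data can be normalized identically.
cache = nn_forwardprop(params, X_norm);
fprintf('Final training MSE: %.4f\n', nn_evaluate(cache{end,2}, Y));
save('trained_network.mat', 'params', 'mu_X', 'sigma_X');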