Assignment 4: L-Layer Deep Neural Network
Assignment:
1) Build a deep NN to recognize cat pictures (using the same dataset you have).
2) Estimate the training and testing accuracies.
3) Use L=5, i.e., 4 hidden layers, with 22, 10, 7, and 5 hidden units.
4) Use ReLU activation for all hidden units and sigmoid for the output layer.
5) Try again with L=7 (6 hidden layers), with 30, 22, 10, 7, 5, and 3 hidden units. Does
this help?
Details
Adapt the concepts explained in the previous assignments to an L-layer NN.
You will adjust your helper functions (initialization, Forprop, Backprop) so that they can be called multiple times while
you are running a for loop over the layers.
The output layer has a different structure than the other layers (a different activation (sigmoid) and different
dimensions).
Remember, the input is a (64,64,3) image, which is flattened to a vector of size (12288,1).
You take the sigmoid of the final linear unit. If it is greater than 0.5, you classify the image as a cat.
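As a small illustration of the two remarks above, the sketch below flattens a batch of images into columns and applies the 0.5 threshold. The random `images` array is a hypothetical stand-in for the cat dataset, `classify` is an illustrative name, and dividing by 255 is an assumed preprocessing step that this handout does not specify.

```python
import numpy as np

# Hypothetical stand-in for the dataset: 5 random RGB images of size 64x64.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 64, 64, 3))

# Flatten each image into one column: shape (64*64*3, m) = (12288, 5).
# Scaling to [0, 1] is an assumption, not a requirement stated in the handout.
X = images.reshape(images.shape[0], -1).T / 255.0

def classify(AL):
    """Turn sigmoid outputs of shape (1, m) into 0/1 labels (1 = cat)."""
    return (AL > 0.5).astype(int)

print(X.shape)                                  # (12288, 5)
print(classify(np.array([[0.2, 0.5, 0.9]])))    # [[0 0 1]]
```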
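Equations:
1. Forward propagation:
Linear equation: $Z^{[l]} = W^{[l]} A^{[l-1]} + b^{[l]}$ (where $A^{[0]} = X$)
Activation: $A^{[l]} = g(Z^{[l]})$
2. Backward propagation: The three outputs $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$ are computed using the input $dZ^{[l]}$.
Here are the formulas you need:
$dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}$
$db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}$
$dA^{[l-1]} = W^{[l]T} dZ^{[l]}$
If $g$ is the activation function, compute
$dZ^{[l]} = dA^{[l]} * g'(Z^{[l]})$.
To initialize backpropagation, you need the derivative of the cost with respect to $A^{[L]}$. Use the following code to compute this
derivative:
dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))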
Recommended Steps
1. Store the dimensions of the network in an array layer_dims. Its length is len(layer_dims) = L+1, and it consists of
the number of inputs/hidden units of each layer.
2. Initialize the parameters for an $L$-layer neural network.
3. Implement the "Forprop" module:
Complete the "linear part" of a layer's forward propagation step (resulting in $Z^{[l]}$).
Use the relu activation function for all layers except the output layer, which uses sigmoid.
Combine the previous two steps into a new "linear-activation" forward function.
Stack the "linear-Relu" forward function L-1 times and add a "linear-sigmoid" at the end (for the
output layer $L$). This gives you a new "deep-model_forward" function.
4. Compute the loss.
5. Implement the "Backprop" module:
Complete the "linear" part of a layer's backward propagation step.
Use the derivative of relu and sigmoid accordingly (have separate functions for these)
Combine the previous two steps into a new "linear-Activation" backward function.
Start with "linear-sigmoid" backward and then stack "linear-Relu" backward L-1 times in a new
"deep-model_backward" function.
6. Merge all the functions above into an L-layer model (to train your model).
7. Finally, update the parameters and compute the accuracies.
Recommended Functions:
1. Initialization: The initialization for a deeper L-layer neural network requires a for loop over the layers.
You should make sure that your dimensions match between each layer. Use random initialization for the
weight matrices: np.random.randn(rows, cols) * 0.01 (randn takes the dimensions as separate arguments, not as a tuple).
Use zeros initialization for the biases: np.zeros(shape).
Inputs:
layer_dims -- python array containing the dimensions of each layer (including the input layer); of
length L+1
Outputs:
parameters -- python dictionary containing your initialized parameters "W1", "b1", ..., "WL", "bL":
Wl -- weight matrix of shape (layer_dims[l], layer_dims[l-1])
bl -- bias vector of shape (layer_dims[l], 1)
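A minimal sketch of this initialization, assuming the L=5 architecture from the assignment with a single sigmoid output unit. The `seed` argument is an addition for reproducibility and is not required by the handout.

```python
import numpy as np

def initialize_parameters(layer_dims, seed=0):
    """Return {"W1", "b1", ..., "WL", "bL"} for the sizes in layer_dims."""
    np.random.seed(seed)          # assumption: fixed seed for repeatable runs
    parameters = {}
    L = len(layer_dims) - 1       # layer_dims has length L + 1
    for l in range(1, L + 1):
        # Small random weights, zero biases, as recommended above.
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

# L = 5: input of size 12288, hidden layers 22, 10, 7, 5, and one output unit.
layer_dims = [12288, 22, 10, 7, 5, 1]
params = initialize_parameters(layer_dims)
for name, value in params.items():
    print(name, value.shape)
```

For the L=7 experiment, only `layer_dims` would change, for example to `[12288, 30, 22, 10, 7, 5, 3, 1]`.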
2. Implement the forward propagation (3 functions)
Function 1: Implement the linear part of a layer's forward propagation.
Inputs:
A -- activations from previous layer (or input data)
W -- weights matrix of shape (size of current layer, size of previous layer)
b -- bias vector of shape (size of the current layer, 1)
Outputs:
Z -- the input of the activation function
cache -- dictionary containing "A", "W" and "b"
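One possible shape for Function 1 is sketched below. The name `linear_forward` is an assumption, and the cache is kept as a tuple `(A, W, b)` for brevity; a dictionary with the keys "A", "W", and "b" works equally well.

```python
import numpy as np

def linear_forward(A, W, b):
    """Compute Z = W A + b and keep the inputs for the backward pass."""
    Z = W @ A + b            # b of shape (n_l, 1) broadcasts over the m examples
    cache = (A, W, b)
    return Z, cache

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])   # 2 units in the previous layer, 2 examples
W = np.array([[1.0, -1.0]])  # 1 unit in the current layer
b = np.array([[0.5]])
Z, cache = linear_forward(A, W, b)
print(Z)                     # [[-1.5 -1.5]]
```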
Function 2: Implement the activation part of a layer's forward propagation.
Inputs:
11/1/2020 L-Layer_deep-NN
https://blackboard.udmercy.edu/bbcswebdav/pid-1563288-dt-content-rid-25593194_1/courses/17052_ELEE5940-02_2021/L-Layer_deep-NN.html 4/6
A_prev -- activations from previous layer (or input data): (size of previous layer, number of
examples)
W -- weights matrix: numpy array of shape (size of current layer, size of previous layer)
b -- bias vector, numpy array of shape (size of the current layer, 1)
activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
Outputs:
A -- the output of the activation function
cache -- a python dictionary containing "linear_cache (A_prev, W, b)" and "activation_cache
(Z)"; stored for computing the backward pass efficiently
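The sketch below illustrates Function 2 under the same conventions. The helpers `sigmoid` and `relu` are illustrative names, and the cache is stored as the tuple `(linear_cache, activation_cache)`.

```python
import numpy as np

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def relu(Z):
    return np.maximum(0.0, Z)

def linear_activation_forward(A_prev, W, b, activation):
    """One layer of forward propagation: linear step, then the activation."""
    Z = W @ A_prev + b
    if activation == "sigmoid":
        A = sigmoid(Z)
    elif activation == "relu":
        A = relu(Z)
    else:
        raise ValueError("activation must be 'sigmoid' or 'relu'")
    cache = ((A_prev, W, b), Z)   # (linear_cache, activation_cache)
    return A, cache

A_prev = np.array([[1.0, -2.0]])
W = np.array([[2.0]])
b = np.array([[0.0]])
A_relu, _ = linear_activation_forward(A_prev, W, b, "relu")
A_sig, _ = linear_activation_forward(A_prev, np.zeros((1, 1)), b, "sigmoid")
print(A_relu)   # [[2. 0.]]
print(A_sig)    # [[0.5 0.5]]
```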
Function 3: Combine the two functions above together. Implement forward propagation for the "linear-relu" * (L-1)
and one final "linear-sigmoid" computation.
Inputs:
X -- data (train or test)
parameters -- output of initialize_parameters function
Outputs:
AL -- last activation output
caches -- list of caches containing: every cache of linear_activation_forward() (there are L of
them; indexed by the layer index)
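A compact sketch of the full forward pass follows. To keep the block self-contained, the single-layer step is redefined in a shortened form, and the tiny network and random inputs are made up for illustration only.

```python
import numpy as np

def linear_activation_forward(A_prev, W, b, activation):
    Z = W @ A_prev + b
    A = 1.0 / (1.0 + np.exp(-Z)) if activation == "sigmoid" else np.maximum(0.0, Z)
    return A, ((A_prev, W, b), Z)

def L_model_forward(X, parameters):
    """Run linear-relu for layers 1..L-1, then linear-sigmoid for layer L."""
    caches = []
    A = X
    L = len(parameters) // 2          # two entries (W, b) per layer
    for l in range(1, L):
        A, cache = linear_activation_forward(
            A, parameters["W" + str(l)], parameters["b" + str(l)], "relu")
        caches.append(cache)
    AL, cache = linear_activation_forward(
        A, parameters["W" + str(L)], parameters["b" + str(L)], "sigmoid")
    caches.append(cache)
    return AL, caches

# Illustrative 3-layer network (4 inputs, hidden sizes 3 and 2, 1 output).
rng = np.random.default_rng(0)
dims = [4, 3, 2, 1]
parameters = {}
for l in range(1, len(dims)):
    parameters["W" + str(l)] = rng.standard_normal((dims[l], dims[l - 1])) * 0.01
    parameters["b" + str(l)] = np.zeros((dims[l], 1))

X = rng.standard_normal((4, 6))       # 6 examples
AL, caches = L_model_forward(X, parameters)
print(AL.shape, len(caches))          # (1, 6) 3
```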
3. Compute the cost.
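The handout does not spell out the cost here. The sketch below assumes the binary cross-entropy cost, $J = -\frac{1}{m} \sum_{i=1}^{m} \left( y^{(i)} \log a^{[L](i)} + (1 - y^{(i)}) \log(1 - a^{[L](i)}) \right)$, which is the cost whose per-example derivative matches the dAL expression used to initialize backpropagation. The name `compute_cost` is an assumption.

```python
import numpy as np

def compute_cost(AL, Y):
    """Binary cross-entropy averaged over the m examples (assumed cost)."""
    m = Y.shape[1]
    cost = -np.sum(Y * np.log(AL) + (1 - Y) * np.log(1 - AL)) / m
    return float(cost)

AL = np.array([[0.5, 0.5]])
Y = np.array([[1, 0]])
print(compute_cost(AL, Y))   # log(2), about 0.6931
```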
4. Implement backpropagation: Compute the gradient of the loss function with respect to the network
parameters. (Again, 3 functions.)
Function 1: Implement the linear portion of backward propagation for a single layer. In particular,
suppose you have already calculated the derivative $dZ^{[l]}$. You want to get $(dW^{[l]}, db^{[l]}, dA^{[l-1]})$.
Inputs:
dZ -- Gradient of the cost with respect to the linear output (of current layer l)
cache -- (A_prev, W, b (linear cache), Z(activation cache)); from forward propagation in the
current layer
Outputs:
dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same
shape as A_prev
dW -- Gradient of the cost with respect to W (current layer l), same shape as W
db -- Gradient of the cost with respect to b (current layer l), same shape as b
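A sketch of Function 1, assuming the function receives the linear part of the cache, `(A_prev, W, b)`, and that gradients are averaged over the m examples. The name `linear_backward` is illustrative.

```python
import numpy as np

def linear_backward(dZ, linear_cache):
    """Given dZ for layer l, return (dA_prev, dW, db)."""
    A_prev, W, b = linear_cache
    m = A_prev.shape[1]
    dW = (dZ @ A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m   # keepdims keeps shape (n_l, 1)
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

A_prev = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
W = np.array([[1.0, -1.0]])
b = np.array([[0.0]])
dZ = np.array([[2.0, 4.0]])
dA_prev, dW, db = linear_backward(dZ, (A_prev, W, b))
print(dW)        # [[ 5. 11.]]
print(db)        # [[3.]]
print(dA_prev)   # [[ 2.  4.] [-2. -4.]]
```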
Function 2: Remember, if $g$ is the activation function, we compute $dZ^{[l]} = dA^{[l]} * g'(Z^{[l]})$. Here,
we want to implement the backward propagation for the "linear-activation" layer. $g$ is sigmoid for the
final layer and relu for the other L-1 hidden layers.
Inputs:
dA -- (post)activation gradient for current layer l
cache -- tuple of values (linear_cache, activation_cache)
activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
Outputs:
dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same
shape as A_prev
dW -- Gradient of the cost with respect to W (current layer l), same shape as W
db -- Gradient of the cost with respect to b (current layer l), same shape as b
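The "linear-activation" backward step could look like the sketch below, with separate derivative helpers for relu and sigmoid. The helper names `relu_backward` and `sigmoid_backward` are assumptions, and the linear step is repeated so that the block runs on its own.

```python
import numpy as np

def relu_backward(dA, Z):
    """dZ = dA * g'(Z) for relu: the gradient passes only where Z > 0."""
    return dA * (Z > 0)

def sigmoid_backward(dA, Z):
    """dZ = dA * g'(Z) for sigmoid, where g'(Z) = s * (1 - s)."""
    s = 1.0 / (1.0 + np.exp(-Z))
    return dA * s * (1.0 - s)

def linear_backward(dZ, linear_cache):
    A_prev, W, b = linear_cache
    m = A_prev.shape[1]
    dW = (dZ @ A_prev.T) / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    return W.T @ dZ, dW, db

def linear_activation_backward(dA, cache, activation):
    linear_cache, Z = cache
    if activation == "relu":
        dZ = relu_backward(dA, Z)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, Z)
    else:
        raise ValueError("activation must be 'sigmoid' or 'relu'")
    return linear_backward(dZ, linear_cache)

A_prev = np.array([[1.0, -2.0]])
W = np.array([[1.0]])
b = np.array([[0.0]])
Z = W @ A_prev + b                     # [[ 1. -2.]]
dA = np.array([[3.0, 5.0]])
dA_prev, dW, db = linear_activation_backward(dA, ((A_prev, W, b), Z), "relu")
print(dA_prev, dW, db)                 # [[3. 0.]] [[1.5]] [[1.5]]
```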
Function 3: Now you implement the backward function for the whole network. Remember, when you
implemented the forward function, at each iteration you stored a cache which contains (A_l-1, W, b,
and Z). In the backpropagation, you will use those variables to compute the gradients. Therefore, in
the backward function, you will iterate through all the layers backward, starting from layer $L$.
On each step, you will use the cached values for layer $l$ to backpropagate through layer $l$. To sum
up, you need to (i) initialize backward propagation (compute dAL); (ii) implement the backward
propagation for the Lth layer "linear-sigmoid"; and then (iii) implement "linear-relu" * (L-1).
Inputs:
AL -- output of the forward propagation of last layer (L_model_forward())
Y -- true "label" vector
caches -- list of caches containing:
every cache of linear_activation_forward() with "relu" (it's caches[l], for l in range(L-1), i.e., l = 0...L-2)
the cache of linear_activation_forward() with "sigmoid" (it's caches[L-1])
Outputs:
grads["db"+ str(l)] = ... for l=1,2,...L
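The whole backward pass can be sketched as follows. This version stores `grads["dW" + str(l)]` alongside `grads["db" + str(l)]`, which is an assumption about the dictionary layout. The compact `forward` helper and the random two-layer network exist only to produce caches for the demonstration.

```python
import numpy as np

def forward(X, parameters):
    """Compact forward pass returning AL and one ((A_prev, W, b), Z) cache per layer."""
    caches, A, L = [], X, len(parameters) // 2
    for l in range(1, L + 1):
        W, b = parameters["W" + str(l)], parameters["b" + str(l)]
        Z = W @ A + b
        caches.append(((A, W, b), Z))
        A = 1.0 / (1.0 + np.exp(-Z)) if l == L else np.maximum(0.0, Z)
    return A, caches

def linear_activation_backward(dA, cache, activation):
    (A_prev, W, b), Z = cache
    if activation == "sigmoid":
        s = 1.0 / (1.0 + np.exp(-Z))
        dZ = dA * s * (1.0 - s)
    else:                                   # relu
        dZ = dA * (Z > 0)
    m = A_prev.shape[1]
    return W.T @ dZ, (dZ @ A_prev.T) / m, np.sum(dZ, axis=1, keepdims=True) / m

def L_model_backward(AL, Y, caches):
    grads = {}
    L = len(caches)
    Y = Y.reshape(AL.shape)
    # (i) Initialize backpropagation with the derivative of the cost w.r.t. AL.
    dA = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
    # (ii) Layer L: linear-sigmoid, using caches[L-1].
    dA, grads["dW" + str(L)], grads["db" + str(L)] = linear_activation_backward(
        dA, caches[L - 1], "sigmoid")
    # (iii) Layers L-1 ... 1: linear-relu, using caches[L-2] ... caches[0].
    for l in reversed(range(L - 1)):
        dA, grads["dW" + str(l + 1)], grads["db" + str(l + 1)] = linear_activation_backward(
            dA, caches[l], "relu")
    return grads

rng = np.random.default_rng(0)
parameters = {"W1": rng.standard_normal((3, 4)) * 0.1, "b1": np.zeros((3, 1)),
              "W2": rng.standard_normal((1, 3)) * 0.1, "b2": np.zeros((1, 1))}
X = rng.standard_normal((4, 5))
Y = np.array([[1.0, 0.0, 1.0, 0.0, 1.0]])
AL, caches = forward(X, parameters)
grads = L_model_backward(AL, Y, caches)
print({name: g.shape for name, g in grads.items()})
```

A useful sanity check is that, for the sigmoid output layer, dAL times the sigmoid derivative simplifies to AL - Y, so `grads["dW2"]` should equal `(AL - Y) @ A1.T / m`.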
5. Update the parameters using the gradient descent update rule (use a for loop over the layers).
Inputs:
parameters -- python dictionary containing your parameters
learning rate
Outputs:
parameters -- python dictionary containing your updated parameters
parameters["W" + str(l)] = ...
parameters["b" + str(l)] = ... for l=1,2,..L
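A sketch of the update step is given below. It assumes the gradients arrive in a `grads` dictionary with keys "dW1", "db1", and so on, which the input list above does not mention explicitly, and it returns a new dictionary rather than modifying the input.

```python
import numpy as np

def update_parameters(parameters, grads, learning_rate):
    """Apply W := W - lr * dW and b := b - lr * db for every layer."""
    L = len(parameters) // 2
    updated = {}
    for l in range(1, L + 1):
        updated["W" + str(l)] = parameters["W" + str(l)] - learning_rate * grads["dW" + str(l)]
        updated["b" + str(l)] = parameters["b" + str(l)] - learning_rate * grads["db" + str(l)]
    return updated

parameters = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.5]])}
grads = {"dW1": np.array([[10.0, -10.0]]), "db1": np.array([[1.0]])}
new_parameters = update_parameters(parameters, grads, learning_rate=0.1)
print(new_parameters["W1"])   # [[0. 3.]]
print(new_parameters["b1"])   # [[0.4]]
```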
6. L-layer NN: Implements an L-layer neural network.
Inputs:
X -- data, (specifically train data examples matrix)
Y -- true "label" vector (train data)
layers_dims -- list containing the input size and the size of each layer, of length L+1
learning_rate -- learning rate of GD
num_iterations -- number of iterations of GD
Outputs:
parameters -- parameters learnt by the model. Those can then be used to predict (and to
compute test/train error).
7. Prediction: This function should predict the results of an L-layer neural network.
Inputs:
X -- data set of examples you would like to label (train or test)
parameters -- parameters of the trained model
Outputs:
Y_predicted -- predictions for the given dataset X
Accuracy printed
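A possible prediction and accuracy computation is sketched below. The `accuracy` helper and the hand-picked one-layer parameters are illustrative assumptions; in the assignment the parameters would come from the trained model.

```python
import numpy as np

def predict(X, parameters):
    """Forward pass followed by the 0.5 threshold; returns labels of shape (1, m)."""
    A = X
    L = len(parameters) // 2
    for l in range(1, L + 1):
        Z = parameters["W" + str(l)] @ A + parameters["b" + str(l)]
        A = 1.0 / (1.0 + np.exp(-Z)) if l == L else np.maximum(0.0, Z)
    return (A > 0.5).astype(int)

def accuracy(Y_predicted, Y):
    """Percentage of examples whose predicted label matches the true label."""
    return float(np.mean(Y_predicted == Y)) * 100.0

# Hand-picked one-layer model so the result can be checked by hand.
parameters = {"W1": np.array([[1.0, -1.0]]), "b1": np.array([[0.0]])}
X = np.array([[2.0, 0.0, 1.0],
              [0.0, 3.0, 1.0]])
Y = np.array([[1, 0, 1]])
Y_predicted = predict(X, parameters)
print(Y_predicted)                                        # [[1 0 0]]
print("Accuracy: %.2f%%" % accuracy(Y_predicted, Y))      # Accuracy: 66.67%
```

The third example has a sigmoid output of exactly 0.5, which is not greater than 0.5, so it is labeled 0.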