5. Neural Networks and Machine Learning
ECO374H1
Department of Economics
Summer 2025
Artificial Neural Networks
I Artificial neural networks (ANNs) are models that allow complex nonlinear
relationships between the response variable and its predictors
I A neural network is composed of observed and unobserved random variables, called neurons (also called nodes), organized in layers
I The observed predictor variables form the "Input" layer, and the predictions form the "Output" layer
I Intermediate layers contain unobserved random variables (so-called "hidden neurons")
Special Case: Linear Regression Model
I The simplest ANN with no hidden layers is equivalent to a linear regression:
I In the ANN notation, the formula for the fitted regression model is
    y = a + w_1 x_1 + w_2 x_2 + w_3 x_3 + w_4 x_4
I The parameters wk attached to the predictors xk are called weights, and the intercept a is called a bias
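I As a minimal sketch (using numpy and purely hypothetical data), the no-hidden-layer network can be fitted exactly by OLS, and its prediction is just the bias plus the weighted sum of the predictors:

    import numpy as np

    # Hypothetical data: 100 observations of K = 4 predictors and a response
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 4))
    y = 2.0 + X @ np.array([0.5, -1.0, 0.3, 0.8]) + rng.normal(scale=0.1, size=100)

    # OLS: add a column of ones so the bias a is estimated along with the weights
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    a, w = beta[0], beta[1:]

    # The "network" prediction is the linear combination a + w_1*x_1 + ... + w_4*x_4
    y_hat = a + X @ w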
Nonlinear Neural Networks
I Once we add intermediate layer(s) with hidden neurons and activation functions, the ANN becomes non-linear
I An example shown in the following figure is known as the feed-forward network (FFN)
I The weights w_{k,j} are selected in the ANN framework using a machine learning algorithm that minimizes a loss function, such as the Mean Squared Error (MSE) or the Sum of Squared Residuals (SSR)
I In the special case of linear regression, OLS provides an analytical solution to the learning algorithm that minimizes SSR
I In general ANNs the response variable is a nonlinear function of the predictors and hence OLS is not applicable
I A neural network with many hidden layers is called a Deep Neural Network (DNN) and its training algorithm is called Deep learning
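I A minimal sketch of the two loss functions mentioned above, written as plain numpy functions (the function names are illustrative):

    import numpy as np

    def ssr(y, y_hat):
        # Sum of Squared Residuals
        return np.sum((y - y_hat) ** 2)

    def mse(y, y_hat):
        # Mean Squared Error
        return np.mean((y - y_hat) ** 2)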
Feed-Forward Network
I In an FFN, each layer of nodes receives inputs from the previous layer
I The inputs to each node are combined using a weighted linear combination
I The result is then modified by a nonlinear "activation" function before being output
I The outputs of the nodes in one layer are inputs to the next layer
I In the Figure above, the inputs (blue dots) into hidden neuron j are combined linearly as

    z_j = a_j + Σ_{k=1}^{K} w_{k,j} x_k    (1)
I At each hidden neuron (green dot), a nonlinear activation function s(z_j) is applied, and the model prediction is then obtained as

    ŷ = a + Σ_{j=1}^{J} w_j s(z_j)    (2)
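I For concreteness, a minimal sketch of the computation at a single hidden neuron j, with purely illustrative input values, weights, and bias:

    import numpy as np

    # Illustrative values: K = 4 inputs into hidden neuron j
    x   = np.array([1.2, -0.7, 0.4, 2.1])    # inputs x_1, ..., x_4
    w_j = np.array([0.3,  0.1, -0.5, 0.2])   # weights w_{1,j}, ..., w_{4,j}
    a_j = 0.05                               # bias of hidden neuron j

    z_j = a_j + w_j @ x                      # linear combination, as in (1)
    s_j = 1.0 / (1.0 + np.exp(-z_j))         # logistic activation s(z_j)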
Activation Function
I The activation function s(z_j) adds flexibility and complexity to the model
I Without the activation function the model would be limited to a linear combination of predictors (multiple regression)
I Popular activation functions are the logistic (or sigmoid) function

    s(z) = 1 / (1 + e^(−z))    (3)

  or the tanh function

    s(z) = tanh(z) = (e^z − e^(−z)) / (e^z + e^(−z))    (4)
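I As a sketch, both activation functions are one-liners in numpy (the function names are illustrative):

    import numpy as np

    def logistic(z):
        # Logistic (sigmoid) activation: maps any real z into (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    def tanh(z):
        # Hyperbolic tangent activation: maps any real z into (-1, 1)
        return np.tanh(z)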
FFN Model
I Consider the general case of:
I K predictors
I J hidden nodes in one hidden layer
I The functional form of z_j from (1)
I The logistic activation function from (3)
I The FFN model can then be expressed as

    ŷ_i = a + Σ_{j=1}^{J} w_j s(z_{i,j}),  where  z_{i,j} = a_j + Σ_{k=1}^{K} w_{k,j} x_{i,k}  and  s(z) = 1 / (1 + e^(−z))    (5)
I Note that the model is nonlinear in x_{i,k} due to the presence of the nonlinear activation function
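I A minimal sketch of the full FFN prediction for this general case, with hypothetical dimensions K = 4 and J = 3 and weights drawn at random purely for illustration:

    import numpy as np

    rng = np.random.default_rng(1)
    K, J, n = 4, 3, 100

    X   = rng.normal(size=(n, K))      # n observations of the K predictors x_{i,k}
    W   = rng.normal(size=(K, J))      # hidden-layer weights w_{k,j}
    a_h = rng.normal(size=J)           # hidden-layer biases a_1, ..., a_J
    w_o = rng.normal(size=J)           # output weights w_1, ..., w_J
    a   = rng.normal()                 # output bias a

    Z     = a_h + X @ W                # z_{i,j}, as in (1)
    H     = 1.0 / (1.0 + np.exp(-Z))   # logistic activation s(z_{i,j}), as in (3)
    y_hat = a + H @ w_o                # FFN prediction, as in (5)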
ANN Model Formulation
I To build an ANN model, we need to specify in advance:
I The number of hidden layers
I The number of nodes in each hidden layer
I The functional form of the activation function
I The parameters a, a_1, . . . , a_J, w_1, . . . , w_J, and w_{1,1}, . . . , w_{K,J} are "learned" from the data
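I As a worked example of how many parameters must be learned, take the hypothetical case of K = 4 predictors and J = 3 hidden nodes:

    K, J = 4, 3                     # hypothetical: 4 predictors, 3 hidden nodes
    n_hidden_weights = K * J        # w_{1,1}, ..., w_{K,J}
    n_hidden_biases  = J            # a_1, ..., a_J
    n_output_weights = J            # w_1, ..., w_J
    n_output_bias    = 1            # a
    n_params = n_hidden_weights + n_hidden_biases + n_output_weights + n_output_bias
    print(n_params)                 # 4*3 + 3 + 3 + 1 = 19 parameters to learn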
Training Neural Networks
I Training a network on data involves searching for the set of weights that best enable the network to model the patterns in the data
I Training (or learning) presents the network with data to modify the weights
I The goal of a learning algorithm is typically to minimize a loss function that quantifies the lack of fit of the network to the data
Learning
I Supervised learning (our focus)
I We supply the ANN with inputs and outputs, as in the examples above
I The weights are modified to reduce the difference between the predicted and actual outputs, as measured by a loss function
I Examples: NN (auto)regression, face, speech, or handwriting recognition, spam detection
I Unsupervised learning
I We supply the ANN with inputs only
I The ANN works only on the input values so that similar inputs create similar outputs
I Examples: K-means clustering, dimensionality reduction
Forward propagation in Supervised Learning of FFNs
I The process of forward propagation in FFNs involves:
I Computing z_j, as in (1), at every hidden neuron j
I Applying the activation function s(z_j) at each j, as in (3)
I Constructing a linear combination of the s(z_j) to obtain the predicted output
I Once the predicted output is obtained at the output layer, we compute the "error" (predicted output minus actual output) and the corresponding loss
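I A short sketch of this step, with hypothetical predicted and actual outputs:

    import numpy as np

    # Hypothetical predicted and actual outputs for 5 observations
    y_hat = np.array([1.1, 0.4, -0.2, 2.0, 0.9])
    y     = np.array([1.0, 0.5, -0.1, 1.8, 1.2])

    errors = y_hat - y                # predicted output minus actual output
    loss   = np.mean(errors ** 2)     # MSE loss to be minimized during training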
Backpropagation in Supervised Learning of FFNs
I The goal of backpropagation is to adjust the weights in each layer to minimize the overall error (loss) at the output layer
I One complete pass of forward propagation and backpropagation over the training data is called an epoch
I Typically, many epochs (often tens of thousands) are required to train a neural network well
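I A minimal sketch of the full training loop for a one-hidden-layer FFN, with the backpropagation gradients of the MSE loss written out by hand; the data, dimensions, learning rate, and number of epochs are all hypothetical:

    import numpy as np

    rng = np.random.default_rng(2)
    K, J, n = 4, 3, 200
    X = rng.normal(size=(n, K))
    y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=n)   # hypothetical target

    # Initialize parameters with small random values
    W, a_h = 0.1 * rng.normal(size=(K, J)), np.zeros(J)
    w_o, a = 0.1 * rng.normal(size=J), 0.0
    lr = 0.05                                  # learning rate (hypothetical)

    for epoch in range(5000):                  # each epoch: one full forward and backward pass
        # ---- forward propagation ----
        Z = a_h + X @ W                        # z_{i,j}
        H = 1.0 / (1.0 + np.exp(-Z))           # logistic activation s(z_{i,j})
        y_hat = a + H @ w_o                    # predicted output

        err = y_hat - y                        # error at the output layer
        loss = np.mean(err ** 2)               # MSE loss

        # ---- backpropagation: gradients of the MSE loss ----
        d_yhat = 2.0 * err / n                 # dLoss/dy_hat
        grad_a   = d_yhat.sum()
        grad_w_o = H.T @ d_yhat
        d_H = np.outer(d_yhat, w_o)            # propagate the error back to the hidden layer
        d_Z = d_H * H * (1.0 - H)              # derivative of the logistic function
        grad_a_h = d_Z.sum(axis=0)
        grad_W   = X.T @ d_Z

        # ---- gradient-descent weight updates ----
        a   -= lr * grad_a
        w_o -= lr * grad_w_o
        a_h -= lr * grad_a_h
        W   -= lr * grad_W

I In practice, software libraries compute these gradients automatically; the sketch above only illustrates the mechanics of one epoch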