Artificial Neural Networks

By: Lucas Khan | 12-G



What are neural networks?

An artificial neural network (ANN) is a computational model inspired by the physiology of animal brains. Similar to the way neurons in the brain send signals to each other, an ANN consists of interconnected nodes (neurons) that map a certain input to a certain output or prediction.


A commonly used form of neural network is the multilayer perceptron (MLP), which contains multiple layers of neurons. As a form of artificial intelligence, MLPs are applied to a multitude of problems and fields, such as facial recognition, automatic translators, and self-driving cars. For problems like these, using conventional algorithms or computer programs is practically impossible. For example, it would take millions of lines of code and a sizable amount of effort to describe to a computer what a face looks like or how to adequately translate every word in a language. AI algorithms, meanwhile, get around this by learning from vast amounts of data in a process called training.

To illustrate this, we will use a common sample problem: predicting the species of a penguin (either Gentoo or Adelie) from its bill length, flipper length, and body mass. For simplicity, we will treat this as a binary classification problem, although more classes could be added if data for them were available.
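
To make this concrete, below is a minimal sketch of how such a classifier might be set up with scikit-learn. The measurements and the network size are made-up placeholders for illustration, not real data or a recommended configuration.

import numpy as np
from sklearn.neural_network import MLPClassifier

# Each row: [bill length (mm), flipper length (mm), body mass (g)] -- placeholder values
X = np.array([
    [47.5, 217.0, 5200.0],   # hypothetical Gentoo
    [38.8, 190.0, 3700.0],   # hypothetical Adelie
    [49.1, 221.0, 5400.0],   # hypothetical Gentoo
    [39.5, 186.0, 3500.0],   # hypothetical Adelie
])
y = np.array(["Gentoo", "Adelie", "Gentoo", "Adelie"])

# A small multilayer perceptron: 3 input features -> 4 hidden neurons -> 2 classes
model = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000, random_state=0)
model.fit(X, y)

print(model.predict([[46.0, 215.0, 5100.0]]))  # likely "Gentoo" for these placeholder values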





Backpropagation: how neural networks learn

We previously established that neural networks "learn" by having their weights and biases optimized. In multilayer perceptrons, this is done using a process called backpropagation, which uses partial derivatives to minimize error.

The cost function

MLPs commonly use a squared-error cost function to measure the error of a particular configuration of weights. The error across the output neurons \( j \) in predicting the \(n\)th datapoint in a training dataset is given by

\( \displaystyle \mathcal{E}(n)=\frac{1}{2}\sum_j \left[\hat{y}_j (n) - y_j (n) \right]^2 \)

where \( \hat{y}_j \) is the expected value for neuron \(j\) in the MLP's output layer and \(y_j\) is its actual output. (Note that \(n\) is not a factor being multiplied in; it simply indicates the \(n\)th datapoint.)
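
As a rough illustration, this cost for a single datapoint could be computed as follows (assuming the expected and actual output values are stored in NumPy arrays):

import numpy as np

def squared_error(expected, actual):
    """E(n) = 1/2 * sum_j (y_hat_j - y_j)^2 for a single datapoint."""
    return 0.5 * np.sum((expected - actual) ** 2)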



For example, suppose the correct species for the \(n\)th datapoint in the training dataset is Gentoo, and the MLP depicted to the left outputs 0.82 for the "Gentoo" neuron and 0.42 for the "Adelie" neuron. The expected values \( \hat{y}_j \) are then 1 for "Gentoo" and 0 for "Adelie", so its error is given by:

\( \displaystyle \mathcal{E}(n)=\frac{1}{2} \left[ (1 - 0.82)^2 + (0 - 0.42)^2 \right] \)
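
As a quick check, evaluating this expression directly gives roughly 0.104:

# Expected values [1, 0] versus actual outputs [0.82, 0.42]
error = 0.5 * ((1 - 0.82) ** 2 + (0 - 0.42) ** 2)
print(error)  # approximately 0.1044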

Gradient descent

To optimize an MLP's weights, we must measure how the error changes with respect to each weight. Here, the change \( \Delta w_{ji} (n) \) of the weight connecting neurons \( i \) and \( j \), where \( j \) is in the succeeding layer, is given by

\( \displaystyle \Delta w_{ji} (n) = -\eta\frac{\partial\mathcal{E}(n)}{\partial v_j(n)} y_i(n) \)

where \( v_j(n) \) is the weighted sum of the inputs arriving at neuron \( j \) (its value before the activation function is applied), \( y_i(n) \) is the output of the preceding neuron \( i \), and \( \eta \) is the learning rate, a constant selected by the user.
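
Because \( y_j = \varphi(v_j) \) for some activation function \( \varphi \), the partial derivative for an output neuron expands to \( \partial\mathcal{E}(n)/\partial v_j(n) = \left[ y_j(n) - \hat{y}_j(n) \right] \varphi'(v_j(n)) \). Below is a small sketch of the resulting update for a single weight, assuming a sigmoid activation; the activation choice and the numbers in the example are assumptions made here for illustration.

def weight_update(eta, y_hat_j, y_j, y_i):
    """Delta w_ji = -eta * dE/dv_j * y_i for an output neuron, assuming a
    sigmoid activation so that its derivative is y_j * (1 - y_j)."""
    dE_dv = (y_j - y_hat_j) * y_j * (1.0 - y_j)  # partial derivative of E(n) w.r.t. v_j
    return -eta * dE_dv * y_i

# Example: the "Gentoo" output neuron from before (expected 1, actual 0.82),
# with a hypothetical input y_i = 0.5 from the previous layer and eta = 0.1.
print(weight_update(eta=0.1, y_hat_j=1.0, y_j=0.82, y_i=0.5))  # a small positive change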

Since gradient descent is an iterative process, the weights are updated for each datapoint \(n\) in the training dataset. As training goes on, the MLP's error gradually converges to a local minimum, increasing its accuracy.
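
Putting the pieces together, the sketch below trains a tiny MLP (one hidden layer, sigmoid activations, biases omitted for brevity) by sweeping over the datapoints one at a time and applying the update rule above. The inputs are scaled placeholder values, not real measurements.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

# Placeholder inputs: [bill length, flipper length, body mass], roughly scaled to 0-1
X = np.array([[0.65, 0.85, 0.80],   # hypothetical Gentoo
              [0.35, 0.45, 0.30],   # hypothetical Adelie
              [0.70, 0.90, 0.85],   # hypothetical Gentoo
              [0.30, 0.40, 0.25]])  # hypothetical Adelie
Y = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])  # expected [Gentoo, Adelie] outputs

W1 = rng.normal(scale=0.5, size=(3, 4))  # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(4, 2))  # hidden -> output weights
eta = 0.5                                # learning rate

for epoch in range(2000):
    total_error = 0.0
    for x, y_hat in zip(X, Y):    # online updates: one datapoint at a time
        h = sigmoid(x @ W1)       # hidden-layer outputs
        y = sigmoid(h @ W2)       # output-layer outputs
        total_error += 0.5 * np.sum((y_hat - y) ** 2)

        # Local gradients dE/dv_j for the output and hidden layers (sigmoid assumed)
        delta_out = (y - y_hat) * y * (1 - y)
        delta_hid = (W2 @ delta_out) * h * (1 - h)

        # Weight changes follow delta_w_ji = -eta * dE/dv_j * y_i
        W2 -= eta * np.outer(h, delta_out)
        W1 -= eta * np.outer(x, delta_hid)

    if epoch % 500 == 0:
        print(f"epoch {epoch}: total error {total_error:.4f}")  # should shrink over time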
