Machine Learning (Chapter 24): ANN II - Backpropagation I

Introduction to Backpropagation

Backpropagation is a fundamental algorithm in training artificial neural networks (ANNs). It is the mechanism by which the network learns from the data by adjusting its weights to minimize the error between the predicted output and the actual target output. This process is crucial for enabling neural networks to model complex functions, making backpropagation a cornerstone of modern deep learning.

The Backpropagation Algorithm

The backpropagation algorithm consists of two main phases: the forward pass and the backward pass.

  1. Forward Pass:

    • During the forward pass, input data is fed into the network, and the output is calculated layer by layer until the final output is obtained.
    • The network's prediction is compared with the actual target to calculate the loss, which measures how far off the prediction is.
  2. Backward Pass:

    • The backward pass involves calculating the gradient of the loss function with respect to each weight in the network.
    • The weights are then updated in the opposite direction of the gradient to minimize the loss (a minimal sketch of this two-phase loop follows the list).
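
To make these two phases concrete before introducing the full network, here is a minimal, self-contained sketch that trains a single linear model y = wx + b with mean squared error. The toy data, parameter names, and learning rate are illustrative choices only, not part of any particular library or of the network developed below.

python:

import numpy as np

# Toy data: y = 2x + 1 (chosen only for illustration)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b = 0.0, 0.0   # parameters to learn
alpha = 0.05      # learning rate

for epoch in range(2000):
    # Forward pass: compute predictions and the loss
    y_hat = w * x + b
    loss = np.mean((y_hat - y) ** 2)

    # Backward pass: gradients of the loss with respect to w and b
    dw = np.mean(2 * (y_hat - y) * x)
    db = np.mean(2 * (y_hat - y))

    # Update: move the parameters against the gradient
    w -= alpha * dw
    b -= alpha * db

print(w, b)  # should end up close to 2 and 1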

Mathematical Formulation

Consider a simple feedforward neural network with an input layer, one hidden layer, and an output layer. Let's denote:

  • $\mathbf{X}$ as the input vector
  • $\mathbf{W}^{[1]}$ and $\mathbf{W}^{[2]}$ as the weight matrices for the hidden and output layers, respectively
  • $\mathbf{b}^{[1]}$ and $\mathbf{b}^{[2]}$ as the bias vectors for the hidden and output layers, respectively
  • $\mathbf{Z}^{[1]}$ and $\mathbf{Z}^{[2]}$ as the linear activations before applying the activation function
  • $\mathbf{A}^{[1]}$ and $\mathbf{A}^{[2]}$ as the activations after applying the activation function
  • $\mathbf{Y}$ as the true target vector

Forward Pass Equations

The forward pass can be expressed mathematically as:

$$\mathbf{Z}^{[1]} = \mathbf{W}^{[1]} \mathbf{X} + \mathbf{b}^{[1]}$$
$$\mathbf{A}^{[1]} = \sigma(\mathbf{Z}^{[1]})$$
$$\mathbf{Z}^{[2]} = \mathbf{W}^{[2]} \mathbf{A}^{[1]} + \mathbf{b}^{[2]}$$
$$\mathbf{A}^{[2]} = \sigma(\mathbf{Z}^{[2]})$$

Where $\sigma(\cdot)$ is the activation function (e.g., sigmoid, ReLU).
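
As a quick sanity check on shapes, the forward pass can be written directly in NumPy. The sketch below follows the column-vector convention of the equations above; the layer sizes (3 inputs, 4 hidden units, 2 outputs) and the random values are arbitrary choices for illustration.

python:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 3, 4, 2           # illustrative layer sizes
X = rng.normal(size=(n_in, 1))            # one input column vector
W1 = rng.normal(size=(n_hidden, n_in)); b1 = np.zeros((n_hidden, 1))
W2 = rng.normal(size=(n_out, n_hidden)); b2 = np.zeros((n_out, 1))

# Forward pass, mirroring the equations above
Z1 = W1 @ X + b1     # shape (n_hidden, 1)
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2    # shape (n_out, 1)
A2 = sigmoid(Z2)

print(A2.shape)      # (2, 1)

# Note: the full implementation later in this chapter stores one sample per row
# and computes np.dot(X, W1); the two conventions differ only by a transpose.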

The loss function $L$ is then computed, often using mean squared error for regression or cross-entropy for classification tasks:

$$L(\mathbf{Y}, \mathbf{A}^{[2]}) = \frac{1}{m} \sum_{i=1}^{m} \text{Loss}\left(\mathbf{Y}_i, \mathbf{A}^{[2]}_i\right)$$
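
As a small worked example, both loss choices are one-liners in NumPy. The target and output values below are made up purely to show the computation.

python:

import numpy as np

Y = np.array([[1.0], [0.0], [1.0]])    # true targets (illustrative)
A2 = np.array([[0.9], [0.2], [0.7]])   # network outputs (illustrative)

# Mean squared error (regression)
mse = np.mean((Y - A2) ** 2)

# Binary cross-entropy (classification); eps guards against log(0)
eps = 1e-12
bce = -np.mean(Y * np.log(A2 + eps) + (1 - Y) * np.log(1 - A2 + eps))

print(f"MSE: {mse:.4f}, cross-entropy: {bce:.4f}")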

Backward Pass Equations

The goal of the backward pass is to compute the gradients of the loss with respect to the weights and biases. Using the chain rule, these gradients are computed as:

  1. Compute the error term for the output layer (the derivative of the loss with respect to the pre-activation $\mathbf{Z}^{[2]}$):

$$\delta^{[2]} = \frac{\partial L}{\partial \mathbf{A}^{[2]}} \odot \sigma'(\mathbf{Z}^{[2]})$$

  2. Compute the gradient with respect to the weights and biases in the output layer:

$$\frac{\partial L}{\partial \mathbf{W}^{[2]}} = \delta^{[2]} \left(\mathbf{A}^{[1]}\right)^T$$
$$\frac{\partial L}{\partial \mathbf{b}^{[2]}} = \sum_{i=1}^{m} \delta^{[2]}_i$$

  3. Compute the error term for the hidden layer:

$$\delta^{[1]} = \left(\mathbf{W}^{[2]}\right)^T \delta^{[2]} \odot \sigma'(\mathbf{Z}^{[1]})$$

  4. Compute the gradient with respect to the weights and biases in the hidden layer:

$$\frac{\partial L}{\partial \mathbf{W}^{[1]}} = \delta^{[1]} \mathbf{X}^T$$
$$\frac{\partial L}{\partial \mathbf{b}^{[1]}} = \sum_{i=1}^{m} \delta^{[1]}_i$$

Finally, the weights and biases are updated using gradient descent:

$$\mathbf{W}^{[l]} = \mathbf{W}^{[l]} - \alpha \frac{\partial L}{\partial \mathbf{W}^{[l]}}$$
$$\mathbf{b}^{[l]} = \mathbf{b}^{[l]} - \alpha \frac{\partial L}{\partial \mathbf{b}^{[l]}}$$

Where $\alpha$ is the learning rate.
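
Putting the four gradient steps and the update rule together, a literal NumPy transcription might look like the sketch below. It keeps the column-vector convention of the equations, assumes a sigmoid activation with a squared-error loss, and uses made-up shapes and values; the complete, runnable training loop is given in the next section.

python:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1 - s)

rng = np.random.default_rng(1)
alpha = 0.1

# Illustrative shapes: 3 inputs, 4 hidden units, 1 output, batch of m = 5 columns
X = rng.normal(size=(3, 5))
Y = rng.integers(0, 2, size=(1, 5)).astype(float)
W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))

# Forward pass
Z1 = W1 @ X + b1
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

# Backward pass (for squared error, dL/dA2 is proportional to A2 - Y)
delta2 = (A2 - Y) * sigmoid_prime(Z2)           # step 1: output error term
dW2 = delta2 @ A1.T                             # step 2: output-layer gradients
db2 = np.sum(delta2, axis=1, keepdims=True)

delta1 = (W2.T @ delta2) * sigmoid_prime(Z1)    # step 3: hidden error term
dW1 = delta1 @ X.T                              # step 4: hidden-layer gradients
db1 = np.sum(delta1, axis=1, keepdims=True)

# Gradient descent update
W2 -= alpha * dW2; b2 -= alpha * db2
W1 -= alpha * dW1; b1 -= alpha * db1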

Python Implementation of Backpropagation

Below is a basic implementation of the backpropagation algorithm for a simple neural network with one hidden layer.

python:

import numpy as np

# Sigmoid activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Note: expects x to already be a sigmoid output (an activation), not a pre-activation
    return x * (1 - x)

# Input data (X) and target output (Y) for the XOR pattern
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0], [1], [1], [0]])

# Initialize weights and biases
np.random.seed(42)
W1 = np.random.rand(2, 2)
b1 = np.random.rand(1, 2)
W2 = np.random.rand(2, 1)
b2 = np.random.rand(1, 1)

# Learning rate
alpha = 0.1

# Training loop
for epoch in range(10000):
    # Forward pass
    Z1 = np.dot(X, W1) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)

    # Compute loss (mean squared error)
    loss = np.mean((Y - A2) ** 2)

    # Backward pass
    dA2 = A2 - Y
    dZ2 = dA2 * sigmoid_derivative(A2)
    dW2 = np.dot(A1.T, dZ2)
    db2 = np.sum(dZ2, axis=0, keepdims=True)

    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * sigmoid_derivative(A1)
    dW1 = np.dot(X.T, dZ1)
    db1 = np.sum(dZ1, axis=0, keepdims=True)

    # Update weights and biases
    W2 -= alpha * dW2
    b2 -= alpha * db2
    W1 -= alpha * dW1
    b1 -= alpha * db1

    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

# Final output after training
print("Final output after training:")
print(A2)

Conclusion

Backpropagation is a crucial technique for training neural networks, allowing the network to adjust its weights to minimize error through a gradient descent process. Understanding the mathematics behind backpropagation, as well as its implementation in Python, is essential for anyone looking to dive deeper into the world of machine learning and neural networks.
