Machine Learning (Chapter 24): ANN II - Backpropagation I
Introduction to Backpropagation
Backpropagation is a fundamental algorithm in training artificial neural networks (ANNs). It is the mechanism by which the network learns from the data by adjusting its weights to minimize the error between the predicted output and the actual target output. This process is crucial for enabling neural networks to model complex functions, making backpropagation a cornerstone of modern deep learning.
The Backpropagation Algorithm
The backpropagation algorithm consists of two main phases: the forward pass and the backward pass.
Forward Pass:
- During the forward pass, input data is fed into the network, and the output is calculated layer by layer until the final output is obtained.
- The network's prediction is compared with the actual target to calculate the loss, which measures how far off the prediction is.
Backward Pass:
- The backward pass involves calculating the gradient of the loss function with respect to each weight in the network.
- The weights are then updated in the opposite direction of the gradient to minimize the loss; a minimal single-neuron sketch of both passes follows below.
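To make the two phases concrete before the math, here is a minimal sketch, assuming a single sigmoid neuron with one weight, one bias, and a squared-error loss (an illustrative setup, not the network developed later in this chapter): the forward pass computes a prediction and its loss, and the backward pass computes the gradient via the chain rule and takes one descent step.
python:
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Illustrative single neuron: one input, one weight, one bias
x, y_true = 1.5, 1.0
w, b, lr = 0.1, 0.0, 0.5

for step in range(3):
    # Forward pass: prediction and loss
    z = w * x + b
    y_hat = sigmoid(z)
    loss = 0.5 * (y_hat - y_true) ** 2

    # Backward pass: chain rule, dL/dw = (y_hat - y) * sigma'(z) * x
    dz = (y_hat - y_true) * y_hat * (1 - y_hat)
    dw, db = dz * x, dz

    # Gradient descent update
    w -= lr * dw
    b -= lr * db
    print(f"step {step}: loss = {loss:.4f}")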
Mathematical Formulation
Consider a simple feedforward neural network with an input layer, one hidden layer, and an output layer. Let's denote:
- \mathbf{X} as the input vector
- \mathbf{W}^{[1]} and \mathbf{W}^{[2]} as the weight matrices for the hidden and output layers, respectively
- \mathbf{b}^{[1]} and \mathbf{b}^{[2]} as the bias vectors for the hidden and output layers, respectively
- \mathbf{Z}^{[1]} and \mathbf{Z}^{[2]} as the linear activations before applying the activation function
- \mathbf{A}^{[1]} and \mathbf{A}^{[2]} as the activations after applying the activation function
- \mathbf{Y} as the true target vector
Forward Pass Equations
The forward pass can be expressed mathematically as:
\mathbf{Z}^{[1]} = \mathbf{W}^{[1]} \mathbf{X} + \mathbf{b}^{[1]}
\mathbf{A}^{[1]} = \sigma(\mathbf{Z}^{[1]})
\mathbf{Z}^{[2]} = \mathbf{W}^{[2]} \mathbf{A}^{[1]} + \mathbf{b}^{[2]}
\mathbf{A}^{[2]} = \sigma(\mathbf{Z}^{[2]})
Where \sigma is the activation function (e.g., sigmoid, ReLU).
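As a quick check on dimensions, the snippet below (an illustrative assumption, not part of the chapter's code) runs one forward pass in the column-vector convention of these equations, with a hypothetical network of 2 inputs, 3 hidden units, and 1 output. Note that the full implementation later in this chapter uses the equivalent row-major convention np.dot(X, W1), where each row of X is one sample; the two differ only by a transpose.
python:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical sizes: 2 inputs, 3 hidden units, 1 output (column-vector convention)
np.random.seed(0)
X = np.random.rand(2, 1)                              # input column vector, shape (2, 1)
W1, b1 = np.random.rand(3, 2), np.random.rand(3, 1)
W2, b2 = np.random.rand(1, 3), np.random.rand(1, 1)

Z1 = np.dot(W1, X) + b1                               # shape (3, 1)
A1 = sigmoid(Z1)
Z2 = np.dot(W2, A1) + b2                              # shape (1, 1)
A2 = sigmoid(Z2)
print(Z1.shape, A1.shape, Z2.shape, A2.shape)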
The loss function is then computed, often using mean squared error for regression or cross-entropy for classification tasks. For mean squared error on a single example:
L = \frac{1}{2} \left\| \mathbf{A}^{[2]} - \mathbf{Y} \right\|^2
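As a small illustration of the two options (using hypothetical predictions and targets, not outputs of the chapter's network), the snippet below computes mean squared error and binary cross-entropy in NumPy.
python:
import numpy as np

Y_true = np.array([[0.0], [1.0], [1.0], [0.0]])    # hypothetical targets
Y_pred = np.array([[0.1], [0.8], [0.7], [0.2]])    # hypothetical network outputs

# Mean squared error (the loss used by the implementation later in this chapter)
mse = np.mean((Y_true - Y_pred) ** 2)

# Binary cross-entropy (common for classification with sigmoid outputs)
eps = 1e-12                                        # small constant to avoid log(0)
bce = -np.mean(Y_true * np.log(Y_pred + eps) + (1 - Y_true) * np.log(1 - Y_pred + eps))

print(f"MSE: {mse:.4f}, BCE: {bce:.4f}")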
Backward Pass Equations
The goal of the backward pass is to compute the gradients of the loss with respect to the weights and biases. Using the chain rule, these gradients are computed as:
- Compute the derivative of the loss with respect to the output layer's activation, and from it the error term for the output layer:
\frac{\partial L}{\partial \mathbf{A}^{[2]}} = \mathbf{A}^{[2]} - \mathbf{Y}
\delta^{[2]} = \frac{\partial L}{\partial \mathbf{A}^{[2]}} \odot \sigma'(\mathbf{Z}^{[2]})
- Compute the gradient with respect to the weights and biases in the output layer:
\frac{\partial L}{\partial \mathbf{W}^{[2]}} = \delta^{[2]} (\mathbf{A}^{[1]})^T
\frac{\partial L}{\partial \mathbf{b}^{[2]}} = \delta^{[2]}
- Compute the error term for the hidden layer:
\delta^{[1]} = (\mathbf{W}^{[2]})^T \delta^{[2]} \odot \sigma'(\mathbf{Z}^{[1]})
- Compute the gradient with respect to the weights and biases in the hidden layer:
\frac{\partial L}{\partial \mathbf{W}^{[1]}} = \delta^{[1]} \mathbf{X}^T
\frac{\partial L}{\partial \mathbf{b}^{[1]}} = \delta^{[1]}
Finally, the weights and biases are updated using gradient descent:
\mathbf{W}^{[l]} \leftarrow \mathbf{W}^{[l]} - \alpha \frac{\partial L}{\partial \mathbf{W}^{[l]}}
\mathbf{b}^{[l]} \leftarrow \mathbf{b}^{[l]} - \alpha \frac{\partial L}{\partial \mathbf{b}^{[l]}}
Where \alpha is the learning rate and l \in \{1, 2\} indexes the layers.
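A common way to sanity-check these gradient formulas is a finite-difference gradient check. The sketch below is a minimal illustration under assumed names and a tiny 2-2-1 network (none of this appears in the chapter's listing): it compares the analytic gradient of the output-layer weights, computed with the equations above, against a numerical central-difference estimate.
python:
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Tiny hypothetical 2-2-1 network and one training example (column-vector convention)
np.random.seed(0)
X = np.array([[0.5], [0.2]])
Y = np.array([[1.0]])
W1, b1 = np.random.rand(2, 2), np.random.rand(2, 1)
W2, b2 = np.random.rand(1, 2), np.random.rand(1, 1)

def loss(W2_candidate):
    # Forward pass with a candidate W2, returning the 1/2 squared-error loss
    A1 = sigmoid(np.dot(W1, X) + b1)
    A2 = sigmoid(np.dot(W2_candidate, A1) + b2)
    return 0.5 * np.sum((A2 - Y) ** 2)

# Analytic gradient from the equations above: dL/dW2 = delta2 (A1)^T
A1 = sigmoid(np.dot(W1, X) + b1)
A2 = sigmoid(np.dot(W2, A1) + b2)
delta2 = (A2 - Y) * A2 * (1 - A2)
dW2_analytic = np.dot(delta2, A1.T)

# Numerical gradient via central differences, one entry of W2 at a time
eps = 1e-6
dW2_numeric = np.zeros_like(W2)
for i in range(W2.shape[0]):
    for j in range(W2.shape[1]):
        W2_plus, W2_minus = W2.copy(), W2.copy()
        W2_plus[i, j] += eps
        W2_minus[i, j] -= eps
        dW2_numeric[i, j] = (loss(W2_plus) - loss(W2_minus)) / (2 * eps)

# The maximum absolute difference should be very small if the formulas are correct
print(np.max(np.abs(dW2_analytic - dW2_numeric)))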
Python Implementation of Backpropagation
Below is a basic implementation of the backpropagation algorithm for a simple neural network with one hidden layer, trained here on the XOR dataset.
python:
import numpy as np
# Sigmoid activation function and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # Expects x to already be a sigmoid output, so sigma'(z) = x * (1 - x)
    return x * (1 - x)
# Input data (X) and target output (Y)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0], [1], [1], [0]])
# Initialize weights and biases
np.random.seed(42)
W1 = np.random.rand(2, 2)
b1 = np.random.rand(1, 2)
W2 = np.random.rand(2, 1)
b2 = np.random.rand(1, 1)
# Learning rate
alpha = 0.1
# Training loop
for epoch in range(10000):
    # Forward pass
    Z1 = np.dot(X, W1) + b1
    A1 = sigmoid(Z1)
    Z2 = np.dot(A1, W2) + b2
    A2 = sigmoid(Z2)

    # Compute loss (mean squared error)
    loss = np.mean((Y - A2) ** 2)

    # Backward pass
    dA2 = A2 - Y
    dZ2 = dA2 * sigmoid_derivative(A2)
    dW2 = np.dot(A1.T, dZ2)
    db2 = np.sum(dZ2, axis=0, keepdims=True)

    dA1 = np.dot(dZ2, W2.T)
    dZ1 = dA1 * sigmoid_derivative(A1)
    dW1 = np.dot(X.T, dZ1)
    db1 = np.sum(dZ1, axis=0, keepdims=True)

    # Update weights and biases
    W2 -= alpha * dW2
    b2 -= alpha * db2
    W1 -= alpha * dW1
    b1 -= alpha * db1

    if epoch % 1000 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")
# Final output after training
print("Final output after training:")
print(A2)
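Once training finishes, the same forward-pass code doubles as the prediction step. The helper below is a small assumed addition (not part of the chapter's listing) that reuses the trained W1, b1, W2, b2 from above; thresholding the sigmoid output at 0.5 turns it into a hard 0/1 prediction that can be compared against Y.
python:
def predict(X_new, W1, b1, W2, b2):
    # Forward pass with the trained parameters, then threshold at 0.5
    A1 = sigmoid(np.dot(X_new, W1) + b1)
    A2 = sigmoid(np.dot(A1, W2) + b2)
    return (A2 > 0.5).astype(int)

print(predict(X, W1, b1, W2, b2))   # compare against the targets Y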
Conclusion
Backpropagation is a crucial technique for training neural networks, allowing the network to adjust its weights to minimize error through a gradient descent process. Understanding the mathematics behind backpropagation, as well as its implementation in Python, is essential for anyone looking to dive deeper into the world of machine learning and neural networks.
