Machine Learning (Chapter 26): ANN IV - Initialization, Training & Validation
Introduction
In the previous chapters, we explored the fundamental concepts and mechanisms behind Artificial Neural Networks (ANNs), including early models, backpropagation, and its extensions. In this chapter, we'll dive into the crucial steps of initializing, training, and validating ANNs. Proper initialization, effective training techniques, and rigorous validation methods are key to building neural networks that generalize well and perform effectively on unseen data.
1. Initialization of Weights
The initialization of weights in a neural network significantly influences the training process. Poor initialization can lead to slow convergence or even prevent the network from converging at all. Two popular initialization methods are Xavier Initialization and He Initialization.
1.1 Xavier Initialization
Xavier initialization (also known as Glorot initialization) is designed to keep the scale of the gradients roughly the same across all layers. This method works well with activation functions like sigmoid and tanh.
Mathematical Formula:
For a layer with $n_{in}$ input units and $n_{out}$ output units, the weights are initialized as:

$$W \sim U\!\left[-\sqrt{\frac{6}{n_{in} + n_{out}}},\ \sqrt{\frac{6}{n_{in} + n_{out}}}\right]$$

Where $U[-a, a]$ denotes a uniform distribution over that interval.
1.2 He Initialization
He initialization is similar to Xavier but is tailored to ReLU and its variants: because ReLU zeroes out negative inputs, the initialization variance is doubled to keep activation scales stable, and ReLU does not suffer from the saturation-driven vanishing gradients of sigmoid or tanh.
Mathematical Formula:
For a layer with $n_{in}$ input units, the weights are initialized as:

$$W \sim \mathcal{N}\!\left(0,\ \frac{2}{n_{in}}\right)$$

Where $\mathcal{N}(0, \sigma^2)$ denotes a normal distribution with mean $0$ and variance $\sigma^2 = \frac{2}{n_{in}}$ (i.e., standard deviation $\sqrt{2 / n_{in}}$).
1.3 Python Implementation
python:
import numpy as np

def xavier_init(size, n_in, n_out):
    # Uniform in [-limit, limit] with limit = sqrt(6 / (n_in + n_out))
    limit = np.sqrt(6 / (n_in + n_out))
    return np.random.uniform(-limit, limit, size=size)

def he_init(size, n_in):
    # Normal with mean 0 and standard deviation sqrt(2 / n_in)
    stddev = np.sqrt(2 / n_in)
    return np.random.normal(0, stddev, size=size)

# Example usage:
layer_dims = [784, 128, 64, 10]  # Example dimensions for a 3-layer neural network
weights_xavier = [xavier_init((layer_dims[i], layer_dims[i+1]), layer_dims[i], layer_dims[i+1])
                  for i in range(len(layer_dims) - 1)]
weights_he = [he_init((layer_dims[i], layer_dims[i+1]), layer_dims[i])
              for i in range(len(layer_dims) - 1)]
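As a quick sanity check, one can push random inputs through layers initialized with each scheme and compare the variance of the resulting activations; with a well-chosen initialization the variance should stay roughly stable from layer to layer rather than exploding or collapsing. The sketch below is illustrative (the helper `forward_variance` is not from any library), assuming tanh activations for Xavier and ReLU activations for He:
python:
def forward_variance(weight_list, activation, x):
    # Propagate x through each layer and record the activation variance per layer
    variances = []
    for W in weight_list:
        x = activation(x @ W)
        variances.append(x.var())
    return variances

x = np.random.randn(1000, layer_dims[0])  # 1000 random input vectors
relu = lambda z: np.maximum(0, z)

print("tanh + Xavier:", forward_variance(weights_xavier, np.tanh, x))
print("ReLU + He:    ", forward_variance(weights_he, relu, x))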
2. Training Process
The training process of a neural network involves iteratively updating the weights to minimize a loss function. This is typically done with Stochastic Gradient Descent (SGD) or adaptive variants such as Adam.
2.1 Loss Function
The loss function measures how well the network's predictions match the actual data. For classification problems, Cross-Entropy Loss is commonly used, while Mean Squared Error (MSE) is used for regression tasks.
Mathematical Formula:
For cross-entropy loss (binary classification):

$$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right]$$

For mean squared error:

$$L_{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$

Where:
- $N$ is the number of samples
- $y_i$ is the actual label
- $\hat{y}_i$ is the predicted value
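Both losses are straightforward to express with NumPy. The snippet below is a minimal sketch of the two formulas above (the names `cross_entropy_loss` and `mse_loss` are illustrative, not from a specific library); the clipping guards against taking log(0):
python:
def cross_entropy_loss(y_true, y_pred, eps=1e-12):
    # Binary cross-entropy averaged over N samples; clipping avoids log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def mse_loss(y_true, y_pred):
    # Mean squared error averaged over N samples
    return np.mean((y_true - y_pred) ** 2)

# Example usage:
y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.8, 0.6])
print(cross_entropy_loss(y_true, y_pred))  # ~0.266
print(mse_loss(y_true, y_pred))            # 0.0625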
2.2 Gradient Descent
The gradients of the loss function with respect to the weights are computed via backpropagation, and each weight is then updated in the direction that reduces the loss.
Mathematical Formula:
For a weight $w$, the update rule is:

$$w \leftarrow w - \eta \frac{\partial L}{\partial w}$$

Where:
- $\eta$ is the learning rate
- $\frac{\partial L}{\partial w}$ is the gradient of the loss function with respect to $w$
2.3 Python Implementation
python:
def sgd_update(weights, gradients, learning_rate):
    # Vanilla SGD: step each weight against its gradient
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

def adam_update(weights, gradients, learning_rate, beta1, beta2, eps, t, m, v):
    # Update biased first- and second-moment estimates
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, gradients)]
    v = [beta2 * vi + (1 - beta2) * (gi ** 2) for vi, gi in zip(v, gradients)]
    # Bias-correct the moment estimates
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    # Parameter update
    weights = [w - learning_rate * mhi / (np.sqrt(vhi) + eps)
               for w, mhi, vhi in zip(weights, m_hat, v_hat)]
    return weights, m, v
# Example usage:
weights = weights_xavier  # Start from the Xavier-initialized weights above
gradients = [np.random.randn(*w.shape) for w in weights]  # Placeholder gradients (normally from backpropagation)
learning_rate = 0.01

# SGD Update
weights = sgd_update(weights, gradients, learning_rate)

# Adam Update
beta1, beta2, eps = 0.9, 0.999, 1e-8
t = 1  # Time step; incremented every update in a real training loop
m, v = [np.zeros_like(w) for w in weights], [np.zeros_like(w) for w in weights]
weights, m, v = adam_update(weights, gradients, learning_rate, beta1, beta2, eps, t, m, v)
3. Validation
Validation is essential for assessing the model's performance on unseen data, preventing overfitting, and fine-tuning hyperparameters. The dataset is typically split into training, validation, and test sets.
3.1 Train-Validation Split
A common approach is to use 80% of the data for training and 20% for validation.
python:
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
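Since the text above also mentions a held-out test set, one possible extension is to split twice: first carve off the test set, then split the remainder into training and validation. A minimal sketch (the 60/20/20 proportions are just an illustrative choice, and `X`, `y` are assumed to hold the features and labels):
python:
# First split off 20% of the data as the test set
X_temp, X_test, y_temp, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Then split the remaining 80% into 75% train / 25% validation (60% / 20% of the full data)
X_train, X_val, y_train, y_val = train_test_split(X_temp, y_temp, test_size=0.25, random_state=42)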
3.2 Early Stopping
Early stopping is a technique to halt training when the validation loss stops decreasing, indicating that the model may be overfitting.
python:
min_val_loss = float('inf')
patience, patience_counter = 10, 0
best_weights = weights

for epoch in range(epochs):
    # Training step
    # ...

    # Validation step
    val_loss = compute_loss(X_val, y_val)  # compute_loss is a placeholder for the model's validation loss
    if val_loss < min_val_loss:
        min_val_loss = val_loss
        best_weights = [w.copy() for w in weights]  # Keep a copy of the best weights so far
        patience_counter = 0
    else:
        patience_counter += 1
        if patience_counter >= patience:
            print("Early stopping...")
            break

# Restore the weights from the epoch with the lowest validation loss
weights = best_weights
Conclusion
Initialization, training, and validation are crucial steps in developing effective neural networks. Proper initialization sets the foundation for efficient learning, while careful training and validation ensure that the model generalizes well to unseen data. By understanding and applying these techniques, one can significantly improve the performance of ANNs and build robust machine learning models.
