Machine Learning (Chapter 18): SVM - Formulation

Introduction to Support Vector Machines (SVM)

Support Vector Machines (SVM) are powerful supervised learning models used for classification and regression tasks. The core idea of SVM is to find the optimal hyperplane that best separates the data into different classes. In this chapter, we'll explore the mathematical formulation of SVMs, discuss how they work, and implement a simple example using Python.

Mathematical Formulation

1. The SVM Problem

Consider a dataset with $n$ data points $\{(x_i, y_i)\}_{i=1}^n$, where $x_i \in \mathbb{R}^d$ is the feature vector and $y_i \in \{-1, 1\}$ is the class label. The goal of SVM is to find a hyperplane that maximizes the margin between the two classes.

2. Hyperplane Equation

A hyperplane in $d$-dimensional space can be represented as:

$$w^T x + b = 0$$

where $w \in \mathbb{R}^d$ is the weight vector, and $b \in \mathbb{R}$ is the bias term.
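As a minimal illustration of the resulting decision rule, the sketch below classifies a point by the sign of $w^T x + b$. The values of `w` and `b` here are made up for the example; in practice they are learned from data.

python:
import numpy as np

# Hypothetical weight vector and bias for a 2-D example (illustrative values only)
w = np.array([2.0, -1.0])
b = -0.5

def predict(x):
    """Classify a point by the sign of the decision function w^T x + b."""
    return np.sign(w @ x + b)

print(predict(np.array([1.0, 0.5])))  # 2*1.0 - 1*0.5 - 0.5 = 1.0 -> +1
print(predict(np.array([0.0, 1.0])))  # 0 - 1.0 - 0.5 = -1.5 -> -1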

3. Margin and Support Vectors

The margin is defined as the distance between the hyperplane and the closest data points from either class. To maximize the margin, we need to solve the following optimization problem:

$$\text{Maximize } \frac{2}{\|w\|}$$

subject to the constraint:

$$y_i (w^T x_i + b) \geq 1 \quad \text{for all } i$$
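To see where the factor $\frac{2}{\|w\|}$ comes from, recall the distance from a point to the hyperplane:

$$\text{dist}(x_i) = \frac{|w^T x_i + b|}{\|w\|}$$

For the closest points (the support vectors), the constraint holds with equality, $y_i (w^T x_i + b) = 1$, so each lies at distance $\frac{1}{\|w\|}$ from the hyperplane. The margin spans both sides, giving a total width of $\frac{2}{\|w\|}$.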

4. Optimization Problem

Maximizing $\frac{2}{\|w\|}$ is equivalent to minimizing $\frac{1}{2} \|w\|^2$, so the above problem can be recast as a convex quadratic optimization problem. We can solve it using Lagrange multipliers. The Lagrangian function is:

$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^n \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]$$

where $\alpha_i \geq 0$ are the Lagrange multipliers. Setting the derivatives of $L$ with respect to $w$ and $b$ to zero gives $w = \sum_{i=1}^n \alpha_i y_i x_i$ and $\sum_{i=1}^n \alpha_i y_i = 0$. Substituting these back into the Lagrangian yields the dual problem:

$$\text{Maximize } \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j (x_i^T x_j)$$

subject to:

$$\sum_{i=1}^n \alpha_i y_i = 0, \qquad \alpha_i \geq 0 \quad \text{for all } i$$
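Once the optimal multipliers $\alpha_i^*$ are found, only the points with $\alpha_i^* > 0$ contribute to the solution; these are the support vectors. The weight vector and bias are then recovered as:

$$w = \sum_{i=1}^n \alpha_i^* y_i x_i, \qquad b = y_s - w^T x_s$$

where $(x_s, y_s)$ is any support vector.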

Python Implementation

Let's implement a simple example using Python's Scikit-learn library to apply SVM to a classification problem.

Example: SVM on the Iris Dataset

python:
import numpy as np
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# We will use only two classes for simplicity
X = X[y != 2]
y = y[y != 2]

# Keep only the first two features so the model's input matches
# the 2-D meshgrid used by the decision-boundary plot below
X = X[:, :2]

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create and train the SVM model
clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Print the classification report and accuracy
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Accuracy Score:", accuracy_score(y_test, y_pred))

# Plotting decision boundary
def plot_decision_boundary(X, y, model):
    h = .02  # step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', marker='o')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('SVM Decision Boundary')
    plt.show()

# Plot decision boundary for the trained model
plot_decision_boundary(X_test, y_test, clf)
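To connect the implementation back to the formulation, the fitted model exposes the quantities from the dual solution. A short follow-up sketch, meant to be run after the script above:

python:
# Points with non-zero Lagrange multipliers are the support vectors
print("Support vectors per class:", clf.n_support_)

# For a linear kernel, scikit-learn exposes w and b directly:
# coef_ holds w = sum_i alpha_i * y_i * x_i, intercept_ holds b
w = clf.coef_[0]
b = clf.intercept_[0]
print("Weight vector w:", w)
print("Bias b:", b)

# The geometric margin of the trained classifier is 2 / ||w||
print("Margin width:", 2 / np.linalg.norm(w))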

Explanation of the Code

  1. Loading and Preparing Data: The Iris dataset is loaded, and only two classes (and the first two features) are kept for simplicity.
  2. Splitting Data: The dataset is split into training and test sets.
  3. Standardization: Features are standardized to have mean 0 and variance 1.
  4. Training the Model: An SVM model with a linear kernel is trained on the data.
  5. Evaluation: The model's performance is evaluated using the accuracy score and a classification report.
  6. Visualization: The decision boundary of the SVM is plotted to visualize the separation between classes.

Conclusion

Support Vector Machines are a robust and effective tool for classification problems. By understanding the mathematical formulation and implementing SVM in Python, you can leverage its power to solve various machine learning tasks. The example provided demonstrates a practical application of SVM and helps visualize how it separates data into different classes.
