Machine Learning (Chapter 18): SVM - Formulation

Introduction to Support Vector Machines (SVM)

Support Vector Machines (SVM) are powerful supervised learning models used for classification and regression tasks. The core idea of SVM is to find the optimal hyperplane that best separates the data into different classes. In this chapter, we'll explore the mathematical formulation of SVMs, discuss how they work, and implement a simple example using Python.

Mathematical Formulation

1. The SVM Problem

Consider a dataset with $n$ data points $\{(x_i, y_i)\}_{i=1}^n$, where $x_i \in \mathbb{R}^d$ is the feature vector and $y_i \in \{-1, 1\}$ is the class label. The goal of SVM is to find a hyperplane that maximizes the margin between the two classes.

2. Hyperplane Equation

A hyperplane in $d$-dimensional space can be represented as:

$$w^T x + b = 0$$

where $w \in \mathbb{R}^d$ is the weight vector, and $b \in \mathbb{R}$ is the bias term.
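As a minimal illustration of the resulting decision rule, the sketch below classifies a point by the sign of $w^T x + b$. The values of `w` and `b` here are made up for the example; in practice they are learned from data.

python:
import numpy as np

# Hypothetical weight vector and bias for a 2-D example (illustrative values only)
w = np.array([2.0, -1.0])
b = -0.5

def predict(x):
    """Classify a point by the sign of the decision function w^T x + b."""
    return np.sign(w @ x + b)

print(predict(np.array([1.0, 0.5])))  # 2*1.0 - 1*0.5 - 0.5 = 1.0 -> +1
print(predict(np.array([0.0, 1.0])))  # 0 - 1.0 - 0.5 = -1.5 -> -1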

3. Margin and Support Vectors

The margin is defined as the distance between the hyperplane and the closest data points from either class. To maximize the margin, we need to solve the following optimization problem:

$$\text{Maximize } \frac{2}{\|w\|}$$

subject to the constraint:

$$y_i (w^T x_i + b) \geq 1 \quad \text{for all } i$$
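To see where the factor $\frac{2}{\|w\|}$ comes from, recall the distance from a point to the hyperplane:

$$\text{dist}(x_i) = \frac{|w^T x_i + b|}{\|w\|}$$

For the closest points (the support vectors), the constraint holds with equality, $y_i (w^T x_i + b) = 1$, so each lies at distance $\frac{1}{\|w\|}$ from the hyperplane. The margin spans both sides, giving a total width of $\frac{2}{\|w\|}$.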

4. Optimization Problem

Maximizing $\frac{2}{\|w\|}$ is equivalent to minimizing $\frac{1}{2} \|w\|^2$, so the above problem can be recast as a convex quadratic optimization problem. We can solve it using Lagrange multipliers. The Lagrangian function is:

$$L(w, b, \alpha) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^n \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]$$

where $\alpha_i \geq 0$ are the Lagrange multipliers. Setting the derivatives of $L$ with respect to $w$ and $b$ to zero gives $w = \sum_{i=1}^n \alpha_i y_i x_i$ and $\sum_{i=1}^n \alpha_i y_i = 0$. Substituting these back into the Lagrangian yields the dual problem:

$$\text{Maximize } \sum_{i=1}^n \alpha_i - \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j (x_i^T x_j)$$

subject to:

$$\sum_{i=1}^n \alpha_i y_i = 0, \qquad \alpha_i \geq 0 \quad \text{for all } i$$
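Once the optimal multipliers $\alpha_i^*$ are found, only the points with $\alpha_i^* > 0$ contribute to the solution; these are the support vectors. The weight vector and bias are then recovered as:

$$w = \sum_{i=1}^n \alpha_i^* y_i x_i, \qquad b = y_s - w^T x_s$$

where $(x_s, y_s)$ is any support vector.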

Python Implementation

Let's implement a simple example using Python's Scikit-learn library to apply SVM to a classification problem.

Example: SVM on the Iris Dataset

python:
import numpy as np
from sklearn import datasets
from sklearn import svm
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = datasets.load_iris()
X = iris.data
y = iris.target

# We will use only two classes for simplicity
X = X[y != 2]
y = y[y != 2]

# Keep only the first two features so the model's input matches
# the 2-D meshgrid used by the decision-boundary plot below
X = X[:, :2]

# Split the dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Create and train the SVM model
clf = svm.SVC(kernel='linear', C=1.0)
clf.fit(X_train, y_train)

# Make predictions
y_pred = clf.predict(X_test)

# Print the classification report and accuracy
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Accuracy Score:", accuracy_score(y_test, y_pred))

# Plotting decision boundary
def plot_decision_boundary(X, y, model):
    h = .02  # step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.8)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', marker='o')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.title('SVM Decision Boundary')
    plt.show()

# Plot decision boundary for the trained model
plot_decision_boundary(X_test, y_test, clf)
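To connect the implementation back to the formulation, the fitted model exposes the quantities from the dual solution. A short follow-up sketch, meant to be run after the script above:

python:
# Points with non-zero Lagrange multipliers are the support vectors
print("Support vectors per class:", clf.n_support_)

# For a linear kernel, scikit-learn exposes w and b directly:
# coef_ holds w = sum_i alpha_i * y_i * x_i, intercept_ holds b
w = clf.coef_[0]
b = clf.intercept_[0]
print("Weight vector w:", w)
print("Bias b:", b)

# The geometric margin of the trained classifier is 2 / ||w||
print("Margin width:", 2 / np.linalg.norm(w))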

Explanation of the Code

  1. Loading and Preparing Data: The Iris dataset is loaded, and only two classes (and the first two features) are kept for simplicity.
  2. Splitting Data: The dataset is split into training and test sets.
  3. Standardization: Features are standardized to have mean 0 and variance 1.
  4. Training the Model: An SVM model with a linear kernel is trained on the data.
  5. Evaluation: The model's performance is evaluated using the accuracy score and a classification report.
  6. Visualization: The decision boundary of the SVM is plotted to visualize the separation between classes.

Conclusion

Support Vector Machines are a robust and effective tool for classification problems. By understanding the mathematical formulation and implementing SVM in Python, you can leverage its power to solve various machine learning tasks. The example provided demonstrates a practical application of SVM and helps visualize how it separates data into different classes.
