Machine Learning (Chapter 21): SVM Kernels

Support Vector Machines (SVMs) are a powerful class of supervised learning algorithms used for classification and regression tasks. A key feature of SVMs is their ability to handle non-linear decision boundaries through the use of kernels. In this chapter, we will explore SVM kernels, delve into their mathematical foundations, and provide Python code examples to illustrate their application.

1. Introduction to SVM Kernels

SVMs aim to find a hyperplane that best separates classes in the feature space. When the data is not linearly separable in its original space, SVM kernels transform the data into a higher-dimensional space where a linear separator can be applied. This transformation is achieved implicitly using kernel functions, which allow SVMs to learn complex decision boundaries without explicitly computing the coordinates in the higher-dimensional space.
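To make the implicit transformation concrete, consider the degree-2 polynomial kernel in two dimensions: it evaluates the dot product of a six-dimensional feature map without ever constructing that map. A small sketch (the feature map phi and the test vectors are illustrative, not part of any library):

python:

import numpy as np

def phi(v):
    """Explicit feature map whose dot product equals (x . z + 1)^2 in 2D."""
    x1, x2 = v
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = phi(x) @ phi(z)   # dot product in the 6D feature space
implicit = (x @ z + 1) ** 2  # kernel evaluation, no mapping needed

print(explicit, implicit)    # both print 25.0

The kernel computes in two dimensions what the explicit map computes in six; for high-degree polynomials or the RBF kernel (whose feature space is infinite-dimensional), this shortcut is what makes training feasible at all.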

2. Mathematical Formulation

The decision function of a Support Vector Machine is given by:

f(x) = \text{sign}\left( \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b \right)

where:

  • \alpha_i are the Lagrange multipliers,
  • y_i are the class labels,
  • x_i are the support vectors,
  • b is the bias term,
  • K(x, x_i) is the kernel function.
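Everything in this formula is exposed by a fitted scikit-learn classifier: dual_coef_ stores the products \alpha_i y_i, support_vectors_ stores the x_i, and intercept_ stores b. A minimal sketch (using a toy make_blobs dataset and a linear kernel for simplicity) verifying that the formula reproduces decision_function:

python:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy binary classification problem
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

clf = SVC(kernel='linear')
clf.fit(X, y)

# dual_coef_ holds alpha_i * y_i for each support vector,
# support_vectors_ holds the x_i, and intercept_ is b
x_new = X[:5]
K = x_new @ clf.support_vectors_.T             # linear kernel K(x, x_i)
manual = K @ clf.dual_coef_.ravel() + clf.intercept_

print(np.allclose(manual, clf.decision_function(x_new)))  # True

The sign of this quantity is the predicted class, exactly as the formula states.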

3. Popular Kernel Functions

  1. Linear Kernel:

    The linear kernel is the simplest form of a kernel function and is given by:

    K(x, x_i) = x \cdot x_i

    This corresponds to the identity feature map (no transformation) and is appropriate for data that is (approximately) linearly separable.

  2. Polynomial Kernel:

    The polynomial kernel allows for learning non-linear boundaries by using polynomial functions:

    K(x, x_i) = (x \cdot x_i + c)^d

    where c is a constant and d is the polynomial degree.

  3. Radial Basis Function (RBF) Kernel:

    The RBF kernel, also known as the Gaussian kernel, is widely used for its flexibility:

    K(x, x_i) = \exp\left( -\frac{\|x - x_i\|^2}{2\sigma^2} \right)

    where \sigma is the kernel width.

  4. Sigmoid Kernel:

    The sigmoid kernel is inspired by neural networks and is defined as:

    K(x, x_i) = \tanh\left( \alpha (x \cdot x_i) + c \right)

    where \alpha and c are kernel parameters. Note that, unlike the kernels above, the sigmoid kernel is not positive semi-definite for all parameter choices.
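Each of these kernels is only a line or two of NumPy. The following sketch (the sample matrices and parameter values are arbitrary) implements all four and checks them against scikit-learn's pairwise-kernel helpers. Note that scikit-learn parameterizes the RBF kernel with gamma = 1/(2\sigma^2) rather than \sigma, and scales the dot product in the polynomial and sigmoid kernels by the same gamma:

python:

import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = rng.normal(size=(4, 3))

gamma, c, d = 0.5, 1.0, 3

# Linear: K = x . x_i
K_lin = X @ Y.T

# Polynomial: K = (gamma * x . x_i + c)^d
K_poly = (gamma * (X @ Y.T) + c) ** d

# RBF: K = exp(-gamma * ||x - x_i||^2), with gamma = 1 / (2 * sigma^2)
sq_dists = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1) - 2 * X @ Y.T
K_rbf = np.exp(-gamma * sq_dists)

# Sigmoid: K = tanh(gamma * x . x_i + c)
K_sig = np.tanh(gamma * (X @ Y.T) + c)

print(np.allclose(K_lin, linear_kernel(X, Y)))
print(np.allclose(K_poly, polynomial_kernel(X, Y, degree=d, gamma=gamma, coef0=c)))
print(np.allclose(K_rbf, rbf_kernel(X, Y, gamma=gamma)))
print(np.allclose(K_sig, sigmoid_kernel(X, Y, gamma=gamma, coef0=c)))

All four checks print True, confirming that the formulas above are exactly what the library computes.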

4. Python Code Examples

Let's see how these kernels can be used with Python's scikit-learn library.

python:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Load dataset; keep only the first two features so the classifiers
# and the plotted decision-boundary mesh live in the same 2D space
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# For simplicity, use only two classes
X = X[y != 2]
y = y[y != 2]

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Standardize features (important for the RBF and sigmoid kernels)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Linear kernel
clf_linear = SVC(kernel='linear')
clf_linear.fit(X_train, y_train)
y_pred_linear = clf_linear.predict(X_test)
print("Linear Kernel Accuracy:", accuracy_score(y_test, y_pred_linear))

# Polynomial kernel
clf_poly = SVC(kernel='poly', degree=3, coef0=1)
clf_poly.fit(X_train, y_train)
y_pred_poly = clf_poly.predict(X_test)
print("Polynomial Kernel Accuracy:", accuracy_score(y_test, y_pred_poly))

# RBF kernel
clf_rbf = SVC(kernel='rbf', gamma='scale')
clf_rbf.fit(X_train, y_train)
y_pred_rbf = clf_rbf.predict(X_test)
print("RBF Kernel Accuracy:", accuracy_score(y_test, y_pred_rbf))

# Sigmoid kernel
clf_sigmoid = SVC(kernel='sigmoid', gamma='scale', coef0=1)
clf_sigmoid.fit(X_train, y_train)
y_pred_sigmoid = clf_sigmoid.predict(X_test)
print("Sigmoid Kernel Accuracy:", accuracy_score(y_test, y_pred_sigmoid))

# Plot the decision boundary of a fitted classifier on a 2D mesh
def plot_decision_boundary(clf, X, y, title):
    h = .02  # step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor='k')
    plt.title(title)

# Plot decision boundaries for all four kernels
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
plot_decision_boundary(clf_linear, X_test, y_test, "Linear Kernel")
plt.subplot(2, 2, 2)
plot_decision_boundary(clf_poly, X_test, y_test, "Polynomial Kernel")
plt.subplot(2, 2, 3)
plot_decision_boundary(clf_rbf, X_test, y_test, "RBF Kernel")
plt.subplot(2, 2, 4)
plot_decision_boundary(clf_sigmoid, X_test, y_test, "Sigmoid Kernel")
plt.show()

5. Explanation of the Code

  • Data Preparation: The Iris dataset is loaded, restricted to its first two features (so the decision boundaries can be plotted in 2D), and filtered to two classes for simplicity. The data is then split into training and testing sets and standardized.
  • Model Training: Four SVM classifiers with different kernels (linear, polynomial, RBF, and sigmoid) are trained on the data.
  • Model Evaluation: The accuracy of each kernel is evaluated on the test set.
  • Visualization: Decision boundaries for each kernel are plotted to visualize how each kernel performs.

6. Conclusion

SVM kernels are a crucial component for handling non-linear data by mapping it into higher-dimensional spaces where a linear decision boundary can be applied. Understanding and choosing the right kernel is essential for building effective SVM models. By experimenting with different kernels, as demonstrated in the Python code, you can tailor your SVM to best fit the characteristics of your dataset.

Feel free to modify the parameters and datasets to explore how different kernels affect the performance of SVM models in various scenarios.
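One systematic way to run that experiment is a cross-validated grid search over kernels and their hyperparameters. A minimal sketch (the grid values are illustrative starting points, not recommendations):

python:

from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Scale inside the pipeline so each CV fold is standardized correctly
pipe = make_pipeline(StandardScaler(), SVC())

# Illustrative grid; widen or narrow the ranges for your own data
param_grid = {
    'svc__kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'svc__C': [0.1, 1, 10],
    'svc__gamma': ['scale', 0.01, 0.1, 1],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print("Best kernel and parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)

Putting the scaler inside the pipeline matters: fitting it on the full dataset before cross-validation would leak information from the validation folds into training.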
