Machine Learning (Chapter 21): SVM Kernels

Support Vector Machines (SVMs) are a powerful class of supervised learning algorithms used for classification and regression tasks. A key feature of SVMs is their ability to handle non-linear decision boundaries through the use of kernels. In this chapter, we will explore SVM kernels, delve into their mathematical foundations, and provide Python code examples to illustrate their application.

1. Introduction to SVM Kernels

SVMs aim to find a hyperplane that best separates classes in the feature space. When the data is not linearly separable in its original space, SVM kernels transform the data into a higher-dimensional space where a linear separator can be applied. This transformation is achieved implicitly using kernel functions, which allow SVMs to learn complex decision boundaries without explicitly computing the coordinates in the higher-dimensional space.
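To make the implicit transformation concrete, consider the degree-2 polynomial kernel in two dimensions: it evaluates the dot product of a six-dimensional feature map without ever constructing that map. A small sketch (the feature map phi and the test vectors are illustrative, not part of any library):

python:

import numpy as np

def phi(v):
    """Explicit feature map whose dot product equals (x . z + 1)^2 in 2D."""
    x1, x2 = v
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

explicit = phi(x) @ phi(z)   # dot product in the 6D feature space
implicit = (x @ z + 1) ** 2  # kernel evaluation, no mapping needed

print(explicit, implicit)    # both print 25.0

The kernel computes in two dimensions what the explicit map computes in six; for high-degree polynomials or the RBF kernel (whose feature space is infinite-dimensional), this shortcut is what makes training feasible at all.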

2. Mathematical Formulation

The decision function of a Support Vector Machine is given by:

f(x) = \text{sign}\left( \sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b \right)

where:

  • \alpha_i are the Lagrange multipliers,
  • y_i are the class labels,
  • x_i are the support vectors,
  • b is the bias term,
  • K(x, x_i) is the kernel function.
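Everything in this formula is exposed by a fitted scikit-learn classifier: dual_coef_ stores the products \alpha_i y_i, support_vectors_ stores the x_i, and intercept_ stores b. A minimal sketch (using a toy make_blobs dataset and a linear kernel for simplicity) verifying that the formula reproduces decision_function:

python:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy binary classification problem
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

clf = SVC(kernel='linear')
clf.fit(X, y)

# dual_coef_ holds alpha_i * y_i for each support vector,
# support_vectors_ holds the x_i, and intercept_ is b
x_new = X[:5]
K = x_new @ clf.support_vectors_.T             # linear kernel K(x, x_i)
manual = K @ clf.dual_coef_.ravel() + clf.intercept_

print(np.allclose(manual, clf.decision_function(x_new)))  # True

The sign of this quantity is the predicted class, exactly as the formula states.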

3. Popular Kernel Functions

  1. Linear Kernel:

    The linear kernel is the simplest form of a kernel function and is given by:

    K(x, x_i) = x \cdot x_i

    This corresponds to the identity feature map (no transformation) and is appropriate for data that is (approximately) linearly separable.

  2. Polynomial Kernel:

    The polynomial kernel allows for learning non-linear boundaries by using polynomial functions:

    K(x, x_i) = (x \cdot x_i + c)^d

    where c is a constant and d is the polynomial degree.

  3. Radial Basis Function (RBF) Kernel:

    The RBF kernel, also known as the Gaussian kernel, is widely used for its flexibility:

    K(x, x_i) = \exp\left( -\frac{\|x - x_i\|^2}{2\sigma^2} \right)

    where \sigma is the kernel width.

  4. Sigmoid Kernel:

    The sigmoid kernel is inspired by neural networks and is defined as:

    K(x, x_i) = \tanh\left( \alpha (x \cdot x_i) + c \right)

    where \alpha and c are kernel parameters. Note that, unlike the kernels above, the sigmoid kernel is not positive semi-definite for all parameter choices.
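Each of these kernels is only a line or two of NumPy. The following sketch (the sample matrices and parameter values are arbitrary) implements all four and checks them against scikit-learn's pairwise-kernel helpers. Note that scikit-learn parameterizes the RBF kernel with gamma = 1/(2\sigma^2) rather than \sigma, and scales the dot product in the polynomial and sigmoid kernels by the same gamma:

python:

import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = rng.normal(size=(4, 3))

gamma, c, d = 0.5, 1.0, 3

# Linear: K = x . x_i
K_lin = X @ Y.T

# Polynomial: K = (gamma * x . x_i + c)^d
K_poly = (gamma * (X @ Y.T) + c) ** d

# RBF: K = exp(-gamma * ||x - x_i||^2), with gamma = 1 / (2 * sigma^2)
sq_dists = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1) - 2 * X @ Y.T
K_rbf = np.exp(-gamma * sq_dists)

# Sigmoid: K = tanh(gamma * x . x_i + c)
K_sig = np.tanh(gamma * (X @ Y.T) + c)

print(np.allclose(K_lin, linear_kernel(X, Y)))
print(np.allclose(K_poly, polynomial_kernel(X, Y, degree=d, gamma=gamma, coef0=c)))
print(np.allclose(K_rbf, rbf_kernel(X, Y, gamma=gamma)))
print(np.allclose(K_sig, sigmoid_kernel(X, Y, gamma=gamma, coef0=c)))

All four checks print True, confirming that the formulas above are exactly what the library computes.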

4. Python Code Examples

Let's see how these kernels can be used with Python's scikit-learn library.

python:

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler

# Load dataset; keep only the first two features so the classifiers
# and the plotted decision-boundary mesh live in the same 2D space
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# For simplicity, use only two classes
X = X[y != 2]
y = y[y != 2]

# Split the dataset
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Standardize features (important for the RBF and sigmoid kernels)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Linear kernel
clf_linear = SVC(kernel='linear')
clf_linear.fit(X_train, y_train)
y_pred_linear = clf_linear.predict(X_test)
print("Linear Kernel Accuracy:", accuracy_score(y_test, y_pred_linear))

# Polynomial kernel
clf_poly = SVC(kernel='poly', degree=3, coef0=1)
clf_poly.fit(X_train, y_train)
y_pred_poly = clf_poly.predict(X_test)
print("Polynomial Kernel Accuracy:", accuracy_score(y_test, y_pred_poly))

# RBF kernel
clf_rbf = SVC(kernel='rbf', gamma='scale')
clf_rbf.fit(X_train, y_train)
y_pred_rbf = clf_rbf.predict(X_test)
print("RBF Kernel Accuracy:", accuracy_score(y_test, y_pred_rbf))

# Sigmoid kernel
clf_sigmoid = SVC(kernel='sigmoid', gamma='scale', coef0=1)
clf_sigmoid.fit(X_train, y_train)
y_pred_sigmoid = clf_sigmoid.predict(X_test)
print("Sigmoid Kernel Accuracy:", accuracy_score(y_test, y_pred_sigmoid))

# Plot the decision boundary of a fitted classifier on a 2D mesh
def plot_decision_boundary(clf, X, y, title):
    h = .02  # step size in the mesh
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                         np.arange(y_min, y_max, h))
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=20, edgecolor='k')
    plt.title(title)

# Plot decision boundaries for all four kernels
plt.figure(figsize=(12, 8))
plt.subplot(2, 2, 1)
plot_decision_boundary(clf_linear, X_test, y_test, "Linear Kernel")
plt.subplot(2, 2, 2)
plot_decision_boundary(clf_poly, X_test, y_test, "Polynomial Kernel")
plt.subplot(2, 2, 3)
plot_decision_boundary(clf_rbf, X_test, y_test, "RBF Kernel")
plt.subplot(2, 2, 4)
plot_decision_boundary(clf_sigmoid, X_test, y_test, "Sigmoid Kernel")
plt.show()

5. Explanation of the Code

  • Data Preparation: The Iris dataset is loaded, restricted to its first two features (so the decision boundaries can be plotted in 2D), and filtered to two classes for simplicity. The data is then split into training and testing sets and standardized.
  • Model Training: Four SVM classifiers with different kernels (linear, polynomial, RBF, and sigmoid) are trained on the data.
  • Model Evaluation: The accuracy of each kernel is evaluated on the test set.
  • Visualization: Decision boundaries for each kernel are plotted to visualize how each kernel performs.

6. Conclusion

SVM kernels are a crucial component for handling non-linear data by mapping it into higher-dimensional spaces where a linear decision boundary can be applied. Understanding and choosing the right kernel is essential for building effective SVM models. By experimenting with different kernels, as demonstrated in the Python code, you can tailor your SVM to best fit the characteristics of your dataset.

Feel free to modify the parameters and datasets to explore how different kernels affect the performance of SVM models in various scenarios.
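One systematic way to run that experiment is a cross-validated grid search over kernels and their hyperparameters. A minimal sketch (the grid values are illustrative starting points, not recommendations):

python:

from sklearn import datasets
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Scale inside the pipeline so each CV fold is standardized correctly
pipe = make_pipeline(StandardScaler(), SVC())

# Illustrative grid; widen or narrow the ranges for your own data
param_grid = {
    'svc__kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
    'svc__C': [0.1, 1, 10],
    'svc__gamma': ['scale', 0.01, 0.1, 1],
}

search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)

print("Best kernel and parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)

Putting the scaler inside the pipeline matters: fitting it on the full dataset before cross-validation would leak information from the validation folds into training.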
