Machine Learning (Chapter 16): Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis (LDA) is a powerful technique in machine learning used for both classification and dimensionality reduction. It is especially useful for multi-class datasets where you want to reduce the number of features while preserving as much of the class-discriminatory information as possible.

Mathematical Formulation

The core idea behind LDA is to find a linear combination of features that best separates the classes. The goal is to project the data onto a lower-dimensional space where the classes are as distinct as possible.

Let's break down the mathematics behind LDA:

  1. Compute the Within-Class Scatter Matrix $S_W$: For each class $k$, compute the scatter matrix $S_{W_k}$:

    $$S_{W_k} = \sum_{x \in D_k} (x - \mu_k)(x - \mu_k)^T$$

    where $D_k$ is the set of data points in class $k$, and $\mu_k$ is the mean vector of class $k$.

    The total within-class scatter matrix is:

    $$S_W = \sum_{k=1}^{K} S_{W_k}$$

    where $K$ is the number of classes.

  2. Compute the Between-Class Scatter Matrix $S_B$: Compute the scatter matrix between classes as:

    $$S_B = \sum_{k=1}^{K} n_k (\mu_k - \mu)(\mu_k - \mu)^T$$

    where $n_k$ is the number of samples in class $k$, $\mu_k$ is the mean vector of class $k$, and $\mu$ is the overall mean vector of all the data.

  3. Solve the Generalized Eigenvalue Problem: To find the linear discriminants, solve the eigenvalue problem:

    $$S_W^{-1} S_B \mathbf{w} = \lambda \mathbf{w}$$

    where $\mathbf{w}$ represents the eigenvectors (discriminant vectors) and $\lambda$ the corresponding eigenvalues.

  4. Project the Data: Use the top $d$ eigenvectors, ranked by eigenvalue, to project the data onto a $d$-dimensional space. Because $S_B$ has rank at most $K - 1$, at most $K - 1$ useful discriminants exist, so $d \le K - 1$:

    $$Y = X\mathbf{W}$$

    where $X$ is the matrix of input features and $\mathbf{W}$ is the matrix whose columns are the top $d$ eigenvectors. A from-scratch NumPy sketch of these four steps follows this list.
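
To make these four steps concrete, here is a minimal from-scratch sketch in NumPy. It is an illustration under simple assumptions rather than a production implementation: the data matrix X has one sample per row, y holds integer class labels, and $S_W$ is assumed to be invertible (in practice a regularized or pseudo-inverse solve is safer). The function and variable names mirror the notation above and are chosen for this example.

python:
import numpy as np

def lda_projection(X, y, d):
    """Project X onto the top-d linear discriminants (d <= K - 1)."""
    classes = np.unique(y)
    n_features = X.shape[1]
    overall_mean = X.mean(axis=0)

    # Steps 1 and 2: accumulate the within-class (S_W) and
    # between-class (S_B) scatter matrices
    S_W = np.zeros((n_features, n_features))
    S_B = np.zeros((n_features, n_features))
    for k in classes:
        X_k = X[y == k]
        mu_k = X_k.mean(axis=0)
        S_W += (X_k - mu_k).T @ (X_k - mu_k)
        diff = (mu_k - overall_mean).reshape(-1, 1)
        S_B += X_k.shape[0] * (diff @ diff.T)

    # Step 3: eigen-decompose S_W^{-1} S_B and keep the top-d eigenvectors
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
    order = np.argsort(eigvals.real)[::-1]   # sort by decreasing eigenvalue
    W = eigvecs[:, order[:d]].real           # top-d discriminant directions

    # Step 4: project the data onto the d discriminant directions
    return X @ W

For example, calling lda_projection(X, y, d=2) on the Iris data (150 samples, four features, three classes) returns a 150 x 2 matrix of discriminant scores.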

Example in Python

Let's apply LDA to a simple dataset using Python. We will use the Iris dataset for demonstration purposes.

python:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Load the Iris dataset
data = load_iris()
X = data.data
y = data.target

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, test_size=0.3, random_state=42)

# Apply LDA
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(X_train, y_train)
X_test_lda = lda.transform(X_test)

# Train a classifier on the reduced features
clf = LogisticRegression()
clf.fit(X_train_lda, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test_lda)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

# Plot the results
plt.figure(figsize=(8, 6))
colors = ['navy', 'turquoise', 'darkorange']
for color, i, target_name in zip(colors, [0, 1, 2], data.target_names):
    plt.scatter(X_train_lda[y_train == i, 0], X_train_lda[y_train == i, 1],
                color=color, alpha=.8, label=target_name)
plt.xlabel('LD1')
plt.ylabel('LD2')
plt.title('LDA: Iris dataset')
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.show()

Explanation of the Code

  1. Data Loading and Preprocessing:

    • We load the Iris dataset and standardize the features to have zero mean and unit variance.
  2. Splitting the Data:

    • The dataset is split into training and testing sets.
  3. Applying LDA:

    • We apply LDA to reduce the dimensionality to 2 components.
  4. Training a Classifier:

    • A logistic regression classifier is trained on the reduced feature set.
  5. Evaluation:

    • We evaluate the accuracy of the classifier on the test set.
  6. Plotting:

    • We visualize the results to see how well LDA separates the classes in the reduced dimensional space.

By applying LDA, we can achieve a lower-dimensional representation of the data while maintaining class separability, which is useful for both visualization and improving the performance of machine learning models.
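
As a quick sanity check on the scikit-learn example above, the fitted LinearDiscriminantAnalysis object exposes an explained_variance_ratio_ attribute (available with the default 'svd' solver and the 'eigen' solver) that reports the fraction of between-class variance captured by each discriminant. Note also that n_components can be at most min(n_classes - 1, n_features), which is why only two discriminants are available for the three-class Iris data. A minimal sketch, assuming the lda object from the code above has already been fitted:

python:
# Assumes `lda` was fitted as in the example above
print(lda.explained_variance_ratio_)
# Fraction of between-class variance captured by LD1 and LD2;
# for Iris the first discriminant typically dominates.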
