Machine Learning (Chapter 14): Linear Classification
Linear classification is a fundamental machine learning technique that assigns data points to classes using a linear decision boundary: a line in two dimensions, or a hyperplane in higher dimensions. This chapter covers the core concepts, the mathematical foundations, and a practical implementation of linear classification.
Mathematical Foundations
Linear classification algorithms look for a linear separator that best divides the data into classes. Most linear classifiers are built on a linear discriminant function, which assigns each point a score whose sign determines the predicted class.
Linear Discriminant Function
The linear discriminant function can be represented as:
$$f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b$$
Here:
- $\mathbf{x}$ is the feature vector.
- $\mathbf{w}$ is the weight vector.
- $b$ is the bias term.
The decision boundary is defined by the equation:
$$\mathbf{w}^\top \mathbf{x} + b = 0$$
Points with $f(\mathbf{x}) > 0$ are assigned to one class, and points with $f(\mathbf{x}) < 0$ are assigned to the other.
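As a minimal sketch of the rule above, the snippet below evaluates the discriminant $\mathbf{w}^\top \mathbf{x} + b$ for a few points and assigns a class from its sign. The weights, bias, sample points, and the helper names `linear_discriminant` and `classify` are hypothetical, chosen only for illustration. Treating a point exactly on the boundary as class 0 is an arbitrary convention assumed here.

```python
import numpy as np

def linear_discriminant(X, w, b):
    """Return the score w^T x + b for each row x of X."""
    return X @ w + b

def classify(X, w, b):
    """Assign class 1 where the score is positive, class 0 otherwise."""
    return (linear_discriminant(X, w, b) > 0).astype(int)

# Hypothetical parameters, not learned from any data.
w = np.array([1.0, -2.0])
b = 0.5

X_demo = np.array([
    [3.0, 1.0],  # score = 3.0 - 2.0 + 0.5 = 1.5  (positive side)
    [0.0, 1.0],  # score = 0.0 - 2.0 + 0.5 = -1.5 (negative side)
    [1.5, 1.0],  # score = 1.5 - 2.0 + 0.5 = 0.0  (on the boundary)
])

scores = linear_discriminant(X_demo, w, b)
labels = classify(X_demo, w, b)
print("scores:", scores)
print("labels:", labels)
```

The first point falls on the positive side and the second on the negative side; the third lies on the boundary itself, where the class assignment depends on the tie-breaking convention.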
Objective Function
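In a binary classification problem, training chooses the weights $\mathbf{w}$ and bias $b$ that minimize a loss function. For linear classifiers such as logistic regression, the Logistic Loss (also known as Binary Cross-Entropy Loss) is commonly used:
$$L(\mathbf{w}, b) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$$
Where:
- $N$ is the number of samples.
- $y_i \in \{0, 1\}$ is the true label of the $i$-th sample.
- $\hat{y}_i$ is the predicted probability of the positive class for the $i$-th sample.
The predicted probability is obtained by applying the sigmoid function to the linear discriminant:
$$\hat{y}_i = \sigma(\mathbf{w}^\top \mathbf{x}_i + b) = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x}_i + b)}}$$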
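To make the loss concrete, here is a small NumPy sketch that turns linear scores into probabilities with the sigmoid and then averages the logistic loss over the samples. The parameters and data are hypothetical, and the `eps` clipping is an added safeguard against taking the logarithm of zero rather than part of the definition.

```python
import numpy as np

def sigmoid(z):
    """Map real-valued scores to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_loss(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and probabilities."""
    # Clip so that log() never receives exactly 0 (illustrative safeguard).
    y_prob = np.clip(y_prob, eps, 1.0 - eps)
    return -np.mean(
        y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)
    )

# Hypothetical parameters and samples, for illustration only.
w = np.array([1.0, -2.0])
b = 0.5
X_demo = np.array([[3.0, 1.0], [0.0, 1.0]])
y_demo = np.array([1, 0])

y_prob = sigmoid(X_demo @ w + b)       # predicted P(class 1) per sample
loss = logistic_loss(y_demo, y_prob)   # small when predictions match labels
flipped_loss = logistic_loss(1 - y_demo, y_prob)  # larger: labels disagree

print("probabilities:", y_prob)
print("loss:", loss)
print("loss with flipped labels:", flipped_loss)
```

Flipping the labels increases the loss, which is the behavior the objective relies on: confident predictions on the wrong side of the boundary are penalized most heavily.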
Implementation in Python
Let’s implement a simple linear classifier in Python. We will use the scikit-learn library, which provides a straightforward interface for building and evaluating machine learning models, with its LogisticRegression estimator as the classifier.
Here’s a step-by-step example:
Import Libraries
python:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
Generate Sample Data
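python:
# Generate a synthetic two-feature dataset.
# n_redundant=0 is needed here: the default of 2 redundant features plus
# 2 informative features would exceed n_features=2 and raise a ValueError.
X, y = make_classification(
    n_samples=100,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=42,
)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
Train the Linear Classifier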
python:
# Initialize and train the logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)
Make Predictions and Evaluate the Model
python:
# Make predictions on the test set
y_pred = model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)
print(f'Accuracy: {accuracy}')
print(f'Confusion Matrix:\n{conf_matrix}')
Visualize the Decision Boundary
python:
import matplotlib.pyplot as plt
# Plotting the decision boundary
h = .02  # step size in the mesh
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# Predict the class of every mesh point, then reshape back to the grid
Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', marker='o')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.title('Decision Boundary of Linear Classifier')
plt.show()
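The boundary drawn by the mesh plot can also be recovered directly from the fitted parameters, which ties the code back to the equation $\mathbf{w}^\top \mathbf{x} + b = 0$. The sketch below is self-contained, so it regenerates the data and refits the model on the full dataset instead of reusing the train/test split. It reads the weights from `coef_` and the bias from `intercept_`, then solves the boundary equation for the second feature. It assumes the second weight is nonzero, and `n_redundant=0` is set explicitly because two informative features already fill `n_features=2`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Same kind of synthetic data as in the walkthrough (two informative features).
X, y = make_classification(
    n_samples=100,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    random_state=42,
)
model = LogisticRegression().fit(X, y)

# Learned parameters of the linear discriminant w^T x + b.
w = model.coef_[0]       # weight vector, shape (2,)
b = model.intercept_[0]  # bias term

# Reproduce the classifier's decisions from the sign of the score.
scores = X @ w + b
manual_pred = (scores > 0).astype(int)
agreement = np.mean(manual_pred == model.predict(X))
print("agreement with model.predict:", agreement)

# Solve w[0] * x1 + w[1] * x2 + b = 0 for x2 (assumes w[1] != 0).
x1 = np.linspace(X[:, 0].min(), X[:, 0].max(), 50)
x2 = -(w[0] * x1 + b) / w[1]
print("first boundary point:", x1[0], x2[0])
```

Passing `x1` and `x2` to `plt.plot` would draw the boundary as a single line over the scatter plot, as an alternative to predicting on a dense mesh.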
Summary
Linear classification separates data into classes using a linear decision boundary, and it works best when the classes are approximately linearly separable. With the mathematical foundations and the scikit-learn implementation covered in this chapter, you can apply linear classifiers to a range of machine learning problems.
