Machine Learning (Chapter 5): Statistical Decision Theory - Regression

Introduction

Statistical Decision Theory is a framework used to make decisions under uncertainty. In the context of machine learning, it helps in choosing the best model or hypothesis given the data and associated risks or costs. Regression, a fundamental concept in machine learning, deals with predicting a continuous output based on input features. In this article, we explore regression within the framework of Statistical Decision Theory, focusing on the mathematical foundations and practical implementation in Python.

1. Basics of Statistical Decision Theory

Statistical Decision Theory involves selecting a decision function $\delta(X)$ that minimizes a loss function $L(Y, \delta(X))$, where $Y$ is the true output and $X$ is the input feature vector. The objective is to minimize the expected loss, also known as the risk, given by:

$$R(\delta) = \mathbb{E}[L(Y, \delta(X))]$$

In regression, the most common loss function is the squared loss:

$$L(Y, \delta(X)) = (Y - \delta(X))^2$$
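Since the true joint distribution of $(X, Y)$ is rarely known, the risk is typically estimated by averaging the loss over samples. Below is a minimal sketch of this idea, assuming a made-up data-generating process and a hand-picked decision function:

python:

import numpy as np

rng = np.random.default_rng(0)

# Assumed data-generating process: Y = 4 + 3X + unit-variance Gaussian noise
X = 2 * rng.random(10_000)
Y = 4 + 3 * X + rng.normal(size=X.shape)

def delta(x):
    # A candidate decision function (here, the true regression line)
    return 4 + 3 * x

# Empirical risk: the average squared loss over the sample.
# With unit-variance noise it should come out close to 1.0.
empirical_risk = np.mean((Y - delta(X)) ** 2)
print("Estimated risk R(delta):", empirical_risk)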

2. Regression and Statistical Decision Theory

In regression, our goal is to predict a continuous variable $Y$ based on the input variables $X$. According to Statistical Decision Theory, the optimal decision function $\delta(X)$ that minimizes the expected squared loss (risk) is the conditional expectation of $Y$ given $X$:

$$\delta(X) = \mathbb{E}[Y \mid X]$$

For example, a linear regression model approximates this conditional expectation by a linear combination of the input features:

$$\delta(X) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p$$

where $\beta_0, \beta_1, \dots, \beta_p$ are the coefficients of the model.
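To see the optimality of the conditional mean in action, here is a short simulation, again under an assumed linear data-generating process: it compares the empirical risk of $\mathbb{E}[Y \mid X]$ with that of a deliberately mis-specified predictor, and the conditional mean attains the lower squared loss:

python:

import numpy as np

rng = np.random.default_rng(42)

# Assumed data-generating process: E[Y|X] = 4 + 3X
X = 2 * rng.random(100_000)
Y = 4 + 3 * X + rng.normal(size=X.shape)

cond_mean = 4 + 3 * X          # the optimal predictor E[Y|X]
misspecified = 4.5 + 2.5 * X   # an arbitrary competing predictor

# The conditional mean's risk is close to the noise variance (1.0);
# any other predictor adds its squared bias on top of that.
print("Risk of E[Y|X]:             ", np.mean((Y - cond_mean) ** 2))
print("Risk of competing predictor:", np.mean((Y - misspecified) ** 2))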

3. Mathematical Derivation

Given a set of data points $\{(X_i, Y_i)\}_{i=1}^{n}$, we aim to estimate the coefficients $\beta$ by minimizing the empirical risk, which for squared loss is the sum of squared errors:

$$\hat{\beta} = \underset{\beta}{\operatorname{argmin}} \sum_{i=1}^{n} \left( Y_i - \beta_0 - \beta_1 X_{i1} - \dots - \beta_p X_{ip} \right)^2$$

Provided $X^T X$ is invertible, this has a closed-form solution given by the normal equation:

$$\hat{\beta} = (X^T X)^{-1} X^T Y$$

where $X$ is the $n \times (p+1)$ design matrix of input features (with a leading column of ones for the intercept term) and $Y$ is the vector of outputs.
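For completeness, the normal equation follows from writing the sum of squared errors in matrix form and setting its gradient with respect to $\beta$ to zero:

$$\text{RSS}(\beta) = (Y - X\beta)^T (Y - X\beta), \qquad \nabla_\beta \text{RSS}(\beta) = -2 X^T (Y - X\beta) = 0$$

which rearranges to $X^T X \hat{\beta} = X^T Y$, and hence $\hat{\beta} = (X^T X)^{-1} X^T Y$.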

4. Python Implementation

Let's implement a simple linear regression model in Python, first computing the coefficients manually with the normal equation in NumPy, and then using Scikit-Learn's LinearRegression for comparison.

python:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Generating some synthetic data
np.random.seed(42)
X = 2 * np.random.rand(100, 1)
y = 4 + 3 * X + np.random.randn(100, 1)

# Adding bias term (X_0 = 1)
X_b = np.c_[np.ones((100, 1)), X]

# Manual computation of coefficients using the normal equation
theta_best = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)
print("Calculated coefficients:", theta_best)

# Making predictions
X_new = np.array([[0], [2]])
X_new_b = np.c_[np.ones((2, 1)), X_new]
y_predict = X_new_b.dot(theta_best)

# Plotting the results
plt.plot(X_new, y_predict, "r-", label="Predictions")
plt.plot(X, y, "b.")
plt.xlabel("$x_1$", fontsize=18)
plt.ylabel("$y$", rotation=0, fontsize=18)
plt.legend()
plt.show()

# Using Scikit-Learn
lin_reg = LinearRegression()
lin_reg.fit(X, y)
print("Scikit-Learn coefficients:", lin_reg.intercept_, lin_reg.coef_)

# Scikit-Learn prediction
y_predict_sklearn = lin_reg.predict(X_new)
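One caveat about the approach above: explicitly inverting $X^T X$ can be numerically unstable when features are strongly correlated. Continuing with X_b and y from the block above, here is a sketch of two more robust alternatives, np.linalg.lstsq and the Moore-Penrose pseudoinverse, which solve the same least-squares problem:

python:

# Solve the least-squares problem directly rather than forming the inverse;
# lstsq is more stable when X_b.T @ X_b is ill-conditioned.
theta_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X_b, y, rcond=None)
print("lstsq coefficients:", theta_lstsq)

# Equivalent result via the Moore-Penrose pseudoinverse
theta_pinv = np.linalg.pinv(X_b).dot(y)
print("pinv coefficients:", theta_pinv)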

5. Conclusion

Statistical Decision Theory provides a solid foundation for understanding and implementing regression models. By framing regression as a decision problem, we can systematically approach the task of predicting continuous outputs. Under squared loss, the optimal decision rule is the conditional mean $\mathbb{E}[Y \mid X]$, and linear regression is a simple, widely used model for approximating it. The Python implementation illustrates how these concepts can be applied in practice, providing a bridge between theory and real-world applications.

By understanding the theoretical underpinnings and practical applications, one can make informed decisions when selecting and implementing regression models in machine learning tasks.
