Machine Learning (Chapter 27): Maximum Likelihood Estimate (MLE)

Introduction to Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation (MLE) is a fundamental statistical method used in machine learning and statistics to estimate the parameters of a probability distribution or statistical model. The core idea behind MLE is to find the parameter values that maximize the likelihood function, which measures how likely it is to observe the given data under different parameter values.

Likelihood Function

Given a dataset $\mathcal{D} = \{x_1, x_2, \dots, x_n\}$ and a probabilistic model with parameter $\theta$, the likelihood function $\mathcal{L}(\theta)$ is defined as the probability of observing the data given the parameter $\theta$. Assuming the observations are independent and identically distributed (i.i.d.), the likelihood factorizes as:

$$\mathcal{L}(\theta) = P(\mathcal{D} \mid \theta) = \prod_{i=1}^{n} P(x_i \mid \theta)$$

For convenience, one usually works with the log-likelihood, the natural logarithm of the likelihood function. Because the logarithm is monotonically increasing, maximizing the log-likelihood yields the same $\theta$ as maximizing the likelihood itself:

$$\log \mathcal{L}(\theta) = \sum_{i=1}^{n} \log P(x_i \mid \theta)$$
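To make this concrete, here is a minimal sketch (an addition to the text) using a Bernoulli coin-flip model with $P(x \mid \theta) = \theta^x (1-\theta)^{1-x}$; the data and the candidate value $\theta = 0.6$ are made up for illustration:

python:

import numpy as np

# Hypothetical coin-flip data (1 = heads), assumed i.i.d. Bernoulli(theta)
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])

theta = 0.6  # one candidate parameter value

# Likelihood: product of per-observation probabilities P(x_i | theta)
likelihood = np.prod(theta**x * (1 - theta)**(1 - x))

# Log-likelihood: sum of log-probabilities; numerically stabler, since
# the raw product underflows to 0.0 once n grows large
log_likelihood = np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

print(likelihood)              # 0.6^6 * 0.4^2 ≈ 0.00746496
print(np.exp(log_likelihood))  # same value, recovered from the sum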

Maximum Likelihood Estimation

The goal of MLE is to find the value of $\theta$ that maximizes the likelihood function. Formally, this can be expressed as:

$$\hat{\theta} = \underset{\theta}{\text{argmax}} \, \mathcal{L}(\theta) = \underset{\theta}{\text{argmax}} \, \log \mathcal{L}(\theta)$$
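When the argmax has no closed form, it is found numerically. Continuing the hypothetical coin-flip sketch above, a standard recipe is to minimize the negative log-likelihood, which is equivalent to maximizing $\log \mathcal{L}(\theta)$:

python:

import numpy as np
from scipy.optimize import minimize_scalar

# Same hypothetical coin flips as above
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])

# Negative log-likelihood: minimizing it maximizes the likelihood
def neg_log_likelihood(theta):
    return -np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")

print(result.x)    # ≈ 0.75, the numerical MLE
print(np.mean(x))  # 0.75: for a Bernoulli, the MLE is the sample proportion

For this model the optimizer simply recovers the sample proportion, but the same pattern applies to models where no closed-form solution exists.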

Example: MLE for Gaussian Distribution

Let's consider a simple example where we estimate the mean and variance of a Gaussian distribution using MLE. Given a dataset $\mathcal{D} = \{x_1, x_2, \dots, x_n\}$ assumed to be drawn from a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, the probability density function is:

$$P(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

The likelihood function for the dataset is:

$$\mathcal{L}(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

Taking the logarithm of the likelihood function, we get the log-likelihood:

$$\log \mathcal{L}(\mu, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$
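As a quick numerical check (an addition to the text, with made-up data and parameter values), this closed form can be compared against summing scipy's Gaussian log-density:

python:

import numpy as np
from scipy.stats import norm

# Made-up data and parameter values, purely to check the formula
x = np.array([2.3, 2.5, 3.1, 4.0])
mu, sigma2 = 3.0, 1.5
n = len(x)

# Log-likelihood via the closed form above
ll_formula = -n / 2 * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

# Same quantity via scipy's per-point log-density
ll_scipy = norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2)).sum()

print(np.isclose(ll_formula, ll_scipy))  # True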

To find the MLE estimates $\hat{\mu}$ and $\hat{\sigma}^2$, we take the partial derivatives of the log-likelihood function with respect to $\mu$ and $\sigma^2$, and set them to zero:

$$\frac{\partial \log \mathcal{L}(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0$$

$$\frac{\partial \log \mathcal{L}(\mu, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0$$

Solving these equations, we obtain:

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$$
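As a sanity check (not part of the derivation), these closed forms can be compared against a direct numerical maximization of the log-likelihood. Here is a minimal sketch, using the same sample data as the implementation below and optimizing over $(\mu, \log \sigma^2)$ so that the variance stays positive:

python:

import numpy as np
from scipy.optimize import minimize

x = np.array([2.3, 2.5, 3.1, 4.0, 4.2, 5.5, 5.7, 6.0])
n = len(x)

# Negative Gaussian log-likelihood, parameterized by log(sigma^2)
# so the optimizer never proposes a non-positive variance
def neg_log_likelihood(params):
    mu, log_sigma2 = params
    sigma2 = np.exp(log_sigma2)
    return n / 2 * np.log(2 * np.pi * sigma2) + np.sum((x - mu) ** 2) / (2 * sigma2)

result = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="Nelder-Mead")
mu_opt, sigma2_opt = result.x[0], np.exp(result.x[1])

print(mu_opt, sigma2_opt)             # ≈ 4.1625, ≈ 1.8648
print(np.mean(x), np.var(x, ddof=0))  # the closed-form MLEs, for comparison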

Python Implementation

Below is a Python implementation of MLE for estimating the parameters of a Gaussian distribution.

python:

import numpy as np

# Sample data
data = np.array([2.3, 2.5, 3.1, 4.0, 4.2, 5.5, 5.7, 6.0])

# MLE estimate of the mean: the sample mean
mu_hat = np.mean(data)

# MLE estimate of the variance: ddof=0 divides by n, matching the MLE formula
sigma2_hat = np.var(data, ddof=0)

print(f"Estimated mean (MLE): {mu_hat}")
print(f"Estimated variance (MLE): {sigma2_hat}")

Output:

Estimated mean (MLE): 4.1625
Estimated variance (MLE): 1.86484375
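A note on ddof=0: NumPy divides by $n$ in that case, which matches the MLE formula for $\hat{\sigma}^2$ above but is a biased estimator of the true variance; ddof=1 divides by $n - 1$ and gives the unbiased sample variance instead.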

Conclusion

Maximum Likelihood Estimation is a powerful and widely used method for parameter estimation in statistical models. By maximizing the likelihood function, MLE selects the parameter values under which the observed data are most probable. The worked example shows how MLE recovers the mean and variance of a Gaussian distribution, and the accompanying Python implementation puts those formulas into practice.

This chapter has covered the mathematical foundations and practical application of MLE, setting the stage for more complex models and estimation techniques in the following chapters.
