Machine Learning (Chapter 27): Maximum Likelihood Estimate (MLE)

Introduction to Maximum Likelihood Estimation (MLE)

Maximum Likelihood Estimation (MLE) is a fundamental statistical method used in machine learning and statistics to estimate the parameters of a probability distribution or statistical model. The core idea behind MLE is to find the parameter values that maximize the likelihood function, which measures how likely it is to observe the given data under different parameter values.

Likelihood Function

Given a dataset $\mathcal{D} = \{x_1, x_2, \dots, x_n\}$ and a probabilistic model with parameter $\theta$, the likelihood function $\mathcal{L}(\theta)$ is defined as the probability of observing the data given the parameter $\theta$. Assuming the observations are independent and identically distributed (i.i.d.), the likelihood factorizes as:

$$\mathcal{L}(\theta) = P(\mathcal{D} \mid \theta) = \prod_{i=1}^{n} P(x_i \mid \theta)$$

For convenience, one usually works with the log-likelihood, the natural logarithm of the likelihood function. Because the logarithm is monotonically increasing, maximizing the log-likelihood yields the same $\theta$ as maximizing the likelihood itself:

$$\log \mathcal{L}(\theta) = \sum_{i=1}^{n} \log P(x_i \mid \theta)$$
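To make this concrete, here is a minimal sketch (an addition to the text) using a Bernoulli coin-flip model with $P(x \mid \theta) = \theta^x (1-\theta)^{1-x}$; the data and the candidate value $\theta = 0.6$ are made up for illustration:

python:

import numpy as np

# Hypothetical coin-flip data (1 = heads), assumed i.i.d. Bernoulli(theta)
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])

theta = 0.6  # one candidate parameter value

# Likelihood: product of per-observation probabilities P(x_i | theta)
likelihood = np.prod(theta**x * (1 - theta)**(1 - x))

# Log-likelihood: sum of log-probabilities; numerically stabler, since
# the raw product underflows to 0.0 once n grows large
log_likelihood = np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

print(likelihood)              # 0.6^6 * 0.4^2 ≈ 0.00746496
print(np.exp(log_likelihood))  # same value, recovered from the sum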

Maximum Likelihood Estimation

The goal of MLE is to find the value of $\theta$ that maximizes the likelihood function. Formally, this can be expressed as:

$$\hat{\theta} = \underset{\theta}{\text{argmax}} \, \mathcal{L}(\theta) = \underset{\theta}{\text{argmax}} \, \log \mathcal{L}(\theta)$$
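When the argmax has no closed form, it is found numerically. Continuing the hypothetical coin-flip sketch above, a standard recipe is to minimize the negative log-likelihood, which is equivalent to maximizing $\log \mathcal{L}(\theta)$:

python:

import numpy as np
from scipy.optimize import minimize_scalar

# Same hypothetical coin flips as above
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])

# Negative log-likelihood: minimizing it maximizes the likelihood
def neg_log_likelihood(theta):
    return -np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")

print(result.x)    # ≈ 0.75, the numerical MLE
print(np.mean(x))  # 0.75: for a Bernoulli, the MLE is the sample proportion

For this model the optimizer simply recovers the sample proportion, but the same pattern applies to models where no closed-form solution exists.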

Example: MLE for Gaussian Distribution

Let's consider a simple example where we estimate the mean and variance of a Gaussian distribution using MLE. Given a dataset $\mathcal{D} = \{x_1, x_2, \dots, x_n\}$ assumed to be drawn from a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, the probability density function is:

$$P(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

The likelihood function for the dataset is:

$$\mathcal{L}(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

Taking the logarithm of the likelihood function, we get the log-likelihood:

$$\log \mathcal{L}(\mu, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$
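As a quick numerical check (an addition to the text, with made-up data and parameter values), this closed form can be compared against summing scipy's Gaussian log-density:

python:

import numpy as np
from scipy.stats import norm

# Made-up data and parameter values, purely to check the formula
x = np.array([2.3, 2.5, 3.1, 4.0])
mu, sigma2 = 3.0, 1.5
n = len(x)

# Log-likelihood via the closed form above
ll_formula = -n / 2 * np.log(2 * np.pi * sigma2) - np.sum((x - mu) ** 2) / (2 * sigma2)

# Same quantity via scipy's per-point log-density
ll_scipy = norm.logpdf(x, loc=mu, scale=np.sqrt(sigma2)).sum()

print(np.isclose(ll_formula, ll_scipy))  # True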

To find the MLE estimates $\hat{\mu}$ and $\hat{\sigma}^2$, we take the partial derivatives of the log-likelihood function with respect to $\mu$ and $\sigma^2$, and set them to zero:

$$\frac{\partial \log \mathcal{L}(\mu, \sigma^2)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0$$

$$\frac{\partial \log \mathcal{L}(\mu, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0$$

Solving these equations, we obtain:

$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i$$

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2$$
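As a sanity check (not part of the derivation), these closed forms can be compared against a direct numerical maximization of the log-likelihood. Here is a minimal sketch, using the same sample data as the implementation below and optimizing over $(\mu, \log \sigma^2)$ so that the variance stays positive:

python:

import numpy as np
from scipy.optimize import minimize

x = np.array([2.3, 2.5, 3.1, 4.0, 4.2, 5.5, 5.7, 6.0])
n = len(x)

# Negative Gaussian log-likelihood, parameterized by log(sigma^2)
# so the optimizer never proposes a non-positive variance
def neg_log_likelihood(params):
    mu, log_sigma2 = params
    sigma2 = np.exp(log_sigma2)
    return n / 2 * np.log(2 * np.pi * sigma2) + np.sum((x - mu) ** 2) / (2 * sigma2)

result = minimize(neg_log_likelihood, x0=[0.0, 0.0], method="Nelder-Mead")
mu_opt, sigma2_opt = result.x[0], np.exp(result.x[1])

print(mu_opt, sigma2_opt)             # ≈ 4.1625, ≈ 1.8648
print(np.mean(x), np.var(x, ddof=0))  # the closed-form MLEs, for comparison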

Python Implementation

Below is a Python implementation of MLE for estimating the parameters of a Gaussian distribution.

python:

import numpy as np

# Sample data
data = np.array([2.3, 2.5, 3.1, 4.0, 4.2, 5.5, 5.7, 6.0])

# MLE estimate of the mean: the sample mean
mu_hat = np.mean(data)

# MLE estimate of the variance: ddof=0 divides by n, matching the MLE formula
sigma2_hat = np.var(data, ddof=0)

print(f"Estimated mean (MLE): {mu_hat}")
print(f"Estimated variance (MLE): {sigma2_hat}")

Output:

Estimated mean (MLE): 4.1625
Estimated variance (MLE): 1.86484375
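A note on ddof=0: NumPy divides by $n$ in that case, which matches the MLE formula for $\hat{\sigma}^2$ above but is a biased estimator of the true variance; ddof=1 divides by $n - 1$ and gives the unbiased sample variance instead.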

Conclusion

Maximum Likelihood Estimation is a powerful and widely used method for parameter estimation in statistical models. By maximizing the likelihood function, MLE selects the parameter values under which the observed data are most probable. The worked example shows how MLE recovers the mean and variance of a Gaussian distribution, and the accompanying Python implementation puts those formulas into practice.

This chapter has covered the mathematical foundations and practical application of MLE, setting the stage for more complex models and estimation techniques in the following chapters.
