Machine Learning (Chapter 29): Parameter Estimation III - Bayesian Estimation

Bayesian Estimation is a powerful statistical technique that extends the principles of Bayesian inference to parameter estimation. Unlike Maximum Likelihood Estimation (MLE), which focuses solely on finding the parameter values that maximize the likelihood function, Bayesian Estimation incorporates prior knowledge or beliefs about the parameters and updates these beliefs as data are observed. This chapter explores the mathematical foundation of Bayesian Estimation, provides examples, and includes Python code to demonstrate the concepts.

1. Introduction to Bayesian Estimation

Bayesian Estimation is based on Bayes' Theorem, which is used to update the probability of a hypothesis as more evidence becomes available. In the context of parameter estimation, the hypothesis represents the parameters of a model, and the evidence is the observed data.

Bayes' Theorem is mathematically expressed as:

$$P(\theta | \mathbf{X}) = \frac{P(\mathbf{X} | \theta) \cdot P(\theta)}{P(\mathbf{X})}$$

Where:

  • $P(\theta | \mathbf{X})$ is the posterior probability of the parameter $\theta$ given the data $\mathbf{X}$.
  • $P(\mathbf{X} | \theta)$ is the likelihood of the data $\mathbf{X}$ given the parameter $\theta$.
  • $P(\theta)$ is the prior probability of the parameter $\theta$.
  • $P(\mathbf{X})$ is the marginal likelihood, or evidence, which acts as a normalizing constant.

The goal of Bayesian Estimation is to derive the posterior distribution $P(\theta | \mathbf{X})$, which combines the prior distribution with the likelihood.
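To make the update in Bayes' Theorem concrete before moving to continuous parameters, the following minimal sketch applies it on a discrete grid of candidate values for $\theta$; the Bernoulli (coin-flip) model, the grid, the observations, and the uniform prior are all illustrative assumptions rather than part of the chapter's running example.

python:

import numpy as np

# Hypothetical setup: estimate a coin's heads probability theta on a grid
theta_grid = np.linspace(0.01, 0.99, 99)    # candidate parameter values
data = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # assumed observations (1 = heads)

prior = np.ones_like(theta_grid)             # non-informative (uniform) prior
prior /= prior.sum()

# Likelihood P(X | theta) of the data under each candidate theta (Bernoulli model)
k, n = data.sum(), len(data)
likelihood = theta_grid**k * (1 - theta_grid)**(n - k)

# Bayes' Theorem: posterior = likelihood * prior / evidence
unnormalized = likelihood * prior
evidence = unnormalized.sum()                # P(X), the normalizing constant
posterior = unnormalized / evidence

print(f'Posterior mean of theta: {np.sum(theta_grid * posterior):.3f}')

The evidence term plays exactly the role described above: it rescales likelihood times prior so that the posterior sums (or integrates) to one.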

2. Prior Distributions

The prior distribution $P(\theta)$ represents our belief about the parameters before observing any data. Choosing a prior is crucial, as it can significantly influence the posterior distribution. Priors can be:

  • Non-informative (Uniform): Assumes no prior knowledge about the parameter.
  • Informative: Incorporates specific prior knowledge.
  • Conjugate: Chosen to simplify the posterior calculation.

For example, if we are estimating the parameter $\mu$ of a normal distribution, a common choice for the prior might be another normal distribution:

$$\mu \sim \mathcal{N}(\mu_0, \sigma_0^2)$$
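As a quick illustration of these three options, the snippet below writes them down with scipy.stats; it is only a sketch, and the parameter ranges and hyperparameter values are assumptions chosen for demonstration.

python:

from scipy import stats

# Non-informative (uniform) prior over an assumed plausible range for mu
uniform_prior = stats.uniform(loc=0.0, scale=10.0)   # mu in [0, 10]

# Informative prior: we believe mu is near 3.0 with fairly small uncertainty
informative_prior = stats.norm(loc=3.0, scale=0.3)

# Conjugate prior: for a normal likelihood with known variance, a normal prior
# on mu is conjugate, so the posterior is again normal (used in Section 3)
conjugate_prior = stats.norm(loc=3.0, scale=0.3)

# Evaluate each prior density at a candidate value of mu
for name, prior in [('uniform', uniform_prior),
                    ('informative', informative_prior),
                    ('conjugate', conjugate_prior)]:
    print(f'{name:12s} p(mu = 2.9) = {prior.pdf(2.9):.3f}')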

3. Posterior Distribution

The posterior distribution combines the prior with the likelihood of the observed data to provide a new distribution reflecting our updated beliefs. For many models, the posterior distribution may not have a closed-form solution, requiring numerical methods like Markov Chain Monte Carlo (MCMC) for estimation.
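To show what such a numerical method looks like in practice, here is a minimal random-walk Metropolis-Hastings sketch for the normal-mean model used in the rest of this chapter; the data and prior values match the example in Section 4, while the proposal width, number of iterations, and burn-in length are assumptions made for illustration.

python:

import numpy as np

rng = np.random.default_rng(0)

# Data and model from the Section 4 example: normal likelihood with known
# variance sigma2, normal prior N(mu0, sigma0_2) on mu
X = np.array([2.5, 3.0, 2.8, 3.2, 2.9])
sigma2, mu0, sigma0_2 = 0.25, 3.0, 0.1

def log_posterior(mu):
    # log prior + log likelihood, up to an additive constant
    log_prior = -0.5 * (mu - mu0)**2 / sigma0_2
    log_lik = -0.5 * np.sum((X - mu)**2) / sigma2
    return log_prior + log_lik

samples = []
mu_current = 3.0                                   # arbitrary starting point
for _ in range(5000):
    mu_proposal = mu_current + rng.normal(0, 0.2)  # random-walk proposal
    # Accept with probability min(1, posterior ratio)
    if np.log(rng.uniform()) < log_posterior(mu_proposal) - log_posterior(mu_current):
        mu_current = mu_proposal
    samples.append(mu_current)

samples = np.array(samples[1000:])                 # discard burn-in
print(f'MCMC estimate of the posterior mean: {samples.mean():.3f}')

For this conjugate model the sampler is overkill, since the exact posterior is available in closed form (derived next), but the same loop applies when no closed form exists.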

Let's consider a simple example of Bayesian Estimation with a normal likelihood and a normal prior.

Example:

Given data $\mathbf{X} = \{x_1, x_2, \dots, x_n\}$ sampled from a normal distribution $\mathcal{N}(\mu, \sigma^2)$, where $\sigma^2$ is known, the likelihood function is:

$$P(\mathbf{X} | \mu) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right)$$

Assume a normal prior for $\mu$:

$$\mu \sim \mathcal{N}(\mu_0, \sigma_0^2)$$

The posterior distribution of $\mu$ given the data $\mathbf{X}$ is:

$$P(\mu | \mathbf{X}) \propto P(\mathbf{X} | \mu) \cdot P(\mu)$$

By multiplying the likelihood and the prior and collecting the terms that depend on $\mu$, the posterior distribution can be shown to be:

$$\mu | \mathbf{X} \sim \mathcal{N}\left(\frac{\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^n x_i}{\sigma^2}}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}},\ \frac{1}{\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}}\right)$$
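The step behind this result is the usual completing-the-square argument: the exponent of $P(\mathbf{X} | \mu) \cdot P(\mu)$ is quadratic in $\mu$,

$$-\frac{1}{2}\left[\frac{(\mu - \mu_0)^2}{\sigma_0^2} + \sum_{i=1}^n \frac{(x_i - \mu)^2}{\sigma^2}\right] = -\frac{1}{2}\left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right)\mu^2 + \left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_{i=1}^n x_i}{\sigma^2}\right)\mu + \text{const}$$

so the posterior is again normal, with precision $\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}$ and mean equal to the linear coefficient divided by that precision, which is exactly the expression above.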

4. Python Implementation

Let's implement Bayesian Estimation in Python using the above example.

python:

import numpy as np
import matplotlib.pyplot as plt

# Given data (X) and known variance (sigma^2)
X = np.array([2.5, 3.0, 2.8, 3.2, 2.9])
sigma2 = 0.25

# Prior parameters
mu0 = 3.0
sigma0_2 = 0.1

# Compute the posterior mean and variance
n = len(X)
mu_numerator = mu0 / sigma0_2 + np.sum(X) / sigma2
mu_denominator = 1 / sigma0_2 + n / sigma2
mu_post = mu_numerator / mu_denominator
sigma_post_2 = 1 / mu_denominator

# Print posterior mean and variance
print(f'Posterior Mean: {mu_post}')
print(f'Posterior Variance: {sigma_post_2}')

# Plot the prior, likelihood, and posterior
mu_values = np.linspace(2.6, 3.4, 100)

# Prior distribution N(mu0, sigma0_2)
prior = (1 / np.sqrt(2 * np.pi * sigma0_2)) * np.exp(-0.5 * (mu_values - mu0)**2 / sigma0_2)

# Likelihood, viewed as a function of mu and rescaled to the density of the sample mean
likelihood = (1 / np.sqrt(2 * np.pi * sigma2 / n)) * np.exp(-0.5 * n * (mu_values - np.mean(X))**2 / sigma2)

# Posterior distribution N(mu_post, sigma_post_2)
posterior = (1 / np.sqrt(2 * np.pi * sigma_post_2)) * np.exp(-0.5 * (mu_values - mu_post)**2 / sigma_post_2)

plt.plot(mu_values, prior, label='Prior')
plt.plot(mu_values, likelihood, label='Likelihood')
plt.plot(mu_values, posterior, label='Posterior')
plt.xlabel('Mu')
plt.ylabel('Density')
plt.legend()
plt.show()

In this code:

  • We compute the posterior mean and variance based on the prior and observed data.
  • The prior, likelihood, and posterior distributions are plotted to visualize how Bayesian Estimation updates the belief about the parameter $\mu$.
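Because the posterior here is itself a normal distribution, summaries beyond the point estimate come almost for free. The short continuation below (an added sketch, assuming the variables X, mu0, mu_post, and sigma_post_2 from the code above are still in scope) reports a 95% credible interval and shows how the posterior mean sits between the sample mean (the MLE) and the prior mean.

python:

import numpy as np
from scipy import stats

# Continues the script above: X, mu0, mu_post, sigma_post_2 are assumed defined
posterior_dist = stats.norm(loc=mu_post, scale=np.sqrt(sigma_post_2))
lower, upper = posterior_dist.interval(0.95)   # central 95% credible interval

print(f'Sample mean (MLE):     {np.mean(X):.3f}')
print(f'Prior mean:            {mu0:.3f}')
print(f'Posterior mean:        {mu_post:.3f}')
print(f'95% credible interval: [{lower:.3f}, {upper:.3f}]')

With the values used above, the posterior mean (about 2.92) lies between the sample mean (2.88) and the prior mean (3.0), illustrating the shrinkage toward the prior that an informative prior produces.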

5. Conclusion

Bayesian Estimation provides a coherent and flexible approach to parameter estimation, allowing the incorporation of prior knowledge. It’s particularly useful in scenarios where data is sparse or when integrating domain knowledge is crucial. The posterior distribution resulting from Bayesian Estimation not only provides point estimates but also quantifies uncertainty, making it a powerful tool for decision-making in uncertain environments.

This chapter has introduced the mathematical foundation of Bayesian Estimation, illustrated it with a simple example, and provided a Python implementation to demonstrate its practical application.
