5  Week 4 — Markov Chain Monte Carlo (MCMC) Methods


5.1 Overview

This week introduces Markov Chain Monte Carlo (MCMC) — a powerful class of algorithms for drawing samples from complex posterior distributions that cannot be sampled directly.
You will learn the logic of constructing Markov chains with a desired stationary distribution, how to implement the Metropolis–Hastings (MH) and Gibbs samplers, and how to assess the convergence and mixing of MCMC chains.


5.2 Learning Goals

By the end of Week 4, you should be able to:

  • Explain the intuition behind MCMC and why it works.
  • Implement simple Metropolis–Hastings and Gibbs algorithms in R.
  • Diagnose convergence using trace plots and summary statistics.
  • Compute posterior means, variances, and credible intervals from MCMC samples.
  • Understand practical issues such as burn-in, thinning, and autocorrelation.

5.3 Lecture 1: Introduction to MCMC

5.3.1 Motivation

For many posteriors, sampling directly is infeasible.
We instead build a Markov chain whose stationary distribution is the target posterior \(p(\theta \mid y)\).
After sufficient iterations, the draws from the chain behave like samples from the true posterior.

5.3.2 Markov Chain Basics

A Markov chain \(\{\theta^{(t)}\}\) has the Markov property: \[ p(\theta^{(t)} \mid \theta^{(t-1)}, \ldots, \theta^{(1)}) = p(\theta^{(t)} \mid \theta^{(t-1)}). \] If the chain is ergodic, the distribution of \(\theta^{(t)}\) converges to a stationary distribution \(\pi(\theta)\).

MCMC constructs such chains so that \(\pi(\theta) = p(\theta \mid y)\).
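Convergence to a stationary distribution can be seen in a toy example (not from the lecture) with a two-state chain, where repeated multiplication by the transition matrix drives the marginal distribution to its fixed point:

```r
# Two-state Markov chain with transition matrix P.
# Row i gives the probabilities of moving from state i to each state.
P <- matrix(c(0.9, 0.1,
              0.2, 0.8), nrow = 2, byrow = TRUE)

dist <- c(1, 0)               # start in state 1 with certainty
for (i in 1:100) dist <- dist %*% P
dist                          # approaches (2/3, 1/3), the stationary distribution
```

Here \(\pi = (2/3, 1/3)\) solves \(\pi P = \pi\), and the starting distribution is forgotten after enough steps — the same phenomenon that MCMC exploits, with \(\pi(\theta) = p(\theta \mid y)\).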

5.3.3 Core Idea

Repeatedly propose a new value \(\theta^\*\) and decide whether to accept or reject it
based on how likely it is under the posterior relative to the current value.
Constructed correctly, this acceptance rule makes the posterior the chain's stationary distribution, so the samples eventually represent \(p(\theta \mid y)\).


5.4 Lecture 2: The Metropolis–Hastings Algorithm

5.4.1 Algorithm Steps

  1. Initialize with \(\theta^{(0)}\).
  2. For each iteration \(t=1,2,\ldots,T\):
    1. Propose \(\theta^\* \sim q(\theta^\* \mid \theta^{(t-1)})\).
    2. Compute the acceptance probability: \[ \alpha = \min\left(1,\; \frac{p(y \mid \theta^\*)\, p(\theta^\*)\, q(\theta^{(t-1)} \mid \theta^\*)} {p(y \mid \theta^{(t-1)})\, p(\theta^{(t-1)})\, q(\theta^\* \mid \theta^{(t-1)})} \right). \]
    3. With probability \(\alpha\), accept the proposal and set \(\theta^{(t)} = \theta^\*\); otherwise set \(\theta^{(t)} = \theta^{(t-1)}\).
  3. After burn-in, the samples \(\{\theta^{(t)}\}\) approximate draws from \(p(\theta \mid y)\).
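The steps above can be sketched as a generic function. In practice \(\alpha\) is computed on the log scale so that products of many small likelihood terms do not underflow; accepting when \(\log u < \log \alpha\) for \(u \sim \text{Unif}(0,1)\) is equivalent to accepting with probability \(\min(1, \alpha)\). The function and argument names below (`mh`, `rq`, `dq`) are our own, not part of the course code:

```r
# Generic Metropolis–Hastings sketch (illustrative helper names):
#   log_post : returns log p(y | theta) + log p(theta) (unnormalized log posterior)
#   rq       : draws a proposal given the current value
#   dq(a, b) : log proposal density of a given b
mh <- function(log_post, rq, dq, theta0, n_iter) {
  theta <- numeric(n_iter)
  theta[1] <- theta0
  for (t in 2:n_iter) {
    prev <- theta[t - 1]
    prop <- rq(prev)
    # Log acceptance ratio; the dq terms cancel for symmetric proposals
    log_alpha <- log_post(prop) - log_post(prev) +
                 dq(prev, prop) - dq(prop, prev)
    theta[t] <- if (log(runif(1)) < log_alpha) prop else prev
  }
  theta
}

# Usage sketch: sample from a standard normal target
draws <- mh(log_post = function(th) dnorm(th, log = TRUE),
            rq = function(th) rnorm(1, mean = th, sd = 1),
            dq = function(a, b) dnorm(a, mean = b, sd = 1, log = TRUE),
            theta0 = 0, n_iter = 5000)
```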

5.4.2 Special Case: Symmetric Proposal

If \(q(\theta^\* \mid \theta^{(t-1)}) = q(\theta^{(t-1)} \mid \theta^\*)\),
then \[ \alpha = \min\left(1,\; \frac{p(y \mid \theta^\*)\, p(\theta^\*)} {p(y \mid \theta^{(t-1)})\, p(\theta^{(t-1)})}\right). \] This is the Metropolis algorithm.
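A normal random-walk proposal is symmetric, which is why the \(q\) terms cancel. A quick numerical check (the values are arbitrary):

```r
# Density of proposing b from a equals density of proposing a from b
a <- 0.4; b <- 1.2; s <- 0.5
all.equal(dnorm(b, mean = a, sd = s),
          dnorm(a, mean = b, sd = s))   # TRUE
```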

5.4.3 Example: Posterior for a Normal Mean (Unknown Mean, Known Variance)

Let \(y_i \sim N(\mu, 1)\) for \(i=1,\ldots,n\) and prior \(\mu \sim N(0,10^2)\).

```r
set.seed(123)

# Simulated data: n = 50 observations from N(3, 1)
y <- rnorm(50, mean = 3, sd = 1)
n <- length(y)

# Unnormalized log posterior: log-likelihood + log prior
post_log <- function(mu) {
  sum(dnorm(y, mu, 1, log = TRUE)) + dnorm(mu, 0, 10, log = TRUE)
}

# Metropolis sampler with a normal random-walk proposal
n_iter <- 10000          # avoid "T" as a name: it masks R's shortcut for TRUE
mu <- numeric(n_iter)
mu[1] <- 0
proposal_sd <- 0.5

for (t in 2:n_iter) {
  mu_star <- rnorm(1, mu[t - 1], proposal_sd)
  # Symmetric proposal, so only the posterior ratio enters the
  # (log) acceptance probability
  log_alpha <- post_log(mu_star) - post_log(mu[t - 1])
  mu[t] <- if (log(runif(1)) < log_alpha) mu_star else mu[t - 1]
}

# Discard burn-in draws before summarizing
burnin <- 1000
post_samples <- mu[(burnin + 1):n_iter]

hist(post_samples, prob = TRUE, col = "skyblue",
     main = "Posterior Samples for μ")
abline(v = mean(post_samples), col = "red", lwd = 2)
```
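Continuing with the `mu` and `post_samples` objects from the example above, the posterior summaries and diagnostics named in the learning goals can be computed with base R (a sketch of the standard approach, not required course code):

```r
# Posterior summaries
mean(post_samples)                       # posterior mean
var(post_samples)                        # posterior variance
quantile(post_samples, c(0.025, 0.975))  # 95% credible interval

# Trace plot: a well-mixing chain looks like stationary noise
plot(post_samples, type = "l",
     xlab = "Iteration", ylab = expression(mu))

# Autocorrelation: slowly decaying ACF indicates poor mixing
acf(post_samples)

# Acceptance rate: fraction of iterations where the proposal moved the chain
mean(diff(mu) != 0)
```

A very high acceptance rate suggests the proposal steps are too small (slow exploration), while a very low rate suggests they are too large (frequent rejection); tuning `proposal_sd` trades off between the two.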