7  Week 6 — Hierarchical Bayesian Models

This week introduces hierarchical (multilevel) Bayesian models, which allow parameters to vary across groups while sharing information through higher-level priors.

We study partial pooling, shrinkage, and their implementation for normal and regression models.


7.1 Learning Goals

By the end of this week, you should be able to:

  • Explain the motivation for hierarchical modeling.
  • Formulate hierarchical models with group-level parameters.
  • Interpret partial pooling and shrinkage.
  • Implement a two-level Bayesian model in R using simulation or brms.
  • Compare complete, no-pooling, and partial-pooling approaches.

7.2 Lecture 1 — Motivation and Structure of Hierarchical Models

7.2.1 1.1 Why Hierarchical Models?

Hierarchical models capture structured variability among related groups or units:

  • Repeated measures within individuals
  • Students within classrooms
  • Machines within factories

They balance within-group and between-group information by introducing group-specific parameters drawn from a common population distribution.


7.2.2 1.2 Model Structure

For group \(j = 1,\ldots,J\) and observations \(i = 1,\ldots,n_j\):

\[ y_{ij} \mid \theta_j, \sigma^2 \sim \mathcal{N}(\theta_j, \sigma^2), \qquad \theta_j \mid \mu, \tau^2 \sim \mathcal{N}(\mu, \tau^2). \]

Top-level priors: \[ \mu \sim \mathcal{N}(0,10^2), \quad \tau \sim \text{Half-Cauchy}(0,5). \]

  • \(\mu\): overall population mean
  • \(\tau\): between-group standard deviation (pooling strength)
  • \(\sigma\): within-group standard deviation
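
To make this generative structure concrete, it can be simulated top-down: draw the hyperparameters from their priors, then the group means, then the data. A minimal sketch (the sizes and the value of \(\sigma\) are chosen arbitrarily):

# Sketch: one draw from the full two-level generative model
set.seed(1)
J <- 5; n_j <- 4; sigma <- 1

mu    <- rnorm(1, 0, 10)         # mu ~ N(0, 10^2)
tau   <- abs(rcauchy(1, 0, 5))   # tau ~ Half-Cauchy(0, 5)
theta <- rnorm(J, mu, tau)       # theta_j ~ N(mu, tau^2)

y <- sapply(theta, function(tj) rnorm(n_j, tj, sigma))  # y_ij ~ N(theta_j, sigma^2)
dim(y)   # n_j rows, J columns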

7.2.3 1.3 Three Extremes of Pooling

Model Type        | Description                                   | Behavior
------------------|-----------------------------------------------|-------------------------------------------
No pooling        | Estimate each \(\theta_j\) separately         | Ignores commonality across groups
Complete pooling  | Force all groups to share one parameter       | Ignores group differences
Partial pooling   | Combine information via a hierarchical prior  | Balances both; the default Bayesian choice

7.2.4 1.4 Shrinkage Intuition

The posterior mean of each group parameter \(\theta_j\) shrinks toward the global mean \(\mu\): \[ E[\theta_j \mid y] = w_j \bar{y}_j + (1-w_j)\mu, \] where \[ w_j = \frac{n_j/\sigma^2}{n_j/\sigma^2 + 1/\tau^2}. \] The weight \(w_j\) is the usual precision-weighted compromise between the group sample mean \(\bar{y}_j\) and the population mean \(\mu\).

  • Large \(n_j\) (lots of data): \(w_j \to 1\) → less shrinkage.
  • Small \(n_j\): \(w_j \to 0\) → stronger shrinkage toward \(\mu\).
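
The weight formula is easy to explore directly; a small sketch with \(\sigma = \tau = 1\) (arbitrary illustrative values):

# Shrinkage weight as a function of group size
w_fun <- function(n, sigma = 1, tau = 1) (n/sigma^2) / (n/sigma^2 + 1/tau^2)
round(w_fun(c(1, 5, 20, 100)), 2)   # 0.50 0.83 0.95 0.99: weights approach 1 as n_j grows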

7.2.5 1.5 Example — Simulated Group Means

set.seed(6)
J <- 8; n_j <- rep(10, J)
mu_true <- 5; tau_true <- 2; sigma_true <- 1

# Simulate group means and data from the two-level model
theta_true <- rnorm(J, mu_true, tau_true)
y <- sapply(seq_len(J), function(j) rnorm(n_j[j], theta_true[j], sigma_true))
ybar <- colMeans(y)

# No pooling (each group's sample mean)
no_pool <- ybar

# Complete pooling (single global mean)
complete_pool <- mean(y)

# Partial pooling: simple empirical Bayes shrinkage
# (crude plug-in estimates: sd(ybar) overstates tau because it also contains
# sampling noise, and sd(y) mixes within- and between-group variation)
tau_hat <- sd(ybar)
sigma_hat <- sd(as.vector(y))
w <- (n_j/sigma_hat^2) / (n_j/sigma_hat^2 + 1/tau_hat^2)
partial_pool <- w*ybar + (1 - w)*complete_pool

data.frame(Group=1:J,
           ybar=round(ybar,2),
           NoPool=round(no_pool,2),
           Partial=round(partial_pool,2))
  Group ybar NoPool Partial
1     1 5.53   5.53    5.53
2     2 4.06   4.06    4.21
3     3 6.90   6.90    6.76
4     4 8.19   8.19    7.91
5     5 4.86   4.86    4.93
6     6 5.42   5.42    5.43
7     7 2.20   2.20    2.55
8     8 6.99   6.99    6.84

Because every group here has the same sample size, shrinkage is driven by distance from the global mean: the partial-pooling estimates pull each group toward it, with the most extreme groups (4 and 7) moving the most.
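
A quick base-R plot (a sketch reusing the objects above) makes the shrinkage visible:

# Visualize shrinkage toward the global mean
plot(1:J, no_pool, pch = 19, xlab = "Group", ylab = "Estimate")
points(1:J, partial_pool, pch = 1)
abline(h = complete_pool, lty = 2)   # global mean
legend("topleft", pch = c(19, 1), legend = c("No pooling", "Partial pooling"))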


7.2.6 1.6 Advantages of Hierarchical Models

  • Borrow strength across groups.
  • Naturally incorporate uncertainty at multiple levels.
  • Handle unbalanced data and missingness elegantly.
  • Allow group-level predictors and complex dependence structures.

7.3 Lecture 2 — Hierarchical Regression and Implementation

7.3.1 2.1 Hierarchical Linear Regression

General form: \[ y_{ij} = \alpha_j + \beta_j x_{ij} + \varepsilon_{ij}, \quad \varepsilon_{ij} \sim \mathcal{N}(0,\sigma^2), \] \[ \alpha_j \sim \mathcal{N}(\mu_\alpha, \tau_\alpha^2), \quad \beta_j \sim \mathcal{N}(\mu_\beta, \tau_\beta^2). \]

This allows both intercepts and slopes to vary by group.


7.3.2 2.2 Example — Hierarchical Regression with brms

library(brms)
set.seed(7)

J <- 10
n_j <- 20
group <- rep(1:J, each=n_j)
x <- rnorm(J*n_j, 0, 1)

alpha_true <- rnorm(J, 2, 1)
beta_true  <- rnorm(J, 3, 0.5)
sigma_true <- 0.8

y <- alpha_true[group] + beta_true[group]*x + rnorm(J*n_j, 0, sigma_true)
dat <- data.frame(y, x, group=factor(group))

# Hierarchical model (random intercept and slope)
m_hier <- brm(y ~ 1 + x + (1 + x | group),
              data=dat, family=gaussian(),
              chains=2, iter=2000, refresh=0)

summary(m_hier)
plot(m_hier)

The (1 + x | group) term gives each group its own intercept and slope; by default, brms also estimates the correlation between the two.
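
To inspect the fitted coefficients (reusing m_hier from above), brms provides fixef(), ranef(), and coef():

fixef(m_hier)        # population-level estimates (the means of the alpha_j and beta_j)
ranef(m_hier)        # group-level deviations from the population means
coef(m_hier)$group   # group-specific coefficients (population estimate + deviation)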


7.3.3 2.3 Interpretation

Posterior summaries provide:

  • Group-level means \(\alpha_j, \beta_j\).
  • Population-level means \(\mu_\alpha, \mu_\beta\).
  • Variability estimates \(\tau_{\alpha}, \tau_{\beta}\) showing degree of pooling.

Visualize partial pooling by comparing group-specific fits to the global regression line.

pp_check(m_hier)                                 # posterior predictive check
plot(conditional_effects(m_hier), points=TRUE)   # population-level fit overlaid on the data
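
To see partial pooling numerically as well, one option (a sketch reusing dat and m_hier from above) is to compare per-group least-squares slopes with the hierarchical slopes:

# No-pooling slopes: a separate lm() per group
slopes_nopool <- sapply(levels(dat$group), function(g)
  coef(lm(y ~ x, data = dat[dat$group == g, ]))["x"])

# Partial-pooling slopes from the hierarchical fit
slopes_hier <- coef(m_hier)$group[, "Estimate", "x"]

# The hierarchical slopes sit closer to the population-level slope
round(cbind(NoPool = slopes_nopool, Hier = slopes_hier,
            Pop = fixef(m_hier)["x", "Estimate"]), 2)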

7.3.4 2.4 Practical Considerations

  • Choose weakly informative hyperpriors for scale parameters (e.g., Half-Cauchy or Exponential); see the sketch after this list.
  • Inspect group-level posterior intervals to assess pooling.
  • Center predictors for numerical stability.
  • Use hierarchical models as the default when groups share a common process.
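
A minimal sketch combining two of these points, assuming the dat from Section 2.2: center the predictor and place an Exponential hyperprior on the group-level standard deviations.

dat$x_c <- dat$x - mean(dat$x)   # centered predictor

m_hier2 <- brm(y ~ 1 + x_c + (1 + x_c | group),
               data = dat, family = gaussian(),
               prior = prior(exponential(1), class = sd),   # weakly informative scale prior
               chains = 2, iter = 2000, refresh = 0)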

7.3.5 2.5 Summary of Hierarchical Modeling Benefits

Feature             | Description
--------------------|------------------------------------------------------------------
Partial pooling     | Shares strength across groups while retaining group differences.
Shrinkage           | Stabilizes small-sample estimates toward the population mean.
Interpretability    | Captures multi-level variation naturally.
Predictive accuracy | Usually superior to separate or fully pooled models.

7.4 Homework 6

  1. Conceptual
    • Explain why hierarchical modeling is often superior to analyzing groups separately.
    • Distinguish between complete pooling, no pooling, and partial pooling.
  2. Computational
    • Simulate a small dataset with several groups and fit:
      1. Separate regressions (no pooling).
      2. A single pooled regression.
      3. A hierarchical model (partial pooling).
    • Compare estimates for each group and interpret shrinkage behavior.
  3. Reflection
    • In what situations would you not use a hierarchical model?
    • How does the hierarchical prior act as a regularizer?

7.5 Key Takeaways

Concept                 | Summary
------------------------|---------------------------------------------------------------
Hierarchical Model      | Combines group-level and population-level inference.
Partial Pooling         | Balances within- and between-group information.
Shrinkage               | Moves noisy group estimates toward a global mean.
Hierarchical Regression | Extends pooling to both intercepts and slopes.
Practical Insight       | The default choice when analyzing grouped or multilevel data.

Next Week: Bayesian Decision Theory — introducing utilities, losses, and optimal decision rules under uncertainty.