This week introduces hierarchical (multilevel) Bayesian models, which allow parameters to vary across groups while sharing information through higher-level priors.
We study partial pooling, shrinkage, and their implementation for normal and regression models.
7.1 Learning Goals
By the end of this week, you should be able to:
Explain the motivation for hierarchical modeling.
Formulate hierarchical models with group-level parameters.
Interpret partial pooling and shrinkage.
Implement a two-level Bayesian model in R using simulation or brms.
Compare complete-pooling, no-pooling, and partial-pooling approaches.
7.2 Lecture 1 — Motivation and Structure of Hierarchical Models
7.2.1 1.1 Why Hierarchical Models?
Hierarchical models capture structured variability among related groups or units:
Repeated measures within individuals
Students within classrooms
Machines within factories
They balance within-group and between-group information by introducing group-specific parameters drawn from a common population distribution.
7.2.2 1.2 Model Structure
For group \(j = 1,\ldots,J\) and observations \(i = 1,\ldots,n_j\):
\[
y_{ij} \mid \theta_j \sim \mathcal{N}(\theta_j, \sigma^2), \qquad
\theta_j \mid \mu, \tau \sim \mathcal{N}(\mu, \tau^2),
\]
with hyperpriors on \(\mu\) and \(\tau\). Here:
\(\theta_j\): mean of group \(j\)
\(\mu\): overall (population) mean
\(\tau\): between-group standard deviation (pooling strength)
\(\sigma\): within-group standard deviation
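The structure above is easy to simulate in base R. The sketch below draws group means from the population distribution and then observations within each group; all numeric values (J, n_j, mu, tau, sigma) are illustrative assumptions, not fixed by the notes.

```r
# Simulate from the two-level normal model:
#   theta_j ~ N(mu, tau^2),   y_ij | theta_j ~ N(theta_j, sigma^2)
# All constants below are illustrative choices.
set.seed(42)
J     <- 8     # number of groups
n_j   <- 10    # observations per group (balanced for simplicity)
mu    <- 50    # population mean
tau   <- 5     # between-group sd
sigma <- 10    # within-group sd

theta <- rnorm(J, mean = mu, sd = tau)          # group-level draws
dat <- data.frame(
  group = factor(rep(seq_len(J), each = n_j)),
  y     = rnorm(J * n_j, mean = rep(theta, each = n_j), sd = sigma)
)
str(dat)   # 80 rows: one y per (group, observation) pair
```

Simulating from the model before fitting it is a useful habit: it makes the roles of \(\tau\) (spread of the `theta` draws) and \(\sigma\) (spread of `y` within each group) concrete.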
7.2.3 1.3 Three Extremes of Pooling
| Model Type | Description | Behavior |
|---|---|---|
| No pooling | Estimate each \(\theta_j\) separately | Ignores commonality across groups |
| Complete pooling | Force all groups to share one parameter | Ignores group differences |
| Partial pooling | Combine information via hierarchical prior | Balances both; default Bayesian choice |
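To make the contrast concrete, here is a base-R sketch computing all three estimators on simulated data. For illustration, \(\mu\), \(\tau\), and \(\sigma\) are treated as known constants (a simplifying assumption; in a full Bayesian fit they receive priors and are estimated too).

```r
# Three pooling extremes on simulated data; mu, tau, sigma are
# treated as known here (assumed values, for illustration only).
set.seed(1)
J <- 6; n_j <- 5; mu <- 0; tau <- 2; sigma <- 4
theta <- rnorm(J, mu, tau)
dat <- data.frame(group = rep(seq_len(J), each = n_j),
                  y     = rnorm(J * n_j, rep(theta, each = n_j), sigma))

no_pool  <- tapply(dat$y, dat$group, mean)  # separate mean per group
complete <- mean(dat$y)                     # one grand mean for everyone

# Partial pooling: precision-weighted compromise (see Section 1.4)
w       <- (n_j / sigma^2) / (n_j / sigma^2 + 1 / tau^2)
partial <- w * no_pool + (1 - w) * mu

rbind(no_pool = no_pool, partial = partial) # partial pulled toward mu
```

In brms syntax, partial pooling corresponds to the varying-intercept formula `y ~ 1 + (1 | group)`, while `y ~ 1` gives complete pooling.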
7.2.4 1.4 Shrinkage Intuition
Posterior for each group mean \(\theta_j\) shrinks toward the global mean \(\mu\): \[
E[\theta_j \mid y] = w_j \bar{y}_j + (1-w_j)\mu,
\] where \[
w_j = \frac{n_j/\sigma^2}{n_j/\sigma^2 + 1/\tau^2}.
\]
Large \(n_j\) (lots of data): \(w_j \to 1\) → less shrinkage.
Small \(n_j\): \(w_j \to 0\) → stronger shrinkage toward \(\mu\).
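These limits are easy to check numerically. The sketch below evaluates \(w_j\) for a few group sizes under assumed values of \(\sigma\) and \(\tau\) (chosen only for illustration):

```r
# Shrinkage weight w_j = (n_j/sigma^2) / (n_j/sigma^2 + 1/tau^2)
# sigma and tau are assumed values for illustration.
sigma <- 10; tau <- 5
w <- function(n) (n / sigma^2) / (n / sigma^2 + 1 / tau^2)
round(w(c(1, 5, 25, 100)), 3)
# 0.200 0.556 0.862 0.962
# With these values w(n) = n / (n + 4): small groups are pulled
# strongly toward mu, large groups mostly keep their sample mean.
```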