19 Nested Models

Learning Objectives

By the end of this lecture, you should be able to:

Distinguish between nested models (full vs reduced models) and nested designs (hierarchical data structures).

Perform and interpret a partial F-test and a likelihood ratio test (LRT) for model comparison.

Explain the difference between crossed and nested factors.

Implement nested model testing in SAS using PROC REG and the TEST statement.

Specify hierarchical nested effects in PROC GLM and PROC MIXED using B(A) syntax.

19.1 Introduction

In the previous chapter, we studied model selection methods such as AIC, BIC, and cross-validation for comparing competing models. In this chapter, we focus on a more structured type of comparison: nested structures.

The word nested appears in two related but distinct contexts in statistics:

Nested models: a simpler model is obtained from a more complex model by imposing parameter restrictions.
Nested designs: one factor exists only within the levels of another factor, creating a hierarchical data structure.

This chapter connects regression, ANOVA, and mixed models by treating both ideas carefully.

A model \(M_0\) is said to be nested within a model \(M_1\) if \(M_0\) can be obtained from \(M_1\) by imposing constraints on the parameters, usually by setting some coefficients equal to zero.

19.1.1 Full and Reduced Models

Consider the following regression models.

Full model \(M_1\): \[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon \]

Reduced model \(M_0\): \[ Y = \beta_0 + \beta_1 X_1 + \varepsilon \]

The reduced model is nested within the full model because it is obtained by imposing the restrictions \[ \beta_2 = \beta_3 = 0. \]

In other words, the reduced model is a special case of the full model.
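Not every pair of models is nested. As an illustration (the two models below are hypothetical and are not part of the example above), consider \[ M_a: Y = \beta_0 + \beta_1 X_1 + \varepsilon, \qquad M_b: Y = \beta_0 + \beta_2 X_2 + \varepsilon. \] Neither model can be obtained from the other by restricting parameters, so they are not nested and the tests in this chapter do not apply to them directly.

A constraint also need not set a coefficient to zero. Imposing \(\beta_2 = \beta_3\) on the full model \(M_1\) gives \[ Y = \beta_0 + \beta_1 X_1 + \beta_2 (X_2 + X_3) + \varepsilon, \] which is again a special case of \(M_1\) and therefore nested within it.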

Nested models allow us to answer a very specific question:

Do the additional terms in the full model provide enough improvement to justify their inclusion?

This leads naturally to formal hypothesis testing.

19.2 Partial F-test for Nested Linear Models

For linear regression models fit by least squares, the standard test for comparing nested models is the partial F-test.

We test \[ H_0: \beta_2 = \beta_3 = 0 \] against \[ H_a: \text{at least one of } \beta_2, \beta_3 \text{ is nonzero.} \]

Let:

\(SSE_R\): error sum of squares for the reduced model
\(SSE_F\): error sum of squares for the full model
\(df_R\): error degrees of freedom for the reduced model
\(df_F\): error degrees of freedom for the full model

Then the test statistic is \[ F = \frac{(SSE_R - SSE_F)/(df_R - df_F)}{SSE_F/df_F}. \]

Interpretation

If the full model does not improve fit much, then \(SSE_R - SSE_F\) will be small, and the \(F\)-statistic will be small.
If the full model improves fit substantially, then the \(F\)-statistic will be large.

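Under \(H_0\) and the usual normal-error assumptions, \(F\) follows an \(F\) distribution with \(df_R - df_F\) numerator and \(df_F\) denominator degrees of freedom, and the \(p\)-value is the corresponding upper-tail probability. A small \(p\)-value provides evidence against \(H_0\), suggesting that at least one of the additional predictors should be retained.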
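As a worked illustration with made-up numbers (the sample size and sums of squares below are hypothetical and are not taken from any data set in this chapter), suppose \(n = 30\), \(SSE_R = 120\), and \(SSE_F = 90\). The reduced model has two coefficients and the full model has four, so \[ df_R = 30 - 2 = 28, \qquad df_F = 30 - 4 = 26, \qquad df_R - df_F = 2, \] which equals the number of coefficients set to zero under \(H_0\). The statistic is then \[ F = \frac{(120 - 90)/2}{90/26} = \frac{15}{3.4615} \approx 4.33. \] Compared with an \(F_{2,\,26}\) distribution, this gives a \(p\)-value of about 0.024, so at the 5% level these hypothetical data would favor keeping at least one of \(X_2\) and \(X_3\).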

19.3 Likelihood Ratio Test (LRT)

For likelihood-based models, such as generalized linear models and many mixed models, we often compare nested models using the likelihood ratio test.

Let \(\ell_{\text{Full}}\) and \(\ell_{\text{Reduced}}\) be the maximized log-likelihoods under the full and reduced models, respectively. The likelihood ratio test statistic is \[ LRT = 2\left(\ell_{\text{Full}} - \ell_{\text{Reduced}}\right). \]

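Under regularity conditions and under \(H_0\), the statistic is approximately (asymptotically) chi-square distributed: \[ LRT \overset{\text{approx.}}{\sim} \chi^2_{q}, \] where \(q\) is the number of parameters restricted under \(H_0\), that is, the difference in the number of free parameters between the two models. For the regression example above, \(q = df_R - df_F = 2\). Both models must be fit to the same observations by maximum likelihood; for mixed models fit by REML, the comparison is generally valid only when the two models share the same fixed effects.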
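A minimal sketch of this calculation in SAS, assuming the two models have already been fit by maximum likelihood and their -2 log-likelihood values have been read from the output. The data set name, variable names, and numeric values below are hypothetical placeholders; PROBCHI is the SAS chi-square distribution function.

```sas
/* Hypothetical values: replace with the -2 Log Likelihood reported
   for each fitted model and the number of restricted parameters. */
DATA lrt_check;
    neg2ll_reduced = 250.4;   /* -2 log L, reduced model (made up) */
    neg2ll_full    = 243.1;   /* -2 log L, full model (made up)    */
    df_diff        = 2;       /* difference in free parameters     */

    /* LRT = 2(l_Full - l_Reduced) = (-2 l_Reduced) - (-2 l_Full) */
    LRT     = neg2ll_reduced - neg2ll_full;
    p_value = 1 - PROBCHI(LRT, df_diff);   /* upper-tail chi-square probability */

    PUT LRT= p_value=;        /* write the results to the SAS log */
RUN;
```

With these placeholder values the statistic is 7.3 on 2 degrees of freedom, and the \(p\)-value is about 0.026.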

When to use which?

Partial F-test: typically used for nested linear models fit by least squares.
LRT: commonly used for generalized linear models, mixed models, and other likelihood-based settings.

19.4 SAS Implementation

In SAS, we can test nested regression models directly using PROC REG and the TEST statement.

PROC REG DATA=my_data;
    MODEL Y = X1 X2 X3;

    /* Test whether the reduced model is sufficient */
    TEST X2 = 0, X3 = 0;
RUN;
QUIT;

19.4.1 What Does This Do?

This code fits the full model \[ Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon \] and tests whether the restrictions \[ \beta_2 = \beta_3 = 0 \] are reasonable.

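If the test is significant (small \(p\)-value), we reject \(H_0\): the reduced model is too simple, and at least one of \(X_2\) and \(X_3\) should remain in the model. If the test is not significant, the data are consistent with the reduced model.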
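One way to check the TEST result by hand is to fit both models and recompute the statistic from their error sums of squares. The sketch below assumes the same placeholder data set my_data; the labels Full and Reduced and the numeric values in the DATA step are illustrative and should be replaced by the values in your own output.

```sas
/* Fit both models in one PROC REG call; each MODEL statement gets a label */
PROC REG DATA=my_data;
    Full:    MODEL Y = X1 X2 X3;
    Reduced: MODEL Y = X1;
RUN;
QUIT;

/* Recompute the partial F-statistic from the two ANOVA tables.
   The four numbers below are made up; copy the Error SS and Error DF
   from the output above. */
DATA partial_f_check;
    sse_r = 120;  df_r = 28;    /* reduced model: error SS and error DF */
    sse_f = 90;   df_f = 26;    /* full model: error SS and error DF    */

    F       = ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f);
    p_value = 1 - PROBF(F, df_r - df_f, df_f);   /* upper-tail F probability */

    PUT F= p_value=;            /* write the results to the SAS log */
RUN;
```

The value of F computed this way should agree with the F value that PROC REG reports for the TEST statement, provided both models are fit to the same observations.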

19.5 Nested Designs

In ANOVA and mixed models, the word nested refers to a hierarchical factor structure.

A factor \(B\) is nested within a factor \(A\) if each level of \(B\) appears in only one level of \(A\).

Suppose:

\(A\) = Region
\(B\) = City

If each city belongs to exactly one region, then City is nested within Region.

We write this as: \[ \text{City(Region)}. \]

19.6 Crossed vs Nested Factors

It is important to distinguish crossed factors from nested factors.

Crossed Factors

Two factors are crossed if every level of one factor appears with every level of the other factor.

Example:

Treatment and gender: every treatment level can be observed for every gender.

In this case, an interaction term such as \(A \times B\) is meaningful.

Nested Factors

A factor is nested when its levels exist only within the levels of another factor.

Example:

Students within classrooms
Classrooms within schools
Cities within regions

A level of the nested factor cannot be meaningfully paired with every level of the parent factor. For example, New York City does not exist within the Midwest region, so a Region \(\times\) City interaction cannot be defined.

Summary Table

Structure	Description
Crossed	Every level of A appears with every level of B
Nested	Levels of B exist only within levels of A
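For comparison with the nested syntax used in this chapter, here is a sketch of a crossed two-factor analysis in SAS. The data set crossed_data, its variable names, and all of its values are hypothetical and serve only to illustrate the syntax.

```sas
/* Hypothetical crossed data: every Treatment occurs with every Gender */
DATA crossed_data;
    INPUT Treatment $ Gender $ Y;
    DATALINES;
A F 12
A F 14
A M 11
A M 13
B F 17
B F 16
B M 15
B M 18
;
RUN;

PROC GLM DATA=crossed_data;
    CLASS Treatment Gender;
    /* Main effects plus interaction; valid because the factors are crossed */
    MODEL Y = Treatment Gender Treatment*Gender;
RUN;
QUIT;
```

For nested factors there is no interaction term to estimate; the nested effect is instead written with parentheses, as in City(Region).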

19.7 Statistical Formulation for a Nested Design

A standard nested ANOVA model is \[ y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk}, \] where

\(y_{ijk}\) is the \(k\)-th observation in level \(j\) of factor \(B\) within level \(i\) of factor \(A\),
\(\mu\) is the overall mean,
\(\alpha_i\) is the effect of level \(i\) of factor \(A\) (for example, Region),
\(\beta_{j(i)}\) is the effect of level \(j\) of factor \(B\) nested within level \(i\) of \(A\) (for example, City within Region),
\(\varepsilon_{ijk}\) is the random error term.

This is a hierarchical structure rather than a factorial one.

19.8 SAS Implementation for Nested Designs

19.8.1 Example Data

DATA nested_data;
    INPUT Region $ City $ Y;
    DATALINES;
NE NY 30
NE NY 35
NE Pitt 18
NE Pitt 20
MW Chicago 10
MW Chicago 9
;
RUN;
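Before fitting a nested model, it can help to confirm that the data really are nested. The sketch below is one possible check, using the nested_data set defined above; the column alias n_regions is a name chosen here for illustration.

```sas
/* Count how many distinct regions each city appears in.
   Under strict nesting, n_regions is 1 for every city. */
PROC SQL;
    SELECT City, COUNT(DISTINCT Region) AS n_regions
    FROM nested_data
    GROUP BY City;
QUIT;
```

For the example data, each of the three cities appears in exactly one region. If city labels were reused across regions (for example, cities coded 1, 2, 3 within every region), this count would exceed 1 even though the design is nested, which is one reason the nested effect is always written as City(Region).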

19.8.2 `PROC GLM`: Fixed-Effects Perspective

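PROC GLM DATA=nested_data;
    CLASS Region City;                /* both factors are categorical */
    /* City(Region): city effects are defined separately within each region */
    MODEL Y = Region City(Region);
RUN;
QUIT;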

Here City(Region) declares City as nested within Region, and both Region and City(Region) are treated as fixed effects.

19.8.3 `PROC MIXED`: Hierarchical or Random-Effects Perspective

PROC MIXED DATA=nested_data;
    CLASS Region City;
    MODEL Y = Region;          /* fixed effect: Region */
    RANDOM City(Region);       /* random effect: cities within regions */
RUN;

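This specification is usually more appropriate when the cities are viewed as a random sample from a larger population of cities, so that City(Region) contributes a variance component rather than a set of fixed contrasts. With only three cities, as in the example data, that variance component is estimated very imprecisely; the example illustrates the syntax only.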
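Written out, the random-effects version of the nested model keeps the same mean structure but adds distributional assumptions. A common formulation (the symbols \(b_{j(i)}\), \(\sigma^2_{B(A)}\), and \(\sigma^2\) are notation introduced here for illustration) is \[ y_{ijk} = \mu + \alpha_i + b_{j(i)} + \varepsilon_{ijk}, \qquad b_{j(i)} \sim N(0, \sigma^2_{B(A)}), \qquad \varepsilon_{ijk} \sim N(0, \sigma^2), \] with the \(b_{j(i)}\) and \(\varepsilon_{ijk}\) mutually independent. Under these assumptions \[ \operatorname{Var}(y_{ijk}) = \sigma^2_{B(A)} + \sigma^2, \qquad \operatorname{Cov}(y_{ijk}, y_{ijk'}) = \sigma^2_{B(A)} \quad (k \neq k'), \] so two observations from the same city are correlated, while observations from different cities are independent.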

19.9 Fixed vs Random Effects

A key modelling decision is whether factors are treated as fixed or random.

Component	Fixed Effect Interpretation	Random Effect Interpretation
Region	Compare these specific regions	Regions are sampled from a larger population
City(Region)	Compare these specific cities within each region	City-to-city variation within region

Important Note

PROC GLM and PROC MIXED are not simply interchangeable. The interpretation changes depending on whether the factor is treated as fixed or random.

Use PROC GLM when you want to compare specific factor levels directly.
Use PROC MIXED when you want to model hierarchical variation and random effects.
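One practical difference is that PROC GLM, by default, tests every effect against the residual mean square. If City(Region) is regarded as random, a RANDOM statement with the TEST option asks PROC GLM to print expected mean squares and to build F-tests with error terms derived from them; estimation itself is still by least squares. A sketch using the example data from this chapter:

```sas
PROC GLM DATA=nested_data;
    CLASS Region City;
    MODEL Y = Region City(Region);
    /* Declare City(Region) as random; TEST requests F-tests based on the
       expected mean squares (in a balanced design, Region is then tested
       against the City(Region) mean square instead of the residual) */
    RANDOM City(Region) / TEST;
RUN;
QUIT;
```

PROC MIXED instead estimates the variance components directly by likelihood-based methods, which is why its output and its tests for Region can differ from those of PROC GLM.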

Interpreting the Output

In a nested design, the total variation is decomposed into:

variation between regions,
variation among cities within regions,
residual variation within cities.

This is called hierarchical variance decomposition.

So the interpretation is different from a crossed two-way ANOVA. We are not asking whether Region and City interact; instead, we are asking how variability is distributed across levels of a hierarchy.
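For a balanced design this decomposition can be written explicitly. Assume, for illustration only, \(a\) levels of \(A\), \(b\) levels of \(B\) within each level of \(A\), and \(n\) observations per level of \(B\) (the small example data set in this chapter is not balanced in this sense). Then \[ \underbrace{\sum_{i}\sum_{j}\sum_{k} (y_{ijk} - \bar{y}_{\cdot\cdot\cdot})^2}_{SS_{\text{Total}}} = \underbrace{bn \sum_{i} (\bar{y}_{i\cdot\cdot} - \bar{y}_{\cdot\cdot\cdot})^2}_{SS_A} + \underbrace{n \sum_{i}\sum_{j} (\bar{y}_{ij\cdot} - \bar{y}_{i\cdot\cdot})^2}_{SS_{B(A)}} + \underbrace{\sum_{i}\sum_{j}\sum_{k} (y_{ijk} - \bar{y}_{ij\cdot})^2}_{SS_E}, \] with degrees of freedom \[ abn - 1 = (a - 1) + a(b - 1) + ab(n - 1). \] There is no separate interaction sum of squares: \(SS_{B(A)}\) absorbs what a crossed analysis would split into a \(B\) main effect and an \(A \times B\) interaction.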

19.10 Connection to Model Selection

This chapter connects naturally to the previous chapter on model selection.

General Model Selection Asks

Which model fits the data best overall?

Nested Model Comparison Asks

Does the added complexity of a larger model significantly improve fit over a simpler baseline?

This is why nested models are so important:

they provide the formal basis for partial F-tests,
they motivate likelihood ratio tests,
and they help us think carefully about when extra structure is statistically justified.

At the same time, nested designs remind us that data structure also matters. A model can be more complex not only because it has more parameters, but also because it reflects a deeper hierarchical organization of the data.

19.11 Summary

Nested structures connect three core ideas in statistical modelling:

Model building: specifying appropriate structure for the data.
Estimation: fitting full and reduced models or decomposing variance.
Inference: deciding whether added complexity is justified.

19.11.1 Key Takeaways

Nested models concern parameter restrictions.
Nested designs concern hierarchical factor structures.
The partial F-test is used for nested linear models.
The likelihood ratio test is used for many likelihood-based nested comparisons.
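In SAS, nested models are tested with the TEST statement in PROC REG, and nested designs are specified with B(A) syntax in the MODEL statement of PROC GLM or the RANDOM statement of PROC MIXED.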

19.12 Practice Problems

Show mathematically how a reduced regression model is nested within a full model through parameter constraints.
Suppose students are nested within classrooms, and classrooms are nested within schools. Write an appropriate SAS model specification.
Fit a full regression model using PROC REG, carry out a TEST statement, and verify the partial F-statistic manually using the reduced and full model \(SSE\) values.
Explain the difference between PROC GLM and PROC MIXED for a nested design.
Give one example of crossed factors and one example of nested factors, and explain why they differ.