17 Linear Mixed Effect Models II

Learning Objectives

By the end of this lecture, you should be able to:

Implement linear mixed models in SAS using PROC MIXED

Understand the roles of the CLASS, MODEL, and RANDOM statements

Interpret key output tables, including:

covariance parameter estimates

tests of fixed effects

fit statistics

Compare fixed-effects models and mixed-effects models

Explain why modeling correlation can change standard errors, p-values, and conclusions

17.1 Introduction

In the previous lecture, we introduced the conceptual framework of linear mixed models:

\[ Y = X\beta + Z\gamma + \epsilon, \]

where:

\(X\beta\) represents the fixed-effects part of the model
\(Z\gamma\) represents the random-effects part of the model
\(\epsilon\) is the residual error
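The following Python sketch makes the role of \(Z\gamma\) concrete by building the implied covariance matrix of \(Y\). Python is used here only for illustration; the models in this lecture are fitted in SAS. The sketch assumes the usual normal-theory setup: \(\gamma \sim N(0, G)\) and \(\epsilon \sim N(0, R)\), independent of each other, so that \(\text{Var}(Y) = ZGZ^\top + R\). The setting is hypothetical:

- two clusters with two observations each
- a random intercept per cluster with variance 2.0
- independent residuals with variance 1.0

```python
# Sketch: marginal covariance V = Z G Z^T + R for a random-intercept model.
# Hypothetical setup: 2 clusters x 2 observations, sigma_u^2 = 2.0, sigma^2 = 1.0.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def marginal_cov(Z, G, R):
    """Return V = Z G Z^T + R."""
    ZGZt = matmul(matmul(Z, G), transpose(Z))
    return [[ZGZt[i][j] + R[i][j] for j in range(len(R))] for i in range(len(R))]

sigma_u2, sigma2 = 2.0, 1.0
# Column c of Z is 1 when the observation belongs to cluster c.
Z = [[1, 0],
     [1, 0],
     [0, 1],
     [0, 1]]
n_obs = len(Z)
G = [[sigma_u2, 0.0], [0.0, sigma_u2]]          # independent random intercepts
R = [[sigma2 if i == j else 0.0 for j in range(n_obs)] for i in range(n_obs)]  # independent residuals

V = marginal_cov(Z, G, R)
for row in V:
    print(row)
```

In the printed matrix, each diagonal entry is 3.0. Two observations in the same cluster have covariance 2.0, and observations in different clusters have covariance 0.0. So the random intercept alone makes observations within a cluster correlated (here \(2/3\)). This is exactly the dependence that a model assuming independence ignores.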

In this lecture, we focus on how to fit and interpret these models in SAS using PROC MIXED.

Main Goal of This Lecture

Last time, the main question was:

Why do we need mixed models?

This time, the main question is:

How do we fit, read, and interpret mixed models in SAS?

17.2 Why PROC MIXED?

Recall the roles of some common SAS procedures:

PROC REG fits ordinary regression models and assumes independent observations
PROC GLM handles fixed-effects models, including ANOVA-style models with categorical predictors
PROC MIXED extends these ideas to data with dependence, clustering, or repeated measurements

Typical use cases include:

clustered data, such as students within schools or patients within hospitals
repeated-measures data, such as multiple observations on the same subject

A General Template for PROC MIXED

A basic mixed model in SAS has the form

PROC MIXED DATA=dataset;
    CLASS categorical_variables;
    MODEL response = fixed_effects;
    RANDOM random_effects / SUBJECT=cluster;
RUN;

How to read this syntax

CLASS tells SAS which variables are categorical
MODEL specifies the fixed-effects part
RANDOM specifies the random-effects part
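SUBJECT= identifies the clustering variable: observations with the same value share the same random effects and may be correlated, while observations from different subjects are treated as independent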

A mixed model has two parts:

A fixed-effects part, which describes the average relationship
A random-effects part, which accounts for dependence among observations from the same subject or cluster

17.3 Example Dataset: Family Heights

To keep the ideas concrete, consider the following family-height data.

DATA heights;
    INPUT Family Gender $ Height @@;
    DATALINES;
1 F 67 1 F 66 1 F 64 1 M 71 1 M 72
2 F 63 2 F 67 2 M 69 2 M 68 2 M 70
3 F 63 3 M 64
4 F 67 4 F 66 4 M 67 4 M 67 4 M 69
;
RUN;

In this dataset:

Height is the response variable
Gender is a categorical explanatory variable
Family identifies the cluster

So the natural grouping structure is:

observations within the same family may be correlated

This means a standard fixed-effects model that assumes independence may not be appropriate.
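A quick descriptive summary, before fitting any model, shows the two features the models try to capture:

- families differ in average height
- the male minus female gap differs across families

The sketch below is written in Python purely for illustration and reuses the data values above. The variable names are hypothetical.

```python
# Descriptive look at the family-height data (same values as the DATALINES above).
from collections import defaultdict

raw = """
1 F 67 1 F 66 1 F 64 1 M 71 1 M 72
2 F 63 2 F 67 2 M 69 2 M 68 2 M 70
3 F 63 3 M 64
4 F 67 4 F 66 4 M 67 4 M 67 4 M 69
"""

# Like INPUT ... @@, read (Family, Gender, Height) triples until the data run out.
tokens = raw.split()
records = [(int(tokens[i]), tokens[i + 1], float(tokens[i + 2]))
           for i in range(0, len(tokens), 3)]

by_family = defaultdict(list)
by_family_gender = defaultdict(list)
for fam, sex, h in records:
    by_family[fam].append(h)
    by_family_gender[(fam, sex)].append(h)

def mean(xs):
    return sum(xs) / len(xs)

family_mean = {f: mean(hs) for f, hs in sorted(by_family.items())}
# Within-family gender gap: mean male height minus mean female height.
gender_gap = {f: mean(by_family_gender[(f, "M")]) - mean(by_family_gender[(f, "F")])
              for f in family_mean}

print("n =", len(records))
for f in family_mean:
    print(f"family {f}: mean {family_mean[f]:.2f}, M - F gap {gender_gap[f]:.2f}")
```

Family 3 is shorter on average than the other families. The male minus female gap ranges from 1.00 (family 3) to about 5.83 (family 1). The Family and Family*Gender terms in the models below describe these two patterns.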

17.3.1 Step 1: A Fixed-Effects Model

We begin with a model that treats family as a fixed effect.

PROC MIXED DATA=heights;
    CLASS Family Gender;
    MODEL Height = Gender Family Family*Gender;
RUN;

What this model does

This model is similar to a fixed-effects ANOVA model:

Gender is treated as a fixed effect
Family is also treated as a fixed effect
Family*Gender allows the gender difference to vary across families

Interpretation

This approach is useful if:

the specific families in the dataset are the only ones of interest
you do not want to generalize beyond these observed families

However, this model does not use the idea that families may be viewed as a random sample from a larger population of families.

Important

If family is really a random grouping factor, treating it only as fixed may not be the best way to represent the data structure.

17.3.2 Step 2: A Mixed-Effects Model

Now we fit a mixed model that treats family-related effects as random.

PROC MIXED DATA=heights;
    CLASS Family Gender;
    MODEL Height = Gender;
    RANDOM Family Family*Gender;
RUN;

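Here the RANDOM statement names the random effects directly, so no SUBJECT= option is needed. Conceptually, the model can be written as

\[ Y_{ij} = \beta_0 + \beta_1 \, \text{Gender}_{ij} + u_i + v_{ig} + \epsilon_{ij}, \]

where \(i\) indexes families, \(j\) indexes individuals within family \(i\), and \(g = \text{Gender}_{ij}\).

Interpretation

\(\beta_0\) and \(\beta_1\) are fixed effects; \(\text{Gender}_{ij}\) is an indicator for one of the two genders
\(u_i \sim N(0, \sigma^2_{\text{Family}})\) allows different families to have different baseline levels
\(v_{ig} \sim N(0, \sigma^2_{\text{Family} \times \text{Gender}})\) allows the gender effect to vary by family
\(\epsilon_{ij} \sim N(0, \sigma^2)\) is the residual error, and all random terms are independent of one another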

This model explicitly acknowledges that observations from the same family may be correlated.

Why the Mixed Model Is Different

The mixed model is different from the fixed-effects model because it recognizes that:

families are a source of variability
part of the total variation in height comes from between-family differences
part of the total variation comes from within-family differences

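Because the mixed model uses both sources of variation when it computes standard errors for the fixed effects, its standard errors are more realistic when the families are a sample from a larger population. Its conclusions can therefore differ from those of the fixed-effects model.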

The Most Important Output Tables

When we first use PROC MIXED, the output can feel overwhelming. The goal is not to read every table in detail. Instead, focus on a few key tables.

17.4 1. Covariance Parameter Estimates

This table reports estimated variance components for the random part of the model.

Typical rows include:

Family
Family*Gender
Residual

17.4.1 How to interpret this table

A larger variance component means that source contributes more variability
If the Family variance is large, families differ substantially in baseline height
If the Family*Gender variance is large, the gender difference varies across families
The residual variance reflects remaining unexplained within-family variation

Key Interpretation

Covariance parameter estimates tell us where the variability comes from.
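The rows of this table translate directly into correlations. Assume the model from 17.3.2: a family effect, a family-by-gender effect, and a residual. All three are independent, with variances equal to the Family, Family*Gender, and Residual estimates.

Two people in the same family then share the family effect. If they also have the same gender, they share the family-by-gender effect too.

The Python sketch below is for illustration only. It uses hypothetical variance estimates of 2.0, 1.0, and 1.0, not values from the heights data.

```python
# Sketch: turning the three covariance parameter estimates into shares of
# variance and implied within-family correlations.
# The numbers passed in below are HYPOTHETICAL placeholders.

def summarize(var_family, var_family_gender, var_residual):
    total = var_family + var_family_gender + var_residual
    shares = {
        "Family": var_family / total,
        "Family*Gender": var_family_gender / total,
        "Residual": var_residual / total,
    }
    # Same family: share the family effect.
    # Same family and same gender: also share the Family*Gender effect.
    corr_same_gender = (var_family + var_family_gender) / total
    corr_diff_gender = var_family / total
    return shares, corr_same_gender, corr_diff_gender

shares, r_same, r_diff = summarize(2.0, 1.0, 1.0)
print(shares)
print("same family, same gender:", r_same)
print("same family, different gender:", r_diff)
```

With these hypothetical values:

- half of the total variance is between families
- two same-gender relatives have correlation 0.75
- two relatives of different gender have correlation 0.5
- people from different families are uncorrelated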

17.5 2. Tests of Fixed Effects

This table tests the fixed-effects part of the model.

In the mixed model above, the only fixed effect besides the intercept is Gender.

This table answers questions such as:

Is there evidence of an overall gender difference in height?
After accounting for family-level variation, does gender still matter?

17.5.1 Why this matters

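The p-values here can differ from those in a fixed-effects model because the mixed model adjusts for the dependence structure in the data. The standard error of the gender difference reflects the between-family variance components, not only the residual variance, and the degrees of freedom for the test can be smaller.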
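There is a rough way to see why the test can change. When the gender effect is allowed to vary by family, the four families, rather than the 17 individuals, are what really replicate the gender comparison.

The Python sketch below is for illustration only. It contrasts two calculations:

- a naive pooled two-sample calculation, which treats all 17 heights as independent
- a one-sample calculation on the four within-family gender gaps

This is a hand-computed analogue for intuition. It is not what PROC MIXED computes, especially with unbalanced data like these.

```python
# Rough illustration: naive vs family-level standard error for the gender gap.
from statistics import mean, stdev

# Heights by family and gender (from the DATALINES in 17.3).
data = {
    1: {"F": [67, 66, 64], "M": [71, 72]},
    2: {"F": [63, 67], "M": [69, 68, 70]},
    3: {"F": [63], "M": [64]},
    4: {"F": [67, 66], "M": [67, 67, 69]},
}

# Naive: pool everyone, ignore families, use a pooled two-sample SE (df = n - 2).
females = [h for fam in data.values() for h in fam["F"]]
males = [h for fam in data.values() for h in fam["M"]]
nf, nm = len(females), len(males)
ss = (sum((h - mean(females)) ** 2 for h in females)
      + sum((h - mean(males)) ** 2 for h in males))
pooled_var = ss / (nf + nm - 2)
naive_diff = mean(males) - mean(females)
naive_se = (pooled_var * (1 / nf + 1 / nm)) ** 0.5

# Family-level: one M - F gap per family, then a one-sample SE (df = families - 1).
gaps = [mean(fam["M"]) - mean(fam["F"]) for fam in data.values()]
family_diff = mean(gaps)
family_se = stdev(gaps) / len(gaps) ** 0.5

print(f"naive:        diff {naive_diff:.2f}, SE {naive_se:.2f}, df {nf + nm - 2}")
print(f"family-level: diff {family_diff:.2f}, SE {family_se:.2f}, df {len(gaps) - 1}")
```

The two estimated gaps are similar. However, the family-level standard error is larger and rests on only 3 degrees of freedom instead of 15. This is the kind of change the mixed model's test of Gender can show relative to a test that treats everyone as independent.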

17.6 3. Fit Statistics

This table often includes quantities such as:

\(-2\) log likelihood (shown as \(-2\) Res Log Likelihood when the default REML estimation is used)
AIC
BIC

These are useful when comparing competing models.

17.6.1 General interpretation

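Smaller values indicate better fit, but only when the models are fitted to the same dataset
PROC MIXED uses REML estimation by default (METHOD=REML), and REML-based fit statistics are comparable only between models with the same fixed effects; this makes them suitable for deciding whether adding random effects improves the model
To compare models with different fixed effects, refit both with METHOD=ML on the PROC MIXED statement

If two models are fitted to the same data (with the same fixed effects when REML is used) and one has clearly smaller AIC and BIC, that model is usually preferred from a model-fit perspective.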
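The arithmetic behind these comparisons is short. The Python sketch below is for illustration only. It computes two quantities:

- AIC, as \(-2\log L + 2d\), where \(d\) is the number of parameters counted
- a likelihood ratio test for a single variance component

The \(-2\) log likelihood values are hypothetical. They are assumed to come from two REML fits with the same fixed effects, one without and one with a random family intercept. Check the SAS documentation for exactly which parameters PROC MIXED counts in \(d\).

```python
# Sketch: AIC and a likelihood ratio test for one variance component.
# All -2 log likelihood values are HYPOTHETICAL, for illustration only.
import math

def aic(neg2ll, d):
    """AIC = -2 log L + 2d, where d is the number of parameters counted."""
    return neg2ll + 2 * d

def chi2_1_sf(x):
    """P(chi-square with 1 df > x), using chi2_1 = Z^2."""
    return math.erfc(math.sqrt(x / 2))

def variance_component_lrt(neg2ll_reduced, neg2ll_full):
    """LRT for H0: one variance component = 0.

    The null value lies on the boundary (variances cannot be negative), so the
    chi-square(1) p-value is halved (a 50:50 mixture of chi2_0 and chi2_1).
    """
    stat = neg2ll_reduced - neg2ll_full
    return stat, 0.5 * chi2_1_sf(stat)

# Model 1: residual variance only (d = 1). Model 2: adds a random family intercept (d = 2).
neg2ll_1, neg2ll_2 = 80.0, 74.0
print("AIC:", aic(neg2ll_1, 1), aic(neg2ll_2, 2))
stat, p = variance_component_lrt(neg2ll_1, neg2ll_2)
print(f"LR statistic {stat:.1f}, boundary-corrected p = {p:.4f}")
```

A variance cannot be negative, so the null value \(\sigma^2_{\text{Family}} = 0\) sits on the boundary of the parameter space. Halving the \(\chi^2_1\) p-value is a commonly used correction when testing one variance component in this situation.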

17.7 Why Modeling Correlation Changes Inference

One of the biggest lessons from mixed models is this:

if you ignore correlation, your standard errors, and therefore your conclusions, can be misleading.

In practice, adding random effects can change:

standard errors
test statistics
p-values
confidence intervals

This does not necessarily mean the fixed-effect estimate changes dramatically. Sometimes the estimate is similar, but the uncertainty around it changes.

That is one reason mixed models are so important.
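A simple case shows how the estimate can stay the same while its uncertainty changes. Consider a balanced random-intercept model with \(k\) clusters of size \(m\) and intraclass correlation \(\rho = \sigma^2_u / (\sigma^2_u + \sigma^2)\). The variance of the overall sample mean is \(\frac{\sigma^2_u + \sigma^2}{km}\,[1 + (m-1)\rho]\), not \(\frac{\sigma^2_u + \sigma^2}{km}\).

The Python sketch below is for illustration only. It compares the two standard errors for hypothetical values: 4 clusters of 5, with between- and within-cluster variances both equal to 2.0.

```python
# Sketch: how much a naive SE understates uncertainty for a mean when
# observations are clustered (random-intercept model, equal cluster sizes).
# The inputs below are HYPOTHETICAL.
import math

def se_of_mean(k, m, var_between, var_within):
    n = k * m
    total_var = var_between + var_within
    icc = var_between / total_var
    naive = math.sqrt(total_var / n)                     # pretends all n observations are independent
    design_effect = 1 + (m - 1) * icc
    correct = math.sqrt(total_var / n * design_effect)   # equals sqrt((var_between + var_within / m) / k)
    return naive, correct, design_effect

naive, correct, deff = se_of_mean(k=4, m=5, var_between=2.0, var_within=2.0)
print(f"naive SE {naive:.3f}, model-based SE {correct:.3f}, design effect {deff:.1f}")
```

The sample mean is the same number either way. Here, however, the model-based standard error is about \(\sqrt{3} \approx 1.73\) times the naive one, so tests and confidence intervals that ignore clustering would be too optimistic.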

17.8 Comparing a Fixed-Effects Model and a Mixed Model

A useful way to learn mixed models is to compare the two approaches.

Fixed-effects model

PROC MIXED DATA=heights;
    CLASS Family Gender;
    MODEL Height = Gender Family Family*Gender;
RUN;

Mixed-effects model

PROC MIXED DATA=heights;
    CLASS Family Gender;
    MODEL Height = Gender;
    RANDOM Family Family*Gender;
RUN;

Conceptual comparison

The fixed-effects model treats families as specific categories of direct interest
The mixed-effects model treats family as a source of random variability
The mixed-effects model is more natural when families are thought of as sampled from a larger population
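One practical consequence of this distinction concerns each family's deviation from the overall mean:

- treating family as fixed estimates that deviation directly
- treating family as random predicts it by shrinking it toward zero, more strongly for small families

The Python sketch below is for illustration only. It uses a simplified random-intercept model for the heights data, \(\text{Height} = \mu + u_{\text{Family}} + \epsilon\), ignoring gender. The variance components (2.0 for family, 4.0 for residual) are hypothetical, and \(\mu\) is simply the overall sample mean.

In PROC MIXED, the corresponding predictions (EBLUPs) can be requested with the SOLUTION option on the RANDOM statement. Those use estimated variance components, so they will differ from the numbers here.

```python
# Sketch: fixed family deviations vs predicted random family effects (shrinkage).
# Simplified model: Height = mu + u_family + e (Gender ignored for brevity).
# Variance components are HYPOTHETICAL; mu is taken as the overall sample mean.
from statistics import mean

heights = {
    1: [67, 66, 64, 71, 72],
    2: [63, 67, 69, 68, 70],
    3: [63, 64],
    4: [67, 66, 67, 67, 69],
}
var_family, var_resid = 2.0, 4.0
mu = mean([h for hs in heights.values() for h in hs])

results = {}
for fam, hs in heights.items():
    n_i = len(hs)
    fixed_dev = mean(hs) - mu                                # fixed-effect view: raw deviation
    shrink = var_family / (var_family + var_resid / n_i)     # between 0 and 1
    random_dev = shrink * fixed_dev                          # predicted random effect
    results[fam] = (fixed_dev, random_dev)
    print(f"family {fam}: n={n_i}, fixed {fixed_dev:+.2f}, random {random_dev:+.2f}")
```

Family 3, with only two members, has its deviation cut in half. The five-member families keep about 71% of theirs.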

17.9 Another Example: Repeated Measures

Now return to the repeated-measures dataset from the previous lecture.

DATA performance;
    INPUT id $ age group $ trial score;
    DATALINES;
SY 34 A 1 14.3
SY 34 A 2 21.4
SY 34 A 3 27.6
SY 34 A 4 31.1
SY 34 A 5 33.2
WL 33 A 1 13.2
WL 33 A 2 21.4
WL 33 A 3 23.3
WL 33 A 4 30.0
WL 33 A 5 38.6
ZN 43 B 1 15.9
ZN 43 B 2 23.4
ZN 43 B 3 22.0
ZN 43 B 4 29.0
ZN 43 B 5 33.6
;
RUN;

A basic random intercept model is

PROC MIXED DATA=performance;
    CLASS id group;
    MODEL score = trial group;
    RANDOM INTERCEPT / SUBJECT=id;
RUN;

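Interpretation of the model

trial estimates the average linear change in score per trial (trial is treated as numeric because it is not listed in CLASS)
group compares the mean scores of group A and group B, adjusting for trial
RANDOM INTERCEPT / SUBJECT=id; allows each subject to have their own baseline score, so repeated scores from the same subject are correlated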
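The Python sketch below is a descriptive check of what a subject-specific baseline means, and is for illustration only. It fits a separate least-squares line of score on trial for each subject, using the values from the DATA step above. This is not the mixed model; it only shows how much the subjects differ.

```python
# Descriptive check on the performance data: a separate least-squares line
# per subject. This is NOT the mixed model.

scores = {
    "SY": [14.3, 21.4, 27.6, 31.1, 33.2],
    "WL": [13.2, 21.4, 23.3, 30.0, 38.6],
    "ZN": [15.9, 23.4, 22.0, 29.0, 33.6],
}
trials = [1, 2, 3, 4, 5]

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    xbar = sum(x) / len(x)
    ybar = sum(y) / len(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

fits = {sid: ols_line(trials, y) for sid, y in scores.items()}
for sid, (a, b) in fits.items():
    print(f"{sid}: intercept {a:.2f}, slope {b:.2f}")
```

The per-subject intercepts range from about 7.5 to 12.5. A random-intercept model treats this spread as draws from a common distribution. The slopes also differ somewhat (about 4.1 to 5.9), and this model does not allow for that.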

17.9.1 What output should students focus on?

Again, focus on three things:

covariance parameter estimates
tests of fixed effects
fit statistics

17.9.2 How to Explain Results in Words?

Students often know how to read p-values but struggle to explain mixed-model results in plain language. Here are some useful sentence patterns.

For a fixed effect

“After accounting for family-level variability, there is evidence of a gender difference in height.”
“After accounting for repeated measurements within subjects, score tends to increase with trial.”

For a random effect

“There is substantial between-family variability in baseline height.”
“There is meaningful variation across subjects in their baseline scores.”

17.9.3 For model comparison

“Among models with the same fixed effects, the model that includes random family effects has smaller AIC and BIC, so it is preferred.”

17.10 In-Class Questions

In the code below, which part specifies the fixed effects?

PROC MIXED DATA=performance;
    CLASS id group;
    MODEL score = trial group;
    RANDOM INTERCEPT / SUBJECT=id;
RUN;

1. CLASS id group;
2. MODEL score = trial group;
3. RANDOM INTERCEPT / SUBJECT=id;
4. PROC MIXED DATA=performance;

What does the RANDOM statement do?

1. It creates a new variable
2. It defines the variance structure due to clustering
3. It removes dependence from the data
4. It converts a continuous variable into a categorical variable

If a mixed model and a fixed-effects model give different p-values, which is the most likely explanation?

1. The raw data changed
2. The mixed model accounts for correlation
3. The response variable changed units
4. SAS made an error

17.11 What to Remember from This Lecture

The main lesson of this second lecture is:

PROC MIXED allows us to estimate both the fixed-effects part and the random-effects part of a model.

When you read the output, focus on:

the fixed effects
the random-effect variance components
the fit statistics

These three pieces usually tell the main story.

Key Takeaways

PROC MIXED is used to fit linear mixed models in SAS
CLASS identifies categorical variables
MODEL specifies the fixed-effects part
RANDOM specifies the random-effects part
Covariance parameter estimates show how variability is split across sources
Tests of fixed effects tell us whether predictors are associated with the response
Modeling correlation can change standard errors, p-values, and conclusions
