17 Linear Mixed Effect Models II
Learning Objectives
By the end of this lecture, you should be able to:
- Implement linear mixed models in SAS using PROC MIXED
- Understand the roles of the CLASS, MODEL, and RANDOM statements
- Interpret key output tables, including:
- covariance parameter estimates
- tests of fixed effects
- fit statistics
- Compare fixed-effects models and mixed-effects models
- Explain why modeling correlation can change standard errors, p-values, and conclusions
17.1 Introduction
In the previous lecture, we introduced the conceptual framework of linear mixed models:
\[ Y = X\beta + Z\gamma + \epsilon, \]
where:
- \(X\beta\) represents the fixed-effects part of the model
- \(Z\gamma\) represents the random-effects part of the model
- \(\epsilon\) is the residual error
In this lecture, we focus on how to fit and interpret these models in SAS using PROC MIXED.
Last time, the main question was:
Why do we need mixed models?
This time, the main question is:
How do we fit, read, and interpret mixed models in SAS?
17.2 Why PROC MIXED?
Recall the roles of some common SAS procedures:
- PROC REG fits ordinary regression models and assumes independent observations
- PROC GLM handles fixed-effects models, including ANOVA-style models with categorical predictors
- PROC MIXED extends these ideas to data with dependence, clustering, or repeated measurements
Typical use cases include:
- clustered data, such as students within schools or patients within hospitals
- repeated-measures data, such as multiple observations on the same subject
A General Template for PROC MIXED
A basic mixed model in SAS has the form
PROC MIXED DATA=dataset;
CLASS categorical_variables;
MODEL response = fixed_effects;
RANDOM random_effects / SUBJECT=cluster;
RUN;
How to read this syntax?
- CLASS tells SAS which variables are categorical
- MODEL specifies the fixed-effects part
- RANDOM specifies the random-effects part
- SUBJECT= identifies the clustering variable
A mixed model has two parts:
- A fixed-effects part, which describes the average relationship
- A random-effects part, which accounts for dependence among observations from the same subject or cluster
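As a minimal sketch, the template could be filled in for a clustered design as follows. The dataset school_scores and the variables school, method, and score are hypothetical names used only for illustration; they do not appear elsewhere in this lecture.

```sas
/* Hypothetical data: students (rows) nested within schools.       */
/* school_scores, school, method, and score are assumed names.     */
PROC MIXED DATA=school_scores;
  CLASS school method;                 /* categorical variables            */
  MODEL score = method / SOLUTION;     /* fixed-effects part; SOLUTION     */
                                       /* prints the coefficient estimates */
  RANDOM INTERCEPT / SUBJECT=school;   /* random-effects part: one random  */
                                       /* intercept per school             */
RUN;
```

Here the MODEL statement carries the average relationship, and the RANDOM statement carries the dependence among students from the same school.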
17.3 Example Dataset: Family Heights
To keep the ideas concrete, consider the following family-height data.
DATA heights;
INPUT Family Gender $ Height @@;
DATALINES;
1 F 67 1 F 66 1 F 64 1 M 71 1 M 72
2 F 63 2 F 67 2 M 69 2 M 68 2 M 70
3 F 63 3 M 64
4 F 67 4 F 66 4 M 67 4 M 67 4 M 69
;
RUN;
In this dataset:
- Height is the response variable
- Gender is a categorical explanatory variable
- Family identifies the cluster
So the natural grouping structure is:
observations within the same family may be correlated
This means a standard fixed-effects model that assumes independence may not be appropriate.
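Before fitting any model, a quick descriptive summary can show how much the family averages differ. The sketch below assumes the heights dataset created above already exists in the current SAS session.

```sas
/* Count, mean, and standard deviation of Height within each family */
PROC MEANS DATA=heights N MEAN STD MAXDEC=2;
  CLASS Family;
  VAR Height;
RUN;
```

If the family means differ by more than the within-family spread would suggest, that is an informal sign that observations from the same family are correlated.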
17.3.1 Step 1: A Fixed-Effects Model
We begin with a model that treats family as a fixed effect.
PROC MIXED DATA=heights;
CLASS Family Gender;
MODEL Height = Gender Family Family*Gender;
RUN;
What this model does
This model is similar to a fixed-effects ANOVA model:
- Gender is treated as a fixed effect
- Family is also treated as a fixed effect
- Family*Gender allows the gender difference to vary across families
Interpretation
This approach is useful if:
- the specific families in the dataset are the only ones of interest
- you do not want to generalize beyond these observed families
However, this model does not use the idea that families may be viewed as a random sample from a larger population of families.
If family is really a random grouping factor, treating it only as fixed may not be the best way to represent the data structure.
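Because this model has no RANDOM statement, it describes the same mean structure as an ordinary fixed-effects ANOVA. As a rough cross-check, and assuming the heights dataset from above, the same model can be requested from PROC GLM; the Type III F tests should agree with the PROC MIXED tests of fixed effects for this model.

```sas
/* Same fixed-effects model, fit by least squares in PROC GLM */
PROC GLM DATA=heights;
  CLASS Family Gender;
  MODEL Height = Gender Family Family*Gender;
RUN;
QUIT;   /* PROC GLM is interactive, so QUIT ends the procedure */
```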
17.3.2 Step 2: A Mixed-Effects Model
Now we fit a mixed model that treats family-related effects as random.
PROC MIXED DATA=heights;
CLASS Family Gender;
MODEL Height = Gender;
RANDOM Family Family*Gender;
RUN;
This model can be written conceptually as
\[ Y_{ij} = \beta_0 + \beta_1 \, \text{Gender}_{ij} + u_{\text{Family}} + u_{\text{Family} \times \text{Gender}} + \epsilon_{ij}. \]
Interpretation
- \(\beta_0\) and \(\beta_1\) are fixed effects
- \(u_{\text{Family}}\) allows different families to have different baseline levels
- \(u_{\text{Family} \times \text{Gender}}\) allows the gender effect to vary by family
- \(\epsilon_{ij}\) is the residual error
This model explicitly acknowledges that observations from the same family may be correlated.
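The same random-effects structure can usually be written in the SUBJECT= form used in the general template. The sketch below is expected to give the same variance component estimates as RANDOM Family Family*Gender, although SAS labels the rows differently (Intercept and Gender, each with Family as the subject).

```sas
PROC MIXED DATA=heights;
  CLASS Family Gender;
  MODEL Height = Gender;
  /* INTERCEPT within Family plays the role of the Family effect;      */
  /* Gender within Family plays the role of the Family*Gender effect.  */
  RANDOM INTERCEPT Gender / SUBJECT=Family;
RUN;
```

The SUBJECT= form tells PROC MIXED that different families are independent, which tends to make the computation more efficient on larger datasets.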
Why the Mixed Model Is Different
The mixed model is different from the fixed-effects model because it recognizes that:
- families are a source of variability
- part of the total variation in height comes from between-family differences
- part of the total variation comes from within-family differences
That is why the standard errors from the mixed model better reflect the clustering in the data, and why its conclusions can differ from those of the fixed-effects model.
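This split of the variation can be made explicit. Write \(\sigma^2_F\), \(\sigma^2_{FG}\), and \(\sigma^2\) for the Family, Family*Gender, and residual variances (notation introduced here for illustration), and assume the random effects and the residuals are mutually independent. Then for family \(i\) the model implies

\[ \operatorname{Var}(Y_{ij}) = \sigma^2_F + \sigma^2_{FG} + \sigma^2, \]

and for two different members \(j \neq j'\),

\[ \operatorname{Cov}(Y_{ij}, Y_{ij'}) = \begin{cases} \sigma^2_F + \sigma^2_{FG} & \text{same family, same gender} \\ \sigma^2_F & \text{same family, different gender.} \end{cases} \]

Observations from different families have covariance \(0\). So relatives are correlated, and relatives of the same gender are correlated more strongly whenever \(\sigma^2_{FG} > 0\).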
The Most Important Output Tables
When we first use PROC MIXED, the output can feel overwhelming. The goal is not to read every table in detail. Instead, focus on a few key tables.
17.4 1. Covariance Parameter Estimates
This table reports estimated variance components for the random part of the model.
Typical rows include:
- Family
- Family*Gender
- Residual
17.4.1 How to interpret this table
- A larger variance component means that source contributes more variability
- If the Family variance is large, families differ substantially in baseline height
- If the Family*Gender variance is large, the gender difference varies across families
- The residual variance reflects remaining unexplained within-family variation
Covariance parameter estimates tell us where the variability comes from.
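One way to read "where the variability comes from" is to express each variance component as a share of the total. The sketch below assumes the heights dataset from above; the dataset name cp and the column name Proportion are arbitrary choices made for this example.

```sas
/* Save the Covariance Parameter Estimates table as a dataset */
ODS OUTPUT CovParms=cp;
PROC MIXED DATA=heights;
  CLASS Family Gender;
  MODEL Height = Gender;
  RANDOM Family Family*Gender;
RUN;

/* Each estimate divided by the sum of all estimates.          */
/* PROC SQL remerges SUM(Estimate) onto every row (see log).   */
PROC SQL;
  SELECT CovParm,
         Estimate,
         Estimate / SUM(Estimate) AS Proportion FORMAT=6.3
  FROM cp;
QUIT;
```

With only four families, these shares are estimated imprecisely, so they are best read as a rough description and not as firm conclusions.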
17.5 2. Tests of Fixed Effects
This table tests the fixed-effects part of the model.
In the example above, the most important fixed effect is usually:
Gender
This table answers questions such as:
- Is there evidence of an overall gender difference in height?
- After accounting for family-level variation, does gender still matter?
17.5.1 Why this matters
The p-values here can differ from those in a fixed-effects model because the mixed model adjusts for the dependence structure in the data.
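The test table says whether gender matters, but not by how much. A sketch that also prints the estimated effect, assuming the heights dataset from above, is shown here. DDFM=KR requests the Kenward-Roger degrees-of-freedom adjustment; it is an optional choice made for this example, not something the lecture requires.

```sas
PROC MIXED DATA=heights;
  CLASS Family Gender;
  MODEL Height = Gender / SOLUTION DDFM=KR;  /* coefficient estimates, KR df */
  RANDOM Family Family*Gender;
  LSMEANS Gender / DIFF CL;   /* model-based gender means, their      */
                              /* difference, and confidence limits    */
RUN;
```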
17.6 3. Fit Statistics
This table often includes quantities such as:
- \(-2\) log likelihood
- AIC
- BIC
These are useful when comparing competing models.
17.6.1 General interpretation
- Smaller values often indicate better fit, especially when comparing models fit to the same dataset
- AIC and BIC are particularly useful when deciding whether adding random effects improves the model
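If two models are fitted to the same data, and one has clearly smaller AIC and BIC, that model is usually preferred from a model-fit perspective. Note that PROC MIXED uses REML estimation by default, and REML-based fit statistics are comparable only between models with the same fixed effects. To compare models whose fixed effects differ, fit both with METHOD=ML.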
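As a sketch of such a comparison, the code below fits two models that share the same fixed effects and differ only in their random effects, so their fit statistics can be compared under the default REML estimation. It assumes the heights dataset from above; fit_full, fit_reduced, and fit_compare are arbitrary names.

```sas
/* Model A: random Family and Family*Gender effects */
ODS OUTPUT FitStatistics=fit_full;
PROC MIXED DATA=heights;
  CLASS Family Gender;
  MODEL Height = Gender;
  RANDOM Family Family*Gender;
RUN;

/* Model B: random Family effect only */
ODS OUTPUT FitStatistics=fit_reduced;
PROC MIXED DATA=heights;
  CLASS Family Gender;
  MODEL Height = Gender;
  RANDOM Family;
RUN;

/* One-to-one merge: both tables list the same statistics in the same order */
DATA fit_compare;
  MERGE fit_full(RENAME=(Value=Full))
        fit_reduced(RENAME=(Value=Reduced));
RUN;

PROC PRINT DATA=fit_compare NOOBS;
RUN;
```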
17.7 Why Modeling Correlation Changes Inference
One of the biggest lessons from mixed models is this:
if you ignore correlation, your inference may change.
In practice, adding random effects can change:
- standard errors
- test statistics
- p-values
- confidence intervals
This does not necessarily mean the fixed-effect estimate changes dramatically. Sometimes the estimate is similar, but the uncertainty around it changes.
That is one reason mixed models are so important.
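A standard textbook calculation, not specific to this lecture's data, shows why the uncertainty changes. Suppose \(n\) observations fall into clusters of equal size \(m\), each observation has variance \(\sigma^2\), and any two observations in the same cluster have correlation \(\rho\) (all of these symbols are assumptions for this sketch). Then the variance of the sample mean is

\[ \operatorname{Var}(\bar{Y}) = \frac{\sigma^2}{n}\,\bigl[1 + (m-1)\rho\bigr], \]

compared with \(\sigma^2/n\) under independence. For example, with \(m = 5\) and \(\rho = 0.3\) the factor is \(1 + 4(0.3) = 2.2\), so the correct standard error is \(\sqrt{2.2} \approx 1.48\) times the one computed as if the observations were independent.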
17.8 Comparing a Fixed-Effects Model and a Mixed Model
A useful way to learn mixed models is to compare the two approaches.
Fixed-effects model
PROC MIXED DATA=heights;
CLASS Family Gender;
MODEL Height = Gender Family Family*Gender;
RUN;
Mixed-effects model
PROC MIXED DATA=heights;
CLASS Family Gender;
MODEL Height = Gender;
RANDOM Family Family*Gender;
RUN;
Conceptual comparison
- The fixed-effects model treats families as specific categories of direct interest
- The mixed-effects model treats family as a source of random variability
- The mixed-effects model is more natural when families are thought of as sampled from a larger population
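To see the practical difference, the Gender test from each model can be placed side by side. The sketch below assumes the heights dataset from above; t_fixed, t_mixed, t_both, and Model are arbitrary names chosen for this example.

```sas
/* Save the Type 3 tests of fixed effects from each model */
ODS OUTPUT Tests3=t_fixed;
PROC MIXED DATA=heights;
  CLASS Family Gender;
  MODEL Height = Gender Family Family*Gender;
RUN;

ODS OUTPUT Tests3=t_mixed;
PROC MIXED DATA=heights;
  CLASS Family Gender;
  MODEL Height = Gender;
  RANDOM Family Family*Gender;
RUN;

/* Stack the two tables and label which model each row came from */
DATA t_both;
  LENGTH Model $ 5;
  SET t_fixed(IN=from_fixed) t_mixed;
  IF from_fixed THEN Model = 'Fixed';
  ELSE Model = 'Mixed';
RUN;

/* Show only the Gender rows */
PROC PRINT DATA=t_both NOOBS;
  WHERE Effect = 'Gender';
  VAR Model Effect NumDF DenDF FValue ProbF;
RUN;
```

The denominator degrees of freedom (DenDF) are worth comparing first, since they show which source of variability each model uses to judge the gender effect.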
17.9 Another Example: Repeated Measures
Now return to the repeated-measures dataset from the previous lecture.
DATA performance;
INPUT id $ age group $ trial score;
DATALINES;
SY 34 A 1 14.3
SY 34 A 2 21.4
SY 34 A 3 27.6
SY 34 A 4 31.1
SY 34 A 5 33.2
WL 33 A 1 13.2
WL 33 A 2 21.4
WL 33 A 3 23.3
WL 33 A 4 30.0
WL 33 A 5 38.6
ZN 43 B 1 15.9
ZN 43 B 2 23.4
ZN 43 B 3 22.0
ZN 43 B 4 29.0
ZN 43 B 5 33.6
;
RUN;
A basic random intercept model is
PROC MIXED DATA=performance;
CLASS id group;
MODEL score = trial group;
RANDOM INTERCEPT / SUBJECT=id;
RUN;
Interpretation of the model
- trial measures the average trend across repeated trials
- group compares group A and group B
- RANDOM INTERCEPT / SUBJECT=id; allows each subject to have their own baseline
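Written as an equation, with \(i\) indexing subjects and \(j\) indexing trials, this random intercept model is (the symbols \(\beta_2\), \(u_i\), \(\sigma^2_u\), and \(\sigma^2\) are notation introduced here for illustration):

\[ Y_{ij} = \beta_0 + \beta_1 \, \text{trial}_{ij} + \beta_2 \, \text{group}_i + u_i + \epsilon_{ij}, \qquad u_i \sim N(0, \sigma^2_u), \quad \epsilon_{ij} \sim N(0, \sigma^2), \]

where \(\text{group}_i\) is an indicator for one of the two groups and \(u_i\) is the subject-specific shift in baseline. Assuming \(u_i\) and \(\epsilon_{ij}\) are independent, any two scores from the same subject have correlation

\[ \frac{\sigma^2_u}{\sigma^2_u + \sigma^2}. \]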
17.9.1 What output should students focus on?
Again, focus on three things:
- covariance parameter estimates
- tests of fixed effects
- fit statistics
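To keep attention on these three tables, the output can be restricted with ODS SELECT. This is a sketch that assumes the performance dataset from above; CovParms, Tests3, and FitStatistics are the ODS names PROC MIXED uses for these tables.

```sas
/* Print only the three key tables for the next procedure step */
ODS SELECT CovParms Tests3 FitStatistics;
PROC MIXED DATA=performance;
  CLASS id group;
  MODEL score = trial group;
  RANDOM INTERCEPT / SUBJECT=id;
RUN;
```

The selection applies only to the procedure step that follows it, so later procedures print their full output again.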
17.9.2 How to Explain Results in Words?
Students often know how to read p-values but struggle to explain mixed-model results in plain language. Here are some useful sentence patterns.
For a fixed effect
- “After accounting for family-level variability, there is evidence of a gender difference in height.”
- “After accounting for repeated measurements within subjects, score tends to increase with trial.”
For a random effect
- “There is substantial between-family variability in baseline height.”
- “There is meaningful variation across subjects in their baseline scores.”
17.9.3 For model comparison
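- “The mixed model fits better than the fixed-effects model based on smaller fit statistics.” (Because these two models have different fixed effects, this comparison is valid only when both are fit with METHOD=ML.)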
17.10 In-Class Questions
In the code below, which part specifies the fixed effects?
PROC MIXED DATA=performance;
CLASS id group;
MODEL score = trial group;
RANDOM INTERCEPT / SUBJECT=id;
RUN;
- CLASS id group;
- MODEL score = trial group;
- RANDOM INTERCEPT / SUBJECT=id;
- PROC MIXED DATA=performance;
What does the RANDOM statement do?
- It creates a new variable
- It defines the variance structure due to clustering
- It removes dependence from the data
- It converts a continuous variable into a categorical variable
If a mixed model and a fixed-effects model give different p-values, which is the most likely explanation?
- The raw data changed
- The mixed model accounts for correlation
- The response variable changed units
- SAS made an error
17.11 What to Remember from This Lecture
The main lesson of this second lecture is:
PROC MIXED allows us to estimate both the fixed-effects part and the random-effects part of a model.
When you read the output, focus on:
- the fixed effects
- the random-effect variance components
- the fit statistics
These three pieces usually tell the main story.
- PROC MIXED is used to fit linear mixed models in SAS
- CLASS identifies categorical variables
- MODEL specifies the fixed-effects part
- RANDOM specifies the random-effects part
- Covariance parameter estimates show how variability is split across sources
- Tests of fixed effects tell us whether predictors are associated with the response
- Modeling correlation can change standard errors, p-values, and conclusions