10 Week 9: General Linear Hypotheses, Contrasts, and Estimability
In this week, we bring together many earlier ideas into the general framework of linear inference in regression models. The focus is on linear functions of parameters, contrasts, general linear hypotheses, and the important issue of estimability. This week helps students move from coefficient-by-coefficient thinking to a broader matrix-based understanding of inference in linear models.
10.1 Learning Objectives
By the end of this week, students should be able to:
define a linear function of regression parameters;
explain what a contrast is and why contrasts are useful;
formulate general linear hypotheses in matrix form;
carry out inference for linear combinations of parameters;
explain the meaning of estimability in linear models;
distinguish between full-rank and rank-deficient settings;
interpret software output for linear hypothesis tests and contrasts.
10.2 Reading
Recommended reading for this week:
Seber and Lee:
sections on linear functions of parameters
general linear hypotheses
estimability and rank-deficient models
Montgomery, Peck, and Vining:
sections on tests for combinations of parameters
qualitative predictors and comparisons among means
extra sum of squares and related inference
10.3 Why This Week Matters
In earlier weeks, we tested individual coefficients and compared nested models.
But many important questions in regression are not of the form:
is one coefficient equal to zero?
Instead, they are questions such as:
are two slopes equal?
is the average of two treatment effects equal to a third?
are all group means the same?
is a certain interaction effect absent?
does a linear combination of parameters equal a specified value?
These are all questions about linear functions of the parameter vector.
This week provides the general language for expressing and testing such questions.
For a hypothesis \(H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{d}\) consisting of \(r\) independent linear restrictions, the test statistic in the full-rank model is
\[
F = \frac{(\mathbf{C}\hat{\boldsymbol{\beta}} - \mathbf{d})^\top \bigl[\mathbf{C}(\mathbf{X}^\top \mathbf{X})^{-1} \mathbf{C}^\top\bigr]^{-1} (\mathbf{C}\hat{\boldsymbol{\beta}} - \mathbf{d})}{r\,\hat{\sigma}^2},
\]
which follows an \(F_{r,\,n-p}\) distribution under \(H_0\). This is the general matrix form of the regression \(F\) test.
10.15 Connection With Nested Models
The general linear hypothesis test is equivalent to comparing a reduced model and a full model when the reduced model is obtained by imposing linear restrictions.
So the extra sum of squares \(F\) test from Week 4 is a special case of the general linear hypothesis test.
This is an important unifying idea.
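The equivalence can be checked numerically. The following sketch (Python with numpy, using simulated data; not part of the course's R materials) computes the same hypothesis \(\beta_2 = 0\) once as an extra sum of squares \(F\) test and once in the general linear form:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + rng.normal(size=n)        # x2 has no true effect

X_full = np.column_stack([np.ones(n), x1, x2])
X_red = np.column_stack([np.ones(n), x1])       # reduced model imposes beta_2 = 0

def sse(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    res = y - X @ b
    return res @ res

sse_full = sse(X_full, y)
sse_red = sse(X_red, y)
r = 1                                            # one linear restriction
p = X_full.shape[1]
F_nested = ((sse_red - sse_full) / r) / (sse_full / (n - p))

# Same hypothesis via H0: C beta = 0 with C = (0, 0, 1)
b_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
sigma2 = sse_full / (n - p)
C = np.array([[0.0, 0.0, 1.0]])
XtX_inv = np.linalg.inv(X_full.T @ X_full)
diff = C @ b_full
F_glh = (diff @ np.linalg.inv(C @ XtX_inv @ C.T) @ diff) / (r * sigma2)

print(np.isclose(F_nested, F_glh))  # True: the two computations agree exactly
```

The agreement is exact, not approximate, because imposing the restriction and fitting the reduced model are algebraically the same operation.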
10.16 When \(r = 1\)
If there is only one restriction, then the general \(F\) test reduces to the square of a \(t\) test.
That is, when \(r=1\),
\[
F = T^2.
\]
So single-parameter inference and multi-parameter inference are part of the same framework.
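The identity \(F = T^2\) can be verified on a small simple-regression example. This sketch (Python with numpy, on made-up data) tests \(H_0: \beta_1 = 0\) both ways:

```python
import numpy as np

# Simple regression; test H0: beta_1 = 0 as a t test and as an F test
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1, 5.8])
n = len(x)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
sse_full = resid @ resid
sigma2 = sse_full / (n - 2)

# t statistic for the slope
XtX_inv = np.linalg.inv(X.T @ X)
t_stat = b[1] / np.sqrt(sigma2 * XtX_inv[1, 1])

# F statistic from comparing the intercept-only model with the full model
sse_red = np.sum((y - y.mean()) ** 2)
F_stat = (sse_red - sse_full) / (sse_full / (n - 2))

print(np.isclose(F_stat, t_stat**2))  # True: F equals the squared t statistic
```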
10.17 Matrix Formulation of Contrasts
If we are interested in several contrasts at once, we can stack them into a matrix.
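As a concrete illustration, in a hypothetical model with parameters \((\beta_0, \beta_1, \beta_2, \beta_3)\), the two contrasts \(\beta_1 - \beta_2\) and \(\beta_1 + \beta_2 - 2\beta_3\) can be tested jointly by stacking their coefficient vectors as rows:
\[
\mathbf{C} =
\begin{pmatrix}
0 & 1 & -1 & 0 \\
0 & 1 & 1 & -2
\end{pmatrix},
\qquad
\mathbf{d} =
\begin{pmatrix}
0 \\ 0
\end{pmatrix},
\]
so that the joint hypothesis is \(H_0: \mathbf{C}\boldsymbol{\beta} = \mathbf{d}\). Each row, restricted to the non-intercept coefficients, sums to zero, which is the defining property of a contrast.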
When the design matrix is rank-deficient, two different parameter vectors can produce exactly the same mean vector \(\mathbf{X}\boldsymbol{\beta}\). The data then cannot distinguish between these two parameter vectors, so a particular coefficient may not be uniquely meaningful.
However, some combinations of coefficients may still be uniquely determined by the mean structure. Those combinations are estimable.
10.20 Definition of Estimability
A linear function
\[
a^\top \boldsymbol{\beta}
\]
is estimable if there exists a vector \(t\) such that
\[
a^\top = t^\top \mathbf{X}.
\]
Equivalently, \(a\) must lie in the row space of \(\mathbf{X}\).
This condition ensures that the target quantity depends only on the mean vector \(\mathbf{X}\boldsymbol{\beta}\), and not on the particular parameterization used.
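Row-space membership can be checked numerically: \(a^\top = t^\top \mathbf{X}\) has a solution exactly when the least squares problem \(\mathbf{X}^\top t = a\) has zero residual. A sketch in Python with numpy, using an illustrative rank-deficient one-way layout:

```python
import numpy as np

# Rank-deficient one-way design: intercept plus indicators for 3 groups,
# 2 observations per group.  Columns: (1, g1, g2, g3); column 1 = g1 + g2 + g3.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
], dtype=float)

def is_estimable(a, X, tol=1e-8):
    """a^T beta is estimable iff a lies in the row space of X,
    i.e. X^T t = a has an exact solution t."""
    t, *_ = np.linalg.lstsq(X.T, a, rcond=None)
    return np.linalg.norm(X.T @ t - a) < tol

a_diff = np.array([0.0, 1.0, -1.0, 0.0])   # tau_1 - tau_2
a_tau1 = np.array([0.0, 1.0, 0.0, 0.0])    # tau_1 alone

print(is_estimable(a_diff, X))  # True: group differences are estimable
print(is_estimable(a_tau1, X))  # False: tau_1 alone is not estimable
```

This numerical check mirrors the factor-model example discussed later: differences between group effects are estimable, while individual effects in the overparameterized model are not.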
10.21 Interpretation of Estimability
Estimability means that the quantity is determined by the model’s observable mean structure.
If a linear function is not estimable, then different parameter vectors producing the same fitted mean can give different values of that function.
So no unbiased linear estimator can uniquely recover it.
10.22 Example With a Factor Model
Suppose we write a one-way mean model as
\[
Y_{ij} = \mu + \tau_i + \varepsilon_{ij},
\]
for groups \(i=1,\dots,g\).
If we include all group indicators together with an intercept, then the parameters are not uniquely identified, because adding a constant to all \(\tau_i\) and subtracting it from \(\mu\) gives the same mean structure.
In this case:
\(\mu\) alone is not uniquely defined;
\(\tau_i\) alone is not uniquely defined;
but differences such as \(\tau_i - \tau_j\) are estimable.
This is a classic example.
10.23 Estimable Functions in Rank-Deficient Models
Even when the coefficient vector is not unique, the fitted values are unique, because they are the orthogonal projection of the response onto the column space of \(\mathbf{X}\).
Likewise, every estimable linear function has a unique value determined by the model.
So regression analysis in rank-deficient settings often focuses on estimable functions rather than on individual raw coefficients.
10.24 Parameterization Matters for Coefficients, but Not for Estimable Functions
A factor can be parameterized in several ways:
treatment coding;
sum-to-zero coding;
cell-means coding.
The individual coefficients change across parameterizations.
But meaningful estimable comparisons, such as differences between group means, do not depend on the coding scheme.
This is a key conceptual lesson.
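This invariance can be demonstrated directly. The sketch below (Python with numpy, on made-up group data) fits the same one-way model under cell-means coding and under treatment coding: the coefficients differ, but the estimated group-mean difference is identical:

```python
import numpy as np

# Six observations in 3 groups (2 each)
y = np.array([3.0, 3.4, 5.1, 4.9, 7.2, 6.8])
g = np.array([0, 0, 1, 1, 2, 2])

# Cell-means coding: one indicator per group, no intercept
X_cell = np.eye(3)[g]
# Treatment coding: intercept + indicators for groups 2 and 3
X_trt = np.column_stack([np.ones(6), (g == 1).astype(float), (g == 2).astype(float)])

b_cell, *_ = np.linalg.lstsq(X_cell, y, rcond=None)
b_trt, *_ = np.linalg.lstsq(X_trt, y, rcond=None)

# The individual coefficients differ across codings, but the estimable
# comparison mu_2 - mu_1 is the same under both:
diff_cell = b_cell[1] - b_cell[0]
diff_trt = b_trt[1]            # treatment coding stores mu_2 - mu_1 directly
print(np.isclose(diff_cell, diff_trt))  # True
```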
10.25 Contrasts and Estimability
In many ANOVA-type models, contrasts among means are estimable even when the raw parameter vector is overparameterized.
This is one reason contrasts are so central: they often represent the scientifically meaningful and estimable quantities.
10.26 Least Squares in Rank-Deficient Models
When \(\mathbf{X}\) is rank-deficient, the normal equations do not yield a unique coefficient vector.
Different generalized inverse solutions may produce different coefficient vectors.
However:
the fitted values are unique;
the residual sum of squares is unique;
estimable linear functions have unique least squares estimates.
So the inferential target should be framed carefully.
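These three invariance facts can be seen in a tiny example. The sketch below (Python with numpy, illustrative data) takes two different solutions of the normal equations in a rank-deficient model and compares them:

```python
import numpy as np

# Rank-deficient design: intercept + full set of group indicators
X = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
], dtype=float)                 # column 0 = column 1 + column 2, so rank 2
y = np.array([1.0, 2.0, 4.0, 6.0])

# Two different solutions of the normal equations:
b_pinv = np.linalg.pinv(X) @ y                  # minimum-norm solution
b_other = b_pinv + np.array([1.0, -1.0, -1.0])  # shift along the null space of X

print(np.allclose(b_pinv, b_other))             # False: coefficients differ
print(np.allclose(X @ b_pinv, X @ b_other))     # True: fitted values agree

# An estimable contrast has the same value under both solutions:
a = np.array([0.0, 1.0, -1.0])                  # tau_1 - tau_2
print(np.isclose(a @ b_pinv, a @ b_other))      # True
```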
10.27 Generalized Inverse View
A generalized inverse of \(\mathbf{X}^\top \mathbf{X}\) can be used to write one least squares solution.
This leads to expressions similar to the full-rank case, but students should remember that not every coefficient itself is uniquely meaningful.
What matters most is whether the function of interest is estimable.
10.28 Software and Estimability
Modern software often handles rank deficiency automatically.
It may:
drop aliased columns;
report coefficients as not estimable;
use a default parameterization that makes the fit identifiable.
Students should not interpret every reported coefficient mechanically. They should understand the underlying estimable structure.
10.29 Worked Example With Equality of Slopes
Suppose we fit a regression model with two predictors \(x_1\) and \(x_2\) and want to test whether their coefficients are equal, that is, \(H_0: \beta_1 = \beta_2\). This is a single linear restriction with \(a = (0, 1, -1)^\top\).
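A numerical sketch of this test (Python with numpy on simulated data; the course itself works in R):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Simulated data in which the two slopes really are equal (both 2.0)
y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - X.shape[1])
XtX_inv = np.linalg.inv(X.T @ X)

# H0: beta_1 = beta_2, i.e. a^T beta = 0 with a = (0, 1, -1)
a = np.array([0.0, 1.0, -1.0])
est = a @ beta_hat
se = np.sqrt(sigma2_hat * a @ XtX_inv @ a)
t_stat = est / se
print(est, se, t_stat)
```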
In R, useful tools for such tests include:
vcov() for the covariance matrix of coefficient estimates;
model.matrix() for inspecting the design matrix;
anova() for nested-model versions of linear hypothesis tests.
Even when software provides default hypothesis tests, students should learn to express the hypothesis itself in matrix form. That is often the most important conceptual step.
10.40 A Practical Workflow for Linear Hypotheses
A useful workflow is:
identify the scientific question;
express the target as a linear function or a set of linear restrictions;
write down the vector \(a\) or matrix \(\mathbf{C}\);
compute the estimate and its standard error;
interpret the result on the original scientific scale.
This approach makes regression inference more flexible and more transparent.
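The workflow above can be sketched end to end for a hypothetical contrast (Python with numpy and the standard-library NormalDist; the interval uses a normal quantile as a large-sample approximation to the exact \(t\) quantile):

```python
import numpy as np
from statistics import NormalDist

# Step 1 (assumed scientific question): is the effect of x1 equal to that of x2?
# Steps 2-3: express it as a^T beta with a = (0, 1, -1)
rng = np.random.default_rng(7)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 0.5 + 1.2 * x1 + 0.9 * x2 + rng.normal(scale=0.4, size=n)
X = np.column_stack([np.ones(n), x1, x2])
a = np.array([0.0, 1.0, -1.0])

# Step 4: estimate and standard error
b, *_ = np.linalg.lstsq(X, y, rcond=None)
sigma2 = np.sum((y - X @ b) ** 2) / (n - X.shape[1])
se = np.sqrt(sigma2 * a @ np.linalg.inv(X.T @ X) @ a)
est = a @ b

# Step 5: approximate 95% confidence interval (with n - p = 57 degrees of
# freedom the exact t quantile would be only slightly wider than z)
z = NormalDist().inv_cdf(0.975)
ci = (est - z * se, est + z * se)
print(est, ci)
```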
10.41 In-Class Discussion Questions
Why is a contrast defined by coefficients summing to zero?
Why are some functions estimable even when the full parameter vector is not uniquely identifiable?
Why is the general linear hypothesis framework more powerful than testing coefficients one by one?
Why should interpretation focus on estimable functions rather than arbitrary parameterizations?
10.42 Practice Problems
10.43 Conceptual
Explain the difference between a coefficient and a general linear function of coefficients.
Explain why treatment comparisons are often naturally expressed as contrasts.
10.44 Computation
For a fitted model with design matrix \(\mathbf{X}\), response \(\mathbf{y}\), and a given vector \(a\):
Compute the estimate of \(a^\top \boldsymbol{\beta}\).
Compute its estimated variance.
Write the corresponding \(t\) statistic for testing whether \(a^\top \boldsymbol{\beta} = 0\).
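The three computations follow one template; here is a compact sketch (Python with numpy on illustrative data, intended as a check for hand calculations):

```python
import numpy as np

# Illustrative fit for the three computations above
X = np.array([[1, 0.5], [1, 1.5], [1, 2.5], [1, 3.5]], dtype=float)
y = np.array([1.0, 2.1, 2.9, 4.2])
a = np.array([1.0, 2.0])       # target: beta_0 + 2 * beta_1

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
n, p = X.shape
sigma2_hat = np.sum((y - X @ beta_hat) ** 2) / (n - p)
XtX_inv = np.linalg.inv(X.T @ X)

estimate = a @ beta_hat                      # estimate of a^T beta
var_hat = sigma2_hat * a @ XtX_inv @ a       # its estimated variance
t_stat = estimate / np.sqrt(var_hat)         # t statistic for H0: a^T beta = 0
print(estimate, var_hat, t_stat)
```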
10.45 Hypothesis-Matrix Problem
Write the matrix \(\mathbf{C}\) and vector \(\mathbf{d}\) for each hypothesis:
\(H_0: \beta_2 = \beta_3\);
\(H_0: \beta_2 = 0\) and \(\beta_4 = 0\);
\(H_0: \beta_2 + \beta_3 - 2\beta_4 = 1\).
10.46 Suggested Homework
Complete the following tasks:
fit a regression model and test at least two nontrivial linear combinations of coefficients;
write each scientific question first in words and then in matrix form;
compute a confidence interval for one contrast of interest;
fit a model with a factor and interpret at least two group comparisons as contrasts;
write a short reflection explaining why estimability matters in categorical models.
10.47 Summary
In this week, we studied the general framework of linear inference in regression.
We emphasized that:
many meaningful questions involve linear combinations of parameters rather than single coefficients;
contrasts are important special cases of linear functions;
general linear hypotheses unify many common tests;
estimability determines which parameter functions are uniquely learnable from the model;
meaningful inference should focus on estimable functions, especially in rank-deficient settings.
Next week, a natural continuation is to move into analysis of covariance, one-way and two-way ANOVA as special cases of the linear model, or to extend toward generalized least squares and correlated errors, depending on the course emphasis.
10.48 Appendix: Compact Formula Summary
Linear function of parameters:
\[
a^\top \boldsymbol{\beta}.
\]
Estimated variance of its estimator:
\[
\widehat{\mathrm{Var}}(a^\top \hat{\boldsymbol{\beta}})
=
\hat{\sigma}^2 a^\top (\mathbf{X}^\top \mathbf{X})^{-1} a.
\]
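The variance formula can be checked by simulation. This sketch (Python with numpy, fully made-up design and parameters) compares the formula to the empirical variance of \(a^\top \hat{\boldsymbol{\beta}}\) over many simulated datasets:

```python
import numpy as np

# Monte Carlo check of Var(a^T beta_hat) = sigma^2 * a^T (X^T X)^{-1} a
rng = np.random.default_rng(42)
n, m = 20, 20000                     # sample size, number of simulated datasets
X = np.column_stack([np.ones(n), np.linspace(0.0, 1.0, n)])
beta = np.array([1.0, 2.0])
sigma = 0.3
a = np.array([1.0, 0.5])

theory = sigma**2 * a @ np.linalg.inv(X.T @ X) @ a

# Simulate m datasets at once; each column of B is one least squares estimate
Y = (X @ beta)[:, None] + rng.normal(scale=sigma, size=(n, m))
B = np.linalg.pinv(X) @ Y
empirical = np.var(a @ B)

print(theory, empirical)   # the two values agree up to Monte Carlo error
```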