5 Week 4: ANOVA Decomposition, Overall F Test, and Nested Models
This week we study the ANOVA decomposition for linear regression, the overall significance test, and the comparison of nested models. These ideas connect the geometry of least squares with the inferential tools developed in previous weeks.
5.1 Learning Objectives
By the end of this week, students should be able to:
define the total, regression, and error sums of squares;
explain the ANOVA decomposition in regression with an intercept;
interpret the degrees of freedom associated with SST, SSR, and SSE;
perform the overall \(F\) test for regression;
compare nested linear models using extra sums of squares;
interpret ANOVA tables produced by statistical software.
From Week 2, we know that \(\hat{\mathbf{Y}}\) and \(\mathbf{e}\) are orthogonal. From Week 3, we know that this leads to useful distributional results for inference.
This week, we organize these ideas into the ANOVA framework.
5.4 Total, Explained, and Unexplained Variation
A central question in regression is:
How much of the variation in the response can be explained by the model?
To answer this, we decompose the total variation in \(\mathbf{Y}\) into:
variation explained by regression;
variation left unexplained by the model.
When the model includes an intercept, this decomposition takes a particularly simple and important form.
5.5 Total Sum of Squares
Assume the model contains an intercept.
The total variation in the response is measured by the total sum of squares
\[
\mathrm{SST} \;=\; \sum_{i=1}^{n} (Y_i - \bar{Y})^2 \;=\; \|\mathbf{Y} - \bar{Y}\mathbf{1}\|^2.
\]
Because the model contains an intercept, the vector \(\bar{Y}\mathbf{1}\) lies in the column space of \(\mathbf{X}\). Hence both \(\hat{\mathbf{Y}}\) and \(\bar{Y}\mathbf{1}\) lie in the model space, so their difference also lies in the model space.
Since the residual vector \(\mathbf{e}\) is orthogonal to the model space, writing
\[
\mathbf{Y} - \bar{Y}\mathbf{1} \;=\; (\hat{\mathbf{Y}} - \bar{Y}\mathbf{1}) + \mathbf{e}
\]
and applying the Pythagorean theorem gives
\[
\|\mathbf{Y} - \bar{Y}\mathbf{1}\|^2 \;=\; \|\hat{\mathbf{Y}} - \bar{Y}\mathbf{1}\|^2 + \|\mathbf{e}\|^2,
\qquad\text{that is,}\qquad
\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}.
\]
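The decomposition and the resulting overall \(F\) statistic can be checked numerically. The course software is R, but here is a minimal pure-Python sketch for a simple linear regression; the data are made up for illustration:

```python
# Worked sketch (hypothetical data): ANOVA quantities for the simple
# linear model y = b0 + b1*x fitted by least squares with an intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Closed-form least squares estimates.
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = ybar - b1 * xbar
yhat = [b0 + b1 * xi for xi in x]

# ANOVA decomposition: SST = SSR + SSE holds because the intercept
# puts the constant vector in the model space.
sst = sum((yi - ybar) ** 2 for yi in y)
ssr = sum((yh - ybar) ** 2 for yh in yhat)
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))

p = 2                     # parameters: intercept and slope
msr = ssr / (p - 1)       # regression mean square, p - 1 = 1 df
mse = sse / (n - p)       # error mean square, n - p = 3 df
f_stat = msr / mse        # overall F statistic
r2 = ssr / sst
```

Varying the made-up data and re-checking that `sst` equals `ssr + sse` (up to rounding) is a useful exercise.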
So the ANOVA comparison of nested models is another way of expressing the general \(F\) test from Week 3.
In practice, this is one of the most common uses of regression ANOVA tables.
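For reference, the extra-sum-of-squares statistic from Week 3, in the notation used in the problems below (a reduced model with \(p_R\) parameters nested in a full model with \(p_F\) parameters), is:

```latex
\[
F \;=\; \frac{(\mathrm{SSE}_R - \mathrm{SSE}_F)/(p_F - p_R)}
             {\mathrm{SSE}_F/(n - p_F)},
\]
```

which follows an \(F_{p_F - p_R,\; n - p_F}\) distribution when the reduced model is correct.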
5.20 Sequential and Partial Sums of Squares
In multiple regression, sums of squares can be defined in different ways depending on what is being adjusted for.
Two common ideas are:
sequential sums of squares: terms are added in a specified order;
partial sums of squares: each term is tested after adjusting for the others.
This distinction becomes important when predictors are correlated.
In this course, the main conceptual priority is to understand the extra sum of squares principle. Details of Type I, Type II, and Type III sums of squares can be introduced later if needed.
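To see the order dependence concretely, here is an illustrative Python sketch with made-up correlated predictors. The two-predictor fit is obtained by residual regression (the Frisch-Waugh idea), so only simple least squares is needed:

```python
# Sketch (hypothetical data): sequential sums of squares depend on the
# order in which correlated predictors enter the model.

def resid(x, y):
    """Residuals of y regressed on x with an intercept."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def sse_after_adding(r_y, r_x):
    """SSE of the full model, via a no-intercept fit of r_y on r_x."""
    b = sum(rx * ry for rx, ry in zip(r_x, r_y)) / sum(rx * rx for rx in r_x)
    return sum((ry - b * rx) ** 2 for rx, ry in zip(r_x, r_y))

# Made-up, highly correlated predictors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]
y  = [2.0, 2.8, 4.1, 4.9, 6.2, 6.8]

ybar = sum(y) / len(y)
sst = sum((yi - ybar) ** 2 for yi in y)

# Order: x1 entered first, then x2.
sse_x1 = sum(r ** 2 for r in resid(x1, y))
sse_full_a = sse_after_adding(resid(x1, y), resid(x1, x2))
seq_x1_first = sst - sse_x1          # sequential SS for x1
seq_x2_second = sse_x1 - sse_full_a  # extra SS for x2 after x1

# Order: x2 entered first, then x1.
sse_x2 = sum(r ** 2 for r in resid(x2, y))
sse_full_b = sse_after_adding(resid(x2, y), resid(x2, x1))
seq_x2_first = sst - sse_x2
seq_x1_second = sse_x2 - sse_full_b  # extra SS for x1 after x2
```

The sequential contributions of each predictor change with the order, yet the two orders agree on the full-model SSE, and each sequence of sequential sums of squares adds up to the same full-model SSR.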
5.21 R Squared
The coefficient of determination is defined as
\[
R^2 \;=\; \frac{\mathrm{SSR}}{\mathrm{SST}} \;=\; 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}.
\]
It measures the proportion of total variation explained by the regression model, so its values lie between 0 and 1.
5.22 Interpretation of R Squared
If \(R^2\) is close to 1, the model explains a large proportion of the variation in the response.
If \(R^2\) is close to 0, the model explains little of the variation.
However, \(R^2\) alone does not guarantee that the model is appropriate. A high \(R^2\) does not ensure that assumptions are satisfied, and a low \(R^2\) does not necessarily imply the model is useless.
5.23 Adjusted R Squared
Because \(R^2\) never decreases when additional predictors are added, it can overstate improvement. The adjusted version penalizes model size by replacing sums of squares with mean squares:
\[
R^2_{\mathrm{adj}} \;=\; 1 - \frac{\mathrm{SSE}/(n-p)}{\mathrm{SST}/(n-1)} \;=\; 1 - (1 - R^2)\,\frac{n-1}{n-p}.
\]
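A minimal sketch of both quantities computed from ANOVA output; the numbers are hypothetical, and \(p\) counts all parameters including the intercept:

```python
# R^2 and adjusted R^2 from ANOVA quantities (hypothetical numbers).
def r_squared(sst, sse):
    # Proportion of total variation explained by the model.
    return 1.0 - sse / sst

def adjusted_r_squared(sst, sse, n, p):
    # p includes the intercept, so the error degrees of freedom are n - p.
    return 1.0 - (sse / (n - p)) / (sst / (n - 1))

r2 = r_squared(200.0, 80.0)                      # 0.6
r2_adj = adjusted_r_squared(200.0, 80.0, 25, 5)  # approx. 0.52
```

Note that the adjustment always pulls the value down (for \(p > 1\)), and adding a predictor that reduces SSE only slightly can lower adjusted \(R^2\).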
In R, summary(fit) reports the overall \(F\) statistic, \(R^2\), and adjusted \(R^2\).
Students should understand that these outputs are not separate topics. They are all built from the same least squares geometry and distribution theory.
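One way to see this concretely: the overall \(F\) statistic can be computed either from mean squares or directly from \(R^2\), using the identity \(F = \dfrac{R^2/(p-1)}{(1-R^2)/(n-p)}\). A short Python sketch with hypothetical numbers:

```python
# Sketch: the overall F statistic computed two ways (from mean squares
# and from R^2) must agree, since both derive from SST = SSR + SSE.
# All numbers below are hypothetical.
n, p = 30, 4
sst, sse = 150.0, 60.0
ssr = sst - sse

f_from_ms = (ssr / (p - 1)) / (sse / (n - p))   # MSR / MSE
r2 = ssr / sst
f_from_r2 = (r2 / (p - 1)) / ((1.0 - r2) / (n - p))
```

Dividing numerator and denominator of \(F\) by SST turns SSR into \(R^2\) and SSE into \(1-R^2\), which is exactly why the two computations match.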
5.34 In-Class Discussion Questions
Why does the ANOVA decomposition require an intercept for the usual SST = SSR + SSE identity?
Why must \(\mathrm{SSE}_F \le \mathrm{SSE}_R\) for nested models?
What does the overall \(F\) test tell us that individual \(t\) tests do not?
Why can \(R^2\) be misleading if used alone?
5.35 Practice Problems
5.36 Conceptual
Explain the meaning of SST, SSR, and SSE in words.
Explain why the regression degrees of freedom are \(p-1\) when the model includes an intercept.
Explain the difference between the overall \(F\) test and a test for a single coefficient.
5.37 Computational
Suppose a regression model with intercept has:
\(n=20\),
\(p=4\),
\(\mathrm{SST}=100\),
\(\mathrm{SSE}=40\).
Compute:
\(\mathrm{SSR}\),
the degrees of freedom for regression and error,
\(\mathrm{MSR}\),
\(\mathrm{MSE}\),
the overall \(F\) statistic,
\(R^2\).
5.38 Nested Model Problem
A reduced model has
\[
\mathrm{SSE}_R = 120
\]
with \(p_R = 3\), and a full model has
\[
\mathrm{SSE}_F = 90
\]
with \(p_F = 5\).
If \(n=30\), compute the nested-model \(F\) statistic.
5.39 Suggested Homework
Complete the following tasks:
prove the decomposition \(\mathrm{SST}=\mathrm{SSR}+\mathrm{SSE}\) when the model includes an intercept;
derive the overall \(F\) statistic from the ANOVA decomposition;
fit a regression model in R and reproduce the ANOVA table by hand;
compare two nested models using an extra sum of squares test;
interpret both \(R^2\) and adjusted \(R^2\) for a chosen dataset.
5.40 Summary
This week we developed the ANOVA framework for linear regression. The key ideas were:
the decomposition \(\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}\) for models with an intercept;
the overall \(F\) test for regression;
the comparison of nested models through extra sums of squares;
the interpretation of \(R^2\) and adjusted \(R^2\).
Next week, a natural continuation is to study multiple regression in greater depth, including interpretation of partial regression coefficients and multicollinearity, or to move into matrix-based general linear hypotheses and estimability, depending on the course emphasis.