17 Two-Sample t-Test and ANOVA in SAS
Learning Objectives
In this hands-on session, we will practice applying two-sample t-tests and one-way ANOVA in SAS using a built-in dataset from SASHELP. The goal is to help you develop a systematic workflow for comparing group means and interpreting the results.
By the end of this activity, you should be able to:
- Understand when to use a two-sample t-test.
- Understand when to use one-way ANOVA.
- Identify the response variable and grouping variable in SAS code.
- Write SAS code for two-sample t-tests and ANOVA.
- Interpret p-values, assumptions, and post-hoc comparisons.
17.1 Guideline: How to Analyze Group Mean Comparisons in SAS
A reliable way to approach mean-comparison problems is to follow three steps:
Step 1: Understand the data and the question
- Identify the response variable
- Identify the grouping variable
- Determine how many groups are being compared
Step 2: Choose the correct inferential method
- If there are two groups, use a two-sample t-test
- If there are three or more groups, use one-way ANOVA
Step 3: Write SAS code and interpret the output
- Run the statistical procedure
- Check assumptions when appropriate
- Interpret the p-value in context
- For ANOVA, if the global test is significant, run a multiple comparison procedure
💡 This same workflow is also useful when analyzing grouped data in R or Python.
17.2 Practice Examples in this Activity
To practice these steps, in this activity we will:
- Use a two-sample t-test to compare city fuel efficiency between cars from Asia and the USA
- Use one-way ANOVA to compare horsepower across different vehicle types
By the end of the activity, you will see that the two-sample t-test (with equal variances assumed) is a special case of one-way ANOVA.
17.3 Dataset for this exercise: SASHELP.CARS
In this activity, we use the built-in SAS dataset SASHELP.CARS.
This dataset contains information on many car models and includes variables such as:
- Origin: region where the car was produced
- Type: vehicle type
- MPG_City: city miles per gallon
- MPG_Highway: highway miles per gallon
- Horsepower: engine horsepower
- Weight: vehicle weight
Since SASHELP.CARS is built into SAS OnDemand / SAS Studio, no data import is required.
17.4 First Practice: Two-Sample t-Test
In this first example, we compare the city fuel efficiency (MPG_City) of cars from Asia and the USA.
17.4.1 Step 1: Explore the Dataset
First, look at the structure of the dataset and preview a few rows.
PROC CONTENTS DATA=SASHELP.CARS;
RUN;
PROC PRINT DATA=SASHELP.CARS (OBS=10);
RUN;

At this stage:
- Focus on identifying useful variables
- Decide which variable is the response
- Decide which variable defines the groups
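Before choosing a method, it can help to count how many cars fall in each level of the candidate grouping variables. A minimal sketch using PROC FREQ; requesting ORIGIN and TYPE together is simply a choice that anticipates the two examples in this activity:

```sas
/* One-way frequency tables: number of cars per origin and per vehicle type */
PROC FREQ DATA=SASHELP.CARS;
   TABLES ORIGIN TYPE;
RUN;
```

The number of rows in each table is the number of groups a comparison would involve, which is what drives the choice of method in Step 2 of the guideline.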
17.4.2 Step 2: Create a Dataset with Two Groups
Since we only want to compare Asia and USA, we subset the built-in dataset.
DATA CARS_SUB;
SET SASHELP.CARS;
IF ORIGIN IN ("Asia", "USA");
RUN;

Key idea:
- MPG_City is the response variable
- Origin is the grouping variable
- There are two groups, so the correct method is a two-sample t-test
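Instead of creating a new dataset, you can also restrict the rows inside a procedure with a WHERE statement. The sketch below uses that approach to confirm that only the two origins of interest remain. Character comparisons with IN are case-sensitive, so the values must be spelled exactly as stored ("Asia", "USA").

```sas
/* WHERE filters rows as they are read, so no intermediate dataset is created */
PROC FREQ DATA=SASHELP.CARS;
   WHERE ORIGIN IN ("Asia", "USA");
   TABLES ORIGIN / NOCUM;   /* NOCUM suppresses the cumulative columns */
RUN;
```

If the table shows exactly two rows, the filter selects the same cars that CARS_SUB should contain.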
Step 3: Run the Two-Sample t-Test
The hypotheses are
\[ H_0: \mu_{\text{Asia}} = \mu_{\text{USA}} \]
and
\[ H_1: \mu_{\text{Asia}} \neq \mu_{\text{USA}}. \]
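For reference, these are the standard forms of the two test statistics for this comparison, writing \(\bar{y}_1, s_1, n_1\) for the sample mean, standard deviation, and size of the Asia group and \(\bar{y}_2, s_2, n_2\) for the USA group. The pooled version assumes equal population variances; the Welch (Satterthwaite) version does not.

\[ t = \frac{\bar{y}_1 - \bar{y}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad \text{df} = n_1 + n_2 - 2 \]

\[ t' = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}, \qquad \text{df} \approx \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \]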
Use the following code:
PROC TTEST DATA=CARS_SUB;
CLASS ORIGIN;
VAR MPG_CITY;
RUN;

17.4.3 Code Description
- CLASS ORIGIN defines the grouping variable
- VAR MPG_CITY defines the response variable
- PROC TTEST performs the two-sample t-test
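The t-test also assumes that the response is approximately normal within each group. Assuming ODS Graphics is enabled (as it usually is in SAS Studio), PROC TTEST can draw diagnostic plots itself; the sketch below requests only the summary panel and the normal Q-Q plots. It reads SASHELP.CARS with a WHERE statement so that it runs even if CARS_SUB has not been created.

```sas
/* Same t-test as above, with only two diagnostic plots requested:        */
/*   SUMMARY = per-group histograms with density curves, plus boxplots   */
/*   QQ      = normal quantile-quantile plot for each group              */
PROC TTEST DATA=SASHELP.CARS PLOTS(ONLY)=(SUMMARY QQ);
   WHERE ORIGIN IN ("Asia", "USA");
   CLASS ORIGIN;
   VAR MPG_CITY;
RUN;
```

Points lying close to the reference line in each Q-Q plot are consistent with normality; moderate departures matter less when the group sizes are large.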
17.4.4 Step 4: Visualize the Two Groups
It is often helpful to examine the two groups visually.
PROC SGPLOT DATA=CARS_SUB;
TITLE "City MPG by Car Origin";
VBOX MPG_CITY / CATEGORY=ORIGIN;
RUN;

17.4.5 In-Class Questions
- Which group appears to have the larger mean city MPG?
- What is the null hypothesis being tested?
- If the p-value is smaller than 0.05, what conclusion should we make?
- Does SAS report only one t-test result, or more than one method?
17.5 Second Practice: One-Way ANOVA
In this second example, we compare horsepower across different vehicle types.
17.5.1 Step 1: Understand the Problem
Now the response variable is Horsepower and the grouping variable is Type. Since Type has more than two levels, we use one-way ANOVA.
17.5.2 Step 2: Compute Summary Statistics
Before fitting the ANOVA model, compute summary statistics by group.
PROC MEANS DATA=SASHELP.CARS N MEAN STD;
CLASS TYPE;
VAR HORSEPOWER;
RUN;

This code helps you inspect:
- sample size in each group
- mean horsepower in each group
- standard deviation in each group
17.5.3 Step 3: Visualize the Groups
PROC SGPLOT DATA=SASHELP.CARS;
TITLE "Horsepower by Vehicle Type";
VBOX HORSEPOWER / CATEGORY=TYPE;
RUN;

17.5.4 Step 4: Write the Global Hypotheses
The one-way ANOVA hypotheses, where \(k\) is the number of groups (here, the number of vehicle types), are
\[ H_0: \mu_1 = \mu_2 = \cdots = \mu_k \]
versus
\[ H_1: \text{At least one group mean differs.} \]
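The F-statistic behind this global test compares the variation between group means to the variation within groups. Writing \(n_i\) and \(\bar{y}_i\) for the size and mean of group \(i\), \(\bar{y}\) for the overall mean, and \(N = n_1 + \cdots + n_k\) for the total number of observations:

\[ \text{SSB} = \sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2, \qquad \text{SSW} = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2, \qquad F = \frac{\text{SSB}/(k-1)}{\text{SSW}/(N-k)} \]

Under \(H_0\) and the usual ANOVA assumptions, \(F\) follows an F distribution with \(k-1\) and \(N-k\) degrees of freedom, so large values of \(F\) give small p-values. In the PROC ANOVA output, SSB and SSW appear as the Model and Error sums of squares.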
Step 5: Fit the ANOVA Model
PROC ANOVA DATA=SASHELP.CARS;
CLASS TYPE;
MODEL HORSEPOWER = TYPE;
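RUN;
QUIT;

17.5.5 Code Description
- CLASS TYPE defines the grouping variable
- MODEL HORSEPOWER = TYPE tests whether mean horsepower differs across vehicle types
- The global ANOVA test is based on the F-statistic
- QUIT ends the procedure; PROC ANOVA is interactive and otherwise stays active until the next step begins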
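The SAS documentation describes PROC ANOVA as intended for balanced data, with one-way designs among the listed exceptions, so it is appropriate here even if the vehicle types have different sample sizes. PROC GLM fits the same one-way model and is the usual choice once a design is unbalanced with more than one factor. A minimal equivalent sketch:

```sas
/* Same one-way model fit with PROC GLM; the F-test for TYPE matches PROC ANOVA */
PROC GLM DATA=SASHELP.CARS;
   CLASS TYPE;
   MODEL HORSEPOWER = TYPE;
RUN;
QUIT;   /* GLM is also interactive; QUIT ends the procedure */
```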
In-Class Questions
- What does the global ANOVA test tell us?
- If the ANOVA p-value is smaller than 0.05, what can we conclude?
- If the ANOVA test is significant, do we immediately know which groups differ?
17.6 Assumption Checking for ANOVA
One important ANOVA assumption is homogeneity of variances.
We can check this using Levene’s test.
PROC ANOVA DATA=SASHELP.CARS;
CLASS TYPE;
MODEL HORSEPOWER = TYPE;
MEANS TYPE / HOVTEST=LEVENE WELCH;
RUN;
QUIT;

17.6.1 Code Description
- HOVTEST=LEVENE requests Levene’s test for equal variances
- WELCH requests Welch’s ANOVA, which is useful when the equal-variance assumption is questionable
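By default, SAS computes Levene’s test from the squared deviations of each observation from its group mean. The TYPE=ABS suboption uses absolute deviations instead, and HOVTEST=BF requests the Brown-Forsythe variant, which uses absolute deviations from group medians. A sketch requesting the absolute-deviation version:

```sas
/* Levene's test based on absolute (rather than squared) deviations from group means */
PROC ANOVA DATA=SASHELP.CARS;
   CLASS TYPE;
   MODEL HORSEPOWER = TYPE;
   MEANS TYPE / HOVTEST=LEVENE(TYPE=ABS);
RUN;
QUIT;
```

Replacing LEVENE(TYPE=ABS) with BF gives the Brown-Forsythe version.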
17.6.2 In-Class Questions
- What does Levene’s test check?
- If Levene’s test has a p-value below 0.05, what does that suggest?
- Why might Welch’s ANOVA be preferable in that case?
17.7 Multiple Comparison After ANOVA
If the global ANOVA test is significant, we often want to know:
Which groups are different?
We can do this with a multiple comparison procedure.
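To see why a dedicated procedure is needed, note that \(k\) groups give \(m = k(k-1)/2\) pairwise comparisons. If each pair were tested separately at level \(\alpha\), and if the tests were independent (a simplifying assumption, since comparisons that share a group are not), the probability of at least one false rejection when all means are equal would be

\[ P(\text{at least one Type I error}) = 1 - (1 - \alpha)^m. \]

For example, with a hypothetical \(k = 6\) groups there are \(m = 15\) comparisons, and at \(\alpha = 0.05\) this gives \(1 - 0.95^{15} \approx 0.54\).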
17.7.1 Example: Tukey Test
PROC ANOVA DATA=SASHELP.CARS;
CLASS TYPE;
MODEL HORSEPOWER = TYPE;
MEANS TYPE / TUKEY;
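RUN;
QUIT;

17.7.2 Code Description
- TUKEY requests Tukey’s multiple comparison procedure
- This compares all pairs of group means while controlling the familywise (overall) Type I error rate
- When group sizes are unequal, SAS uses the Tukey-Kramer version of the procedure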
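Adding the CLDIFF option asks SAS to present Tukey’s results as simultaneous confidence intervals for every pairwise difference in means; an interval that excludes 0 marks a pair whose means differ significantly at the familywise level. ALPHA=0.05 is written out only to make the level explicit, since 0.05 is the default.

```sas
/* Tukey comparisons shown as simultaneous confidence intervals for each pair */
PROC ANOVA DATA=SASHELP.CARS;
   CLASS TYPE;
   MODEL HORSEPOWER = TYPE;
   MEANS TYPE / TUKEY CLDIFF ALPHA=0.05;
RUN;
QUIT;
```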
17.7.3 In-Class Questions
- Why should we only interpret Tukey results after the global ANOVA test is significant?
- What is the purpose of a post-hoc test?
- How is this different from simply running many separate t-tests?
17.8 Compare the Two Methods
At this point, compare the two procedures used in this activity.
17.8.1 Two-Sample t-Test
Used when:
- there are exactly two groups
- the goal is to compare two group means
Example in this activity:
MPG_City for Asia vs. USA
17.8.2 One-Way ANOVA
Used when:
- there are three or more groups
- the goal is to determine whether at least one mean differs
Example in this activity:
Horsepower across vehicle types
17.8.3 Important Connection
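When there are only two groups, the pooled (equal-variance) two-sample t-test and one-way ANOVA test the same null hypothesis and give the same p-value, because the ANOVA F-statistic equals the square of the pooled t-statistic, \( F = t^2 \).
So:
The two-sample t-test is a special case of one-way ANOVA.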
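You can check this connection directly. The sketch below runs both procedures on the Asia/USA comparison, using a WHERE statement so it does not depend on CARS_SUB. In the output, squaring the t value in the Pooled row of the PROC TTEST results should reproduce the F value for ORIGIN from PROC ANOVA, with the same p-value; the Satterthwaite row does not share this property.

```sas
/* Pooled and Satterthwaite two-sample t-tests for the two origins */
PROC TTEST DATA=SASHELP.CARS;
   WHERE ORIGIN IN ("Asia", "USA");
   CLASS ORIGIN;
   VAR MPG_CITY;
RUN;

/* One-way ANOVA on the same two groups: F should equal t**2 from the Pooled row */
PROC ANOVA DATA=SASHELP.CARS;
   WHERE ORIGIN IN ("Asia", "USA");
   CLASS ORIGIN;
   MODEL MPG_CITY = ORIGIN;
RUN;
QUIT;
```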
17.9 Suggested Practice Tasks
Work through the following during class.
17.9.1 Task 1
Compare MPG_Highway between Asia and Europe cars.
Suggested steps:
- Create a subset dataset
- Identify the response and grouping variables
- Run a two-sample t-test
- Draw a boxplot
- Interpret the p-value
17.9.2 Task 2
Test whether vehicle weight differs by Origin.
Suggested steps:
- Compute summary statistics
- Draw a boxplot
- Fit the ANOVA model
- Check the p-value
- If needed, perform a multiple comparison test
17.10 Summary
By following the group-comparison workflow, you have practiced how to:
- Identify whether a problem involves two groups or multiple groups
- Choose between a two-sample t-test and one-way ANOVA
- Write SAS code for both procedures
- Interpret p-values and global test results
- Check assumptions and perform post-hoc comparisons
This same workflow applies directly to:
- regression with categorical predictors,
- factorial ANOVA,
- linear models,
- and many later topics in data analysis.
Once you master this pattern, mean-comparison problems become much more systematic and easier to solve.