Data Analysis Using SAS
Weight: 20% of final grade
Last modified: April 6, 2026
Due Date: Apr 27, 2026
Project Overview
The goal of the final project is to apply the statistical methods and SAS programming skills learned throughout the course to a complete data analysis pipeline.
Students will:
- Work with a real-world open dataset
- Perform data cleaning, visualization, and modeling
- Use SAS procedures and macros
- Communicate results clearly through a written report
Group Information
| Group | Members | Topic |
|---|---|---|
| 1 | Zong Yang | Financial Metric Transmission Mechanism within Coca-Cola's Income Statement |
| 2 | Mangsa Limbu Pheudin, Shaswat Bhushan Jangam, Allswell Akomanyi-Addo | |
| 3 | Wenpu Ma | |
| 4 | Vu, Ha | |
| 5 | Umma Hafsah Himu | |
| 6 | Kalen Jinnah, Ivanna Poliashenko, Sierra Doherty | |
| 7 | Zhe Zhong | Bank Marketing Dataset |
| 8 | Andrew Krause | Heart Disease Dataset |
| 9 | Taiwo Ayeni | Cardiovascular Risk Factors in the Framingham Heart Study |
| 10 | Maya Creary | Is there a statistically significant difference in the "Inattentive" vs. "Hyperactive" symptom scores between male and female subjects? |
| 11 | Seoyoung Kim | CDC Disability data |
| 12 | Sifan You, Boshu Zhang | Relationship between a car's weight and its fuel efficiency |
Data Requirements
You must use a public/open dataset.
Recommended sources:
- UCI Machine Learning Repository
- Kaggle (public datasets only)
- Data.gov
- CDC / WHO datasets
- Built-in SAS datasets (e.g., sashelp.cars, sashelp.heart)
Requirements:
- At least 100 observations
- At least 3–5 variables
- Must include:
- At least one categorical variable
- At least one numerical variable
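One quick way to check a candidate dataset against these requirements is PROC CONTENTS, which reports the number of observations and the type (numeric or character) of each variable. The sketch below uses sashelp.heart purely as a stand-in; substitute your own dataset name.

```sas
/* sashelp.heart is only a stand-in: replace it with your dataset */
/* The output lists the observation count and each variable's type */
PROC CONTENTS DATA=sashelp.heart;
RUN;
```

Note that a numeric type does not by itself make a variable numerical in the statistical sense: a numeric column may hold category codes, so check the values as well.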
Group:
You may form a group of 2–3 students or work alone. PhD students MUST work individually.
Project Components
Your project must include the following components:
Research Question (10%)
Clearly state:
- What question are you trying to answer?
- Why is it interesting or important?
Examples:
- Does treatment A outperform treatment B?
- What factors affect income?
- Is there an association between two categorical variables?
Data Cleaning & Preparation (15%)
- Import dataset into SAS
- Handle:
- Missing values
- Data types
- Outliers (if necessary)
Suggested tools:
- DATA step
- PROC IMPORT
- PROC FORMAT
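As a rough illustration of how these tools fit together, the sketch below imports a CSV file, counts missing values, converts a character column to numeric, and attaches a format to a coded variable. The file path, the dataset names work.raw and work.clean, and the variables income_char, outcome, and sex are all hypothetical placeholders, not part of the assignment.

```sas
/* Hypothetical file path: replace with the location of your data */
PROC IMPORT DATAFILE="/path/to/mydata.csv"
    OUT=work.raw
    DBMS=CSV
    REPLACE;
    GETNAMES=YES;       /* read variable names from the first row */
    GUESSINGROWS=MAX;   /* scan all rows when guessing types and lengths */
RUN;

/* Count non-missing and missing values for every numeric variable */
PROC MEANS DATA=work.raw N NMISS;
RUN;

/* Labels for a hypothetical numeric variable coded 1/2 */
PROC FORMAT;
    VALUE sexfmt 1="Male" 2="Female";
RUN;

DATA work.clean;
    SET work.raw;
    /* Convert a character column to numeric; ?? suppresses invalid-data notes */
    income_num = INPUT(income_char, ?? COMMA32.);
    /* Drop rows where the hypothetical outcome variable is missing */
    IF MISSING(outcome) THEN DELETE;
    FORMAT sex sexfmt.;
RUN;
```

Whether to delete, impute, or keep rows with missing values depends on your research question; deletion is shown here only because it is the simplest option to demonstrate.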
Exploratory Data Analysis (EDA) (15%)
Use SAS to explore the data:
- Summary statistics: PROC MEANS, PROC FREQ
- Visualizations: PROC SGPLOT
Examples:
- Histograms
- Boxplots
- Scatterplots
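The procedures and plot types listed above could be combined along the following lines. This is only a sketch using the built-in sashelp.heart dataset; the choice of variables (Cholesterol, Systolic, Weight, Sex, Smoking_Status) is an assumption made for illustration.

```sas
/* Summary statistics for two numeric variables */
PROC MEANS DATA=sashelp.heart N MEAN STD MIN MAX;
    VAR Cholesterol Systolic;
RUN;

/* Frequency tables for two categorical variables */
PROC FREQ DATA=sashelp.heart;
    TABLES Sex Smoking_Status;
RUN;

/* Histogram of one numeric variable */
PROC SGPLOT DATA=sashelp.heart;
    HISTOGRAM Cholesterol;
RUN;

/* Side-by-side boxplots of a numeric variable by group */
PROC SGPLOT DATA=sashelp.heart;
    VBOX Cholesterol / CATEGORY=Sex;
RUN;

/* Scatterplot of two numeric variables */
PROC SGPLOT DATA=sashelp.heart;
    SCATTER X=Weight Y=Systolic;
RUN;
```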
Statistical Analysis (30%)
You must include at least TWO of the following methods:
- Chi-square test (PROC FREQ)
- Two-sample t-test (PROC TTEST)
- ANOVA (PROC ANOVA or PROC GLM)
- Regression (PROC REG)
- Correlation (PROC CORR)
You should:
- State hypotheses
- Report test statistics and p-values
- Interpret results in context
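To make the steps above concrete, here is a minimal sketch of three of the listed methods, with the null hypothesis for each written as a comment. It uses sashelp.heart as a stand-in, and the variable pairings are illustrative assumptions rather than recommended analyses.

```sas
/* Chi-square test. H0: Sex and BP_Status are independent */
PROC FREQ DATA=sashelp.heart;
    TABLES Sex*BP_Status / CHISQ;
RUN;

/* Two-sample t-test. H0: mean Cholesterol is equal for both sexes */
PROC TTEST DATA=sashelp.heart;
    CLASS Sex;
    VAR Cholesterol;
RUN;

/* Linear regression. H0 for each slope: the coefficient equals zero */
PROC REG DATA=sashelp.heart;
    MODEL Systolic = Weight AgeAtStart;
RUN;
QUIT;
```

PROC TTEST prints both the pooled and the Satterthwaite results along with a test for equality of variances, so the report should say which version is being interpreted and why.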
SAS Macro (15%)
You must create at least one SAS macro.
Examples:
- A macro for:
- Summary statistics
- Running multiple regressions
- Automating hypothesis tests
Your macro must:
- Take at least 2 input arguments
- Be reusable
- Be clearly documented
Example structure:
%MACRO analysis(data, var);
    /* Summary statistics for one variable of the given dataset */
    PROC MEANS DATA=&data;
        VAR &var;
    RUN;
%MEND analysis;
Interpretation & Conclusion (15%)
You must:
- Explain results in plain language
- Connect findings to your research question
- Discuss:
- Limitations
- Possible improvements
Deliverables
1. Written Report (PDF)
Length: 8–10 pages (excluding references and appendices). You may put additional figures and tables in the appendix.
Include:
- Introduction
- Data description
- Methods
- Results
- Conclusion
2. SAS Code File (.sas)
Must include:
- Clean and well-commented code
- Macro implementation
3. (Optional Bonus +5%)
- Visualization dashboard
- Additional modeling (e.g., interaction, model comparison)
- Advanced macro usage
Grading Breakdown
| Component | Points |
|---|---|
| Research Question | 10 |
| Data Preparation | 15 |
| EDA | 15 |
| Statistical Analysis | 30 |
| SAS Macro | 15 |
| Interpretation | 15 |
| Total | 100 (20% course weight) |
Academic integrity:
- You must write your own SAS code
- Collaboration is allowed for discussion, but code may not be shared between groups
- Plagiarism will result in zero credit
Suggested workflow:
- Pick a dataset early
- Perform EDA first
- Decide on appropriate statistical methods
- Write SAS code
- Wrap repeated tasks into macros
- Write the report last
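As an illustration of wrapping a repeated task into a macro, the sketch below turns a two-sample t-test into a documented macro with keyword parameters and calls it for two different variables. The macro name group_ttest and its parameter names are hypothetical, and sashelp.heart is again only a stand-in dataset.

```sas
/*
  group_ttest (hypothetical name): two-sample t-test of one numeric
  variable across the two levels of a grouping variable.
    data=  input dataset
    class= grouping variable with exactly two levels
    var=   numeric analysis variable
*/
%MACRO group_ttest(data=, class=, var=);
    TITLE "Two-sample t-test of &var by &class";
    PROC TTEST DATA=&data;
        CLASS &class;
        VAR &var;
    RUN;
    TITLE;  /* clear the title so it does not carry over to later output */
%MEND group_ttest;

/* Reuse the same macro for different variables */
%group_ttest(data=sashelp.heart, class=Sex, var=Cholesterol);
%group_ttest(data=sashelp.heart, class=Sex, var=Systolic);
```

Keyword parameters are used here so that each call documents itself; positional parameters work equally well.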
This project integrates:
- Statistical thinking
- SAS programming
- Reproducible analysis
- Communication skills
This is designed to simulate a real-world data analysis task.