Data Analysis Using SAS
Weight: 20% of final grade
Due Date: May 1, 2026
Last modified: Apr 17, 2026
🎯 Project Overview
The goal of the final project is to apply the statistical methods and SAS programming skills learned throughout the course to a complete data analysis pipeline.
Students will:
- Work with a real-world open dataset
- Perform data cleaning, visualization, and modelling
- Use SAS procedures and macros
- Communicate results clearly through a written report
Group Information
| Group | Member | Topic |
|---|---|---|
| 1 | Zong Yang | Financial Metric Transmission Mechanism within Coca-Cola’s Income Statement |
| 2 | Mangsa Limbu Pheudin, Shaswat Bhushan Jangam, Allswell Akomanyi-Addo | Analysis of blood lead levels in the treatment of lead-exposed children |
| 3 | Wenpu Ma | Cleveland Heart Disease Data |
| 4 | Vu, Ha | Drug Safety Classification and Risk Analysis of FDA Drug Adverse Reports |
| 5 | Umma Hafsah Himu | Clinical and Demographic Predictors of Heart Disease |
| 6 | Kalen Jinnah, Ivanna Poliashenko | Baseball bat speed Analysis |
| 7 | Zhe Zhong | Bank Marketing Dataset |
| 8 | Andrew Krause | Heart Disease Dataset |
| 9 | Taiwo Ayeni | Cardiovascular Risk Factors in the Framingham Heart Study |
| 10 | Maya Creary | Is there a statistically significant difference in the “Inattentive” vs. “Hyperactive” symptom scores between male and female subjects? |
| 11 | Seoyoung Kim | CDC Disability data |
| 12 | Sifan You, Boshu Zhang | Relationship between car’s weight and its fuel efficiency |
| 13 | Mostafa Farhadian | Flood Insurance Claims Dataset |
| 14 | Michael Dixon | African American high school students’ Math scores |
| 15 | Kamana Joshi, Sushil Khadka | Spotify Audio Features |
| 16 | Seth Kwarteng | UCI Parkinsons dataset |
| 17 | Sierra Doherty | Braves ABS challenges |
| 18 | Daniel Geiger | African Mosquito Dataset |
📊 Data Requirement
You must use a public/open dataset.
Recommended sources:
- UCI Machine Learning Repository
- Kaggle (public datasets only)
- Data.gov
- CDC / WHO datasets
- Built-in SAS datasets (e.g., sashelp.cars, sashelp.heart)
Requirements:
- At least 100 observations
- At least 3–5 variables
- Must include:
  - At least one categorical variable
  - At least one numerical variable
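A quick way to check a candidate dataset against these requirements is sketched below. It uses the built-in sashelp.heart table purely as a stand-in; substitute your own library and dataset name. Note that the character/numeric split is only a rough proxy, since a numeric column can also encode categories.

```sas
/* Observation count, variable names, and variable types in creation order */
proc contents data=sashelp.heart varnum;
run;

/* Count variables by storage type ('char' or 'num').
   Library and member names are stored in upper case in DICTIONARY.COLUMNS. */
proc sql;
    select type, count(*) as n_vars
    from dictionary.columns
    where libname = 'SASHELP' and memname = 'HEART'
    group by type;
quit;
```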
Group:
You may form a group of 2–3 students. Working alone is also acceptable. PhD students MUST work individually.
Project Components
Your project must include the following components:
Research Question (10%)
Clearly state:
- What question are you trying to answer?
- Why is it interesting or important?
Examples:
- Does treatment A outperform treatment B?
- What factors affect income?
- Is there an association between two categorical variables?
Data Cleaning & Preparation (15%)
- Import dataset into SAS
- Handle:
  - Missing values
  - Data types
  - Outliers (if necessary)
Suggested tools:
- DATA step
- PROC IMPORT
- PROC FORMAT
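The suggested tools could fit together roughly as follows. This is a minimal sketch, not a required approach. So that it runs without an external file, it first exports sashelp.heart to a temporary CSV and then imports it back. In a real project, DATAFILE= would point to your own file. The dataset names (heart_raw, heart_clean), the agegrp format, the high_systolic flag, and the cutoff of 200 are all illustrative assumptions.

```sas
/* Write a built-in table to a temporary CSV so the import step is self-contained */
filename rawcsv temp;

proc export data=sashelp.heart outfile=rawcsv dbms=csv replace;
run;

/* Import the CSV; GUESSINGROWS=MAX scans every row before choosing column types */
proc import datafile=rawcsv out=work.heart_raw dbms=csv replace;
    getnames=yes;
    guessingrows=max;
run;

/* A user-defined format that groups a numeric age into categories */
proc format;
    value agegrp
        low -< 40 = 'Under 40'
        40 -< 55  = '40 to 54'
        55 - high = '55 and over';
run;

data work.heart_clean;
    set work.heart_raw;
    /* One simple missing-value policy: drop rows lacking key measurements */
    if missing(Cholesterol) or missing(Weight) then delete;
    /* Flag extreme readings instead of deleting them (cutoff is an assumption) */
    high_systolic = (Systolic > 200);
    format AgeAtStart agegrp.;
run;
```

Whichever policy you choose for missing values and outliers, state it and justify it in the report.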
Exploratory Data Analysis (EDA) (15%)
Use SAS to explore the data:
- Summary statistics: PROC MEANS, PROC FREQ
- Visualizations: PROC SGPLOT
Examples:
- Histograms
- Boxplots
- Scatterplots
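As an illustration, the procedures and plot types listed above might be combined as in the sketch below. The sashelp.heart table and the variables chosen are placeholders for your own data.

```sas
/* Numeric summaries, including the count of missing values per variable */
proc means data=sashelp.heart n nmiss mean std min max maxdec=2;
    var Cholesterol Systolic Weight;
run;

/* Frequency tables for categorical variables; MISSING treats missing as a level */
proc freq data=sashelp.heart;
    tables Sex Smoking_Status / missing;
run;

/* Histogram with a fitted normal density overlay */
proc sgplot data=sashelp.heart;
    histogram Cholesterol;
    density Cholesterol;
run;

/* Side-by-side boxplots of a numeric variable across groups */
proc sgplot data=sashelp.heart;
    vbox Systolic / category=Sex;
run;

/* Scatterplot of two numeric variables */
proc sgplot data=sashelp.heart;
    scatter x=Weight y=Systolic;
run;
```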
Statistical Analysis (30%)
You must include at least TWO of the following methods:
- Chi-square test (PROC FREQ)
- Two-sample t-test (PROC TTEST)
- ANOVA (PROC ANOVA or PROC GLM)
- Regression (PROC REG)
- Correlation (PROC CORR)
You should:
- State hypotheses
- Report test statistics and p-values
- Interpret results in context
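For example, two of the listed methods could be run as sketched below, with the hypotheses written as comments beside each test. The dataset and variables are placeholders. The choice of tests should follow from your own research question.

```sas
/* Chi-square test of independence
   H0: Sex and cholesterol status are independent
   H1: Sex and cholesterol status are associated */
proc freq data=sashelp.heart;
    tables Sex*Chol_Status / chisq;
run;

/* Two-sample t-test
   H0: mean cholesterol is the same for both sexes
   H1: mean cholesterol differs between the sexes */
proc ttest data=sashelp.heart;
    class Sex;
    var Cholesterol;
run;
```

PROC TTEST reports both the pooled and the Satterthwaite results along with a test for equality of variances, so the report should say which one is being interpreted and why.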
SAS Macro (15%)
You must create at least one SAS macro.
Examples:
- A macro for:
  - Summary statistics
  - Running multiple regressions
  - Automating hypothesis tests
Your macro must:
- Take at least 2 input arguments
- Be reusable
- Be clearly documented
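One possible way to satisfy these three requirements is sketched here. The macro name group_ttest and its parameter names are hypothetical, chosen only for illustration. Keyword parameters with a default value are one design choice among several.

```sas
/*------------------------------------------------------------------
  %group_ttest: two-sample t-test of a numeric variable across a
                two-level grouping variable.
  Parameters (illustrative names):
    data  = input data set
    class = grouping variable with exactly two levels
    var   = numeric analysis variable
    alpha = significance level for confidence limits (default 0.05)
------------------------------------------------------------------*/
%macro group_ttest(data=, class=, var=, alpha=0.05);
    title "Two-sample t-test of &var by &class (&data)";
    proc ttest data=&data alpha=&alpha;
        class &class;
        var &var;
    run;
    title;  /* clear the title so it does not carry over to later output */
%mend group_ttest;

/* Reuse: the same macro applied to different variables and settings */
%group_ttest(data=sashelp.heart, class=Sex, var=Cholesterol);
%group_ttest(data=sashelp.heart, class=Sex, var=Systolic, alpha=0.01);
```

The header comment documents each argument, and the two calls show the macro being reused without editing its body.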
Example structure:
%MACRO analysis(data, var);
  /* Summary statistics for variable &var in data set &data */
  PROC MEANS DATA=&data;
    VAR &var;
  RUN;
%MEND analysis;
Interpretation & Conclusion (15%)
You must:
- Explain results in plain language
- Connect findings to your research question
- Discuss:
  - Limitations
  - Possible improvements
📄 Deliverables
1. Written Report (PDF)
Length: 8–10 pages (excluding references and appendices). Additional figures and tables may go in an appendix.
Include:
- Introduction
- Data description
- Methods
- Results
- Conclusion
2. SAS Code File (.sas)
Must include:
- Clean and well-commented code
- Macro implementation
3. (Optional Bonus +5%)
- Visualization dashboard
- Additional modeling (e.g., interaction, model comparison)
- Advanced macro usage
📊 Grading Breakdown
| Component | Points |
|---|---|
| Research Question | 10 |
| Data Preparation | 15 |
| EDA | 15 |
| Statistical Analysis | 30 |
| SAS Macro | 15 |
| Interpretation | 15 |
| Total | 100 (20% course weight) |
Academic Integrity
- You must write your own SAS code
- Groups may discuss ideas with each other, but sharing code between groups is not allowed
- Plagiarism will result in zero credit
Suggested Workflow
- Pick a dataset early
- Perform EDA first
- Decide on appropriate statistical methods
- Write the SAS code
- Wrap repeated tasks into macros
- Write the report last
This project integrates:
- Statistical thinking
- SAS programming
- Reproducible analysis
- Communication skills
This is designed to simulate a real-world data analysis task.