Data Analysis Using SAS

Weight: 20% of final grade
Last modified: April 6, 2026
Due Date: Apr 27, 2026

🎯 Project Overview

The goal of the final project is to apply the statistical methods and SAS programming skills learned throughout the course to build a complete data analysis pipeline.

Students will:

  • Work with a real-world open dataset
  • Perform data cleaning, visualization, and modeling
  • Use SAS procedures and macros
  • Communicate results clearly through a written report

Group Information

Group | Member | Topic
1 | Zong Yang | Financial Metric Transmission Mechanism within Coca-Cola’s Income Statement
2 | Mangsa Limbu Pheudin, Shaswat Bhushan Jangam, Allswell Akomanyi-Addo |
3 | Wenpu Ma |
4 | Vu, Ha |
5 | Umma Hafsah Himu |
6 | Kalen Jinnah, Ivanna Poliashenko, Sierra Doherty |
7 | Zhe Zhong | Bank Marketing Dataset
8 | Andrew Krause | Heart Disease Dataset
9 | Taiwo Ayeni | Cardiovascular Risk Factors in the Framingham Heart Study
10 | Maya Creary | Is there a statistically significant difference in the “Inattentive” vs. “Hyperactive” symptom scores between male and female subjects?
11 | Seoyoung Kim | CDC Disability data
12 | Sifan You, Boshu Zhang | Relationship between car’s weight and its fuel efficiency

📊 Data Requirement

You must use a public/open dataset.

Requirements:

  • At least 100 observations
  • At least 3–5 variables
  • Must include:
    • At least one categorical variable
    • At least one numerical variable

Group:

You may form a group of 2–3 students. If you prefer to work alone, that is also acceptable. PhD students MUST work individually.

Project Components

Your project must include the following components:

Research Question (10%)

Clearly state:

  • What question are you trying to answer?
  • Why is it interesting or important?

Examples:

  • Does treatment A outperform treatment B?
  • What factors affect income?
  • Is there an association between two categorical variables?

Data Cleaning & Preparation (15%)

  • Import dataset into SAS
  • Handle:
    • Missing values
    • Data types
    • Outliers (if necessary)

Suggested tools:

  • DATA step
  • PROC IMPORT
  • PROC FORMAT
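As a rough illustration of how the suggested tools fit together, here is a minimal sketch. Everything specific in it is an assumption made for the example: SASHELP.HEART stands in for a downloaded dataset, the file name heart_demo.csv, the dataset names heart_raw and heart_clean, the format name agegrp, and the cholesterol cutoff of 400 are all hypothetical. The CSV is written to the WORK directory first only so that the PROC IMPORT step has a file to read.

```sas
/* Create a sample CSV so the example is self-contained. In a real project,
   point DATAFILE= at the file you downloaded instead. */
%let csvfile = %sysfunc(pathname(work))/heart_demo.csv;

proc export data=sashelp.heart outfile="&csvfile" dbms=csv replace;
run;

/* Import the CSV; GUESSINGROWS=MAX scans all rows before choosing types */
proc import datafile="&csvfile" out=work.heart_raw dbms=csv replace;
    guessingrows=max;
run;

/* Count non-missing and missing values for every numeric variable */
proc means data=work.heart_raw n nmiss;
run;

/* User-defined format that groups age into labeled bands (illustrative cut points) */
proc format;
    value agegrp
        low -< 40 = 'Under 40'
        40  -< 50 = '40 to 49'
        50 - high = '50 and over';
run;

data work.heart_clean;
    set work.heart_raw;
    /* Drop rows that are missing the key analysis variables */
    if missing(Cholesterol) or missing(Systolic) then delete;
    /* Flag, rather than delete, potential outliers; 400 is an arbitrary cutoff */
    chol_outlier = (Cholesterol > 400);
    format AgeAtStart agegrp.;
run;
```

Whether to delete, impute, or flag missing values and outliers depends on the research question, so the choices above should be treated as one option and justified in the report.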

Exploratory Data Analysis (EDA) (15%)

Use SAS to explore the data:

  • Summary statistics:
    • PROC MEANS
    • PROC FREQ
  • Visualizations:
    • PROC SGPLOT

Examples:

  • Histograms
  • Boxplots
  • Scatterplots
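The procedures and plot types listed above could be combined along the following lines. This is a sketch only: SASHELP.HEART and the variables chosen are assumptions for illustration, not a recommended analysis.

```sas
/* Summary statistics for numeric variables, split by a categorical variable */
proc means data=sashelp.heart n nmiss mean std min median max maxdec=1;
    class Sex;
    var Cholesterol Systolic Weight;
run;

/* Frequency table for two categorical variables, counting missing levels */
proc freq data=sashelp.heart;
    tables Sex*Smoking_Status / missing;
run;

/* Histogram with a fitted normal density curve */
proc sgplot data=sashelp.heart;
    histogram Cholesterol;
    density Cholesterol;
run;

/* Boxplots of a numeric variable by category */
proc sgplot data=sashelp.heart;
    vbox Systolic / category=BP_Status;
run;

/* Scatterplot with a fitted regression line */
proc sgplot data=sashelp.heart;
    scatter x=Weight y=Systolic / transparency=0.7;
    reg x=Weight y=Systolic / nomarkers;
run;
```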

Statistical Analysis (30%)

You must include at least TWO of the following methods:

  • Chi-square test (PROC FREQ)
  • Two-sample t-test (PROC TTEST)
  • ANOVA (PROC ANOVA or PROC GLM)
  • Regression (PROC REG)
  • Correlation (PROC CORR)

You should:

  • State hypotheses
  • Report test statistics and p-values
  • Interpret results in context
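For orientation, three of the listed methods might look like this in SAS. The dataset (SASHELP.HEART), the variables, and the hypotheses in the comments are assumptions chosen for illustration; your own hypotheses should follow from your research question.

```sas
/* Two-sample t-test.
   H0: mean cholesterol is the same for females and males.
   H1: the two means differ. */
proc ttest data=sashelp.heart alpha=0.05;
    class Sex;
    var Cholesterol;
run;

/* Chi-square test of independence.
   H0: sex and smoking status are independent.
   H1: they are associated. */
proc freq data=sashelp.heart;
    tables Sex*Smoking_Status / chisq expected;
run;

/* Multiple linear regression.
   H0 for each predictor: its slope is zero, given the other predictor. */
proc reg data=sashelp.heart;
    model Systolic = Weight AgeAtStart;
run;
quit;
```

The output of each procedure contains the test statistic and p-value to report. PROC TTEST prints both pooled and Satterthwaite results, so the report should say which one is used and why.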

SAS Macro (15%)

You must create at least one SAS macro.

Examples:

  • A macro for:
    • Summary statistics
    • Running multiple regressions
    • Automating hypothesis tests

Your macro must:

  • Take at least 2 input arguments
  • Be reusable
  • Be clearly documented

Example structure:

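/* Print summary statistics for one or more variables.            */
/* data = input dataset; var = space-separated list of variables. */
%MACRO analysis(data, var);
    PROC MEANS DATA=&data;
        VAR &var;
    RUN;
%MEND analysis;

/* Example call */
%analysis(sashelp.class, height weight);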
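A macro that meets all three requirements (at least two arguments, reusable, documented) could be sketched as follows. The macro name ttest_by_group, its parameter names, and the SASHELP.HEART call are hypothetical choices for illustration, not a required design.

```sas
/*---------------------------------------------------------------------
  ttest_by_group: run a two-sample t-test for each variable in a list.
  Parameters (names are illustrative):
    data  = input dataset
    class = grouping variable with exactly two levels
    vars  = space-separated list of numeric variables
    alpha = significance level (default 0.05)
---------------------------------------------------------------------*/
%macro ttest_by_group(data=, class=, vars=, alpha=0.05);
    %local i v;
    /* Loop over each word in the variable list */
    %do i = 1 %to %sysfunc(countw(&vars, %str( )));
        %let v = %scan(&vars, &i, %str( ));
        title "Two-sample t-test of &v by &class";
        proc ttest data=&data alpha=&alpha;
            class &class;
            var &v;
        run;
    %end;
    /* Clear the title so it does not carry over to later output */
    title;
%mend ttest_by_group;

/* Example call on a stand-in dataset */
%ttest_by_group(data=sashelp.heart, class=Sex, vars=Cholesterol Systolic);
```

Keyword parameters are used here so that calls are self-describing and the default for alpha can be omitted; positional parameters, as in the example structure above, work equally well.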

Interpretation & Conclusion (15%)

You must:

  • Explain results in plain language
  • Connect findings to your research question
  • Discuss:
    • Limitations
    • Possible improvements

📄 Deliverables

1. Written Report (PDF)

Length: 8–10 pages (excluding references and appendices). You may put additional figures and tables in the appendix.

Include:

  • Introduction
  • Data description
  • Methods
  • Results
  • Conclusion

2. SAS Code File (.sas)

Must include:

  • Clean and well-commented code
  • Macro implementation

3. Optional Bonus (+5%)

  • Visualization dashboard
  • Additional modeling (e.g., interaction, model comparison)
  • Advanced macro usage

📊 Grading Breakdown

Component | Points
Research Question | 10
Data Preparation | 15
EDA | 15
Statistical Analysis | 30
SAS Macro | 15
Interpretation | 15
Total | 100 (20% course weight)

Important Notes

  • You must write your own SAS code
  • Discussion between groups is allowed, but sharing code between groups is not
  • Plagiarism will result in zero credit

💡 Suggested Workflow

  1. Pick a dataset early
  2. Perform EDA first
  3. Decide on appropriate statistical methods
  4. Write SAS code
  5. Wrap repeated tasks into macros
  6. Write the report last

Key Learning Outcome

This project integrates:

  • Statistical thinking
  • SAS programming
  • Reproducible analysis
  • Communication skills

This is designed to simulate a real-world data analysis task.