Stat 4914 Project 1
Project 1: Explaining an Outcome with Regression
The main focus of this project will be regression analysis with hypothesis testing, with an emphasis on verifying assumptions, creating effective visualizations, and reporting in an accessible, professional format.
Choose a dependent outcome variable over a season - such as wins, yards, or points scored - that you believe can be explained by independent variables - such as payroll, height, division, or position. Use regression to quantify and explain the relationship between the independent variables and dependent variable of interest. Your regression can be multiple linear regression or a generalized linear model such as logistic or Poisson regression.
Limit your report to 15 pages, including reproducible code. Hide unnecessary code output (library loading, ggplot code, etc.) but include relevant exploratory and modeling code (collinearity screening, model building, assumption checking, etc.). Caption figures and tables, cite all sources, and proofread your submission.
a) Data Selection
Choose a data set that meets the following characteristics:
• At least 50 observations
• At least one categorical predictor with three or more levels
• At least two continuous predictors
• A continuous, binary, or count dependent variable
• Sports-related; this has a wide interpretation and can include professional sports leagues (NFL, UEFA, etc.), games and pastimes (chess, Go, etc.), data sets related to physical activity (motion sensors, kinesiology, etc.), or sociological topics (stadium design, viewership, marketing, fan perceptions, etc.).
• Free-use from a reliable, rigorous source
• Many options can be found at https://sportsandsociety.osu.edu/sports-data-sets, https://vincentarelbundock.github.io/Rdatasets/articles/data.html, https://archive.ics.uci.edu/datasets, and https://www.kaggle.com/search.
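Before committing to a data set, it can help to check the size and predictor requirements programmatically. The following is a minimal sketch in Python using only the standard library; the function name `check_requirements`, the list-of-dicts layout, and the example players are all hypothetical, and the same checks translate directly to R or any other tool.

```python
def check_requirements(rows, categorical, continuous, min_rows=50, min_levels=3):
    """Check a data set against the size and predictor criteria listed above.

    rows        : list of dicts, one per observation (assumed layout)
    categorical : names of candidate categorical predictors
    continuous  : names of candidate continuous predictors
    """
    # Count the distinct levels of each categorical predictor.
    levels = {name: len({row[name] for row in rows}) for name in categorical}
    return {
        "enough_observations": len(rows) >= min_rows,
        "categorical_with_3_levels": any(n >= min_levels for n in levels.values()),
        "two_continuous_predictors": len(continuous) >= 2,
    }


# Made-up example: 60 players with a position, a height, and a salary.
positions = ["guard", "forward", "center"]
players = [
    {"position": positions[i % 3], "height_cm": 180 + i % 30, "salary": 1.0 + 0.1 * i}
    for i in range(60)
]
report = check_requirements(players, ["position"], ["height_cm", "salary"])
print(report)
```

The sketch does not check the dependent variable type, the sports-related topic, or the source's reliability; those still require judgment.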
Provide an overview of your data set, including a description of context and the relevant sports-related information necessary to understand your analysis. Assume your audience is unfamiliar with the main topic of your report.
b) Exploratory Analysis
Conduct an exploratory analysis of your data, including preliminary variable screening and exploratory plots. Include at least three exploratory plots, at least one of which represents three or more variables. Plots should be professional and presentation-quality.
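One way to approach preliminary variable screening is to look for pairs of continuous predictors that are strongly correlated with each other, since such pairs can cause collinearity problems later. The sketch below uses only the Python standard library. The names `pearson` and `flag_collinear`, the dict-of-lists layout, the 0.8 cutoff, and the example values are assumptions made for illustration, not requirements of the project.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear(columns, threshold=0.8):
    """Return predictor pairs whose |r| meets or exceeds the threshold.

    columns   : dict mapping predictor name -> list of values (assumed layout)
    threshold : assumed cutoff; 0.8 is a rule of thumb, not a fixed standard
    """
    flagged = []
    for a, b in combinations(sorted(columns), 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


# Made-up data: height in cm and in inches carry the same information.
data = {
    "height_cm": [180, 185, 190, 195, 200, 205],
    "height_in": [70.9, 72.8, 74.8, 76.8, 78.7, 80.7],
    "age": [31, 22, 27, 35, 24, 29],
}
flagged = flag_collinear(data)
print(flagged)
```

Pairwise correlation only catches collinearity between two predictors at a time; a measure such as the variance inflation factor is a common follow-up once a model has been fit.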
c) Model Building
Build at least two distinct regression models to explain the relationships between your dependent variable and independent variables.
d) Model Evaluation
Assess how well your models meet their assumptions. Compare your models and choose the one that best explains your data.
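The assignment does not prescribe a comparison criterion. As one possible approach for linear models fit to the same response, the sketch below computes adjusted R-squared and a Gaussian AIC (up to an additive constant shared by all models) from observed and fitted values. The function name `fit_summary` and the numbers are made up for illustration. For logistic or Poisson models, a likelihood-based AIC reported by your modeling software would be the analogous quantity.

```python
from math import log


def fit_summary(y, fitted, n_predictors):
    """Adjusted R^2 and Gaussian AIC (up to a constant) for one linear model.

    n_predictors counts slope terms only; the intercept is added separately.
    """
    n = len(y)
    mean_y = sum(y) / n
    rss = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))  # residual SS
    tss = sum((obs - mean_y) ** 2 for obs in y)                 # total SS
    r2 = 1 - rss / tss
    # Penalize R^2 for the number of slope terms in the model.
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    # n*log(RSS/n) + 2k, with k = slopes + intercept; terms that are the
    # same for every model on this data are dropped.
    aic = n * log(rss / n) + 2 * (n_predictors + 1)
    return {"adj_r2": adj_r2, "aic": aic}


# Made-up observed values and fitted values from two hypothetical models.
y = [10, 12, 15, 19, 24, 30]
model_a = fit_summary(y, [9, 13, 17, 21, 25, 29], n_predictors=1)
model_b = fit_summary(y, [10.5, 12, 14.5, 19.5, 24, 29.5], n_predictors=3)
print(model_a)
print(model_b)
```

Higher adjusted R-squared and lower AIC both favor a model, but neither replaces the assumption checks: a model with better summary numbers can still violate its assumptions.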
e) Analysis and Findings
Provide a professional write-up of your findings, describing both the statistical and practical findings of your analysis.