10 points
1. Assignment Description
Select a small data set from the available public data sets (a list of public data sets is available at http://www.teymourian.de/public-data-sets-for-data-analytic-projects/).
Describe a research scenario and specify a research question that can be answered with the data analytic methods we learned in class, for example one- and two-sample means, t-tests, correlation tests, simple and multiple linear regression, ANOVA and ANCOVA, one- and two-sample tests for proportions, and logistic regression.
Clean up your data and reduce it to no more than 500 observations if your data set is large.
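One possible way to reduce a large data set to at most 500 observations is to draw a random sample of rows. The sketch below is only an illustration: it uses R's built-in quakes data (1000 rows) as a stand-in for your own file, and the seed value and the name max_rows are arbitrary choices, not requirements of the assignment.

```r
# Stand-in for a large data set: the built-in 'quakes' data (1000 rows).
# In your own project this would be the data frame read from your file.
data <- datasets::quakes

# Drop rows that contain missing values before sampling.
data <- na.omit(data)

# Fix the random seed so the same subset is drawn on every run.
set.seed(42)

# Keep at most 500 rows (max_rows is a name chosen for this example).
max_rows <- 500
if (nrow(data) > max_rows) {
  keep <- sample(nrow(data), max_rows)  # row indices, drawn without replacement
  data <- data[keep, ]
}

nrow(data)
```

Setting the seed makes the subset reproducible, so the uploaded CSV file and the R output stay consistent with each other.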
2. Research Scenario Description (no more than 200 words)
Describe your research scenario in no more than 200 words. This is a general description of the use case. As in our class examples, first describe the overall scenario and then state a specific research question based on it.
3. Describe the data set (no more than 200 words)
Briefly describe the data set. Describe each column of the data set that you use in your analysis. Clean up your data before use, for example by removing outliers, and remove unused columns. If possible, provide a link to the original data source.
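The cleanup steps described above (dropping unused columns and removing outliers) could be sketched as follows. This is a minimal example under stated assumptions: the built-in quakes data stands in for your own data set, the columns mag and stations are picked purely for illustration, and the 1.5 * IQR rule is one common outlier convention rather than a method prescribed by this assignment.

```r
# Stand-in data: the built-in 'quakes' data set
# (columns: lat, long, depth, mag, stations).
data <- datasets::quakes

# Keep only the columns used in the analysis (chosen here for illustration).
data <- data[, c("mag", "stations")]

# Compute outlier bounds for 'stations' with the 1.5 * IQR rule.
q1 <- quantile(data$stations, 0.25, names = FALSE)
q3 <- quantile(data$stations, 0.75, names = FALSE)
iqr <- q3 - q1
lower <- q1 - 1.5 * iqr
upper <- q3 + 1.5 * iqr

# Keep only the rows whose 'stations' value lies within the bounds.
cleaned <- data[data$stations >= lower & data$stations <= upper, ]

# Compare the number of rows before and after cleaning.
c(before = nrow(data), after = nrow(cleaned))
```

Whatever rule you use, state it in your data set description so that a reader can see why observations were removed.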
4. Research Question (no more than 100 words)
Briefly describe the main research question in one or two sentences. This is similar to the last sentence of our class examples.
5. Your solution R code
Copy your R code here. Start by reading the data from a data file, and keep the following data read line.
This is similar to one of our R code examples.
data <- read.csv("datafile.csv", header=TRUE)
6. Execute your R code, then copy and paste the results into this box.
Run your code and copy its output here.
7. State Your Conclusion (no more than 100 words)
State the conclusion so that a non-statistician can understand it.
Solution Submission
1. Fill in this Word file and upload it.
2. Upload your data set. This is the data set after cleaning (a small CSV file).
3. Upload your R file with the name “mini-project-solution.R”.
Grading will be based on:
1. Originality of the selected data set and data analysis approach
2. Data set preparation and cleanup
3. General correctness of the data analysis
4. Quality of your R code and output results
5. Correct final conclusion