
ISTA 116 Final Project

October 22, 2019

1 Overview

In your final project, you will apply the methods and techniques that you have learned in ISTA 116 to the

analysis of real-world data. The main goal of this project is to formulate a statistically answerable question

and address it through the analysis and/or collection of data, using summary statistics, data visualization,

and statistical inference.

1.1 Public Data Option

One option for completing this project is to analyze publicly available data. Data is available from several

public repositories, some focused on statistics education and some with a broader scope. By using public

data, you will have a lot of flexibility in your choice of topic, but you should begin your data search early to

ensure that you are able to find data that is useful for answering your question.

1.2 Data Collection Option

If you prefer, you may collect data yourself using a survey or another method of data collection. If you

choose to survey people, you should make an effort to avoid convenience sampling or other potentially biased

methods of sampling.

The data collection option may require less time researching data, but will require you to begin earlier

to ensure you have enough time to collect the data you need.
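
One way to avoid convenience sampling is to draw a simple random sample from a list of everyone you could survey. The short R sketch below illustrates the idea. The sampling frame of 200 numbered people, the sample size of 30, and the variable names are all hypothetical and are not project requirements.

```r
# Hypothetical sampling frame: ID numbers for 200 people who could be surveyed
sampling_frame <- 1:200

# Fix the random seed so the same sample can be reproduced later
set.seed(116)

# Draw a simple random sample of 30 IDs without replacement
survey_ids <- sample(sampling_frame, size = 30)

# Sort the selected IDs so they are easier to look up in the frame
sort(survey_ids)
```

Because every ID has the same chance of selection, this avoids the bias that comes from surveying only the people who are easiest to reach.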

1.3 Suggested Data Sources

If you choose to use publicly available data, a couple of sources are suggested below. This list may be

updated over time!

• The Data and Story Library (DASL) at dasl.datadescription.com. This is a collection of data

sets intended for statistics students to practice on.

• data.gov offers a large amount of government data. Subjects include economic data, energy and climate

data, etc. There is a huge amount of data here, but it can be somewhat difficult to search through it

all and find a good data set.

These are not the only acceptable sources of data; many other data sets are available. If you find

another source of data that is more directly applicable to your topic, you are free to use it. However, you

should make sure that the source is reputable, and that you have enough information about how the data

was collected to assess whether good data collection procedures were used.

2 Requirements

Your project submission will consist of two parts: a report and an R script. The report is a full description

of your project, including the question that you set out to answer, the source of data or data collection

methods you used, a visual and quantitative summary of the data you are investigating, and analysis using

inference methods such as confidence intervals, hypothesis tests, and regression models. Your report should

be 3-7 pages long including tables and graphs, and should have the following sections:

• An introduction in which you state the underlying research question and how you hope to answer it.

• A data/methods section in which you describe the source of your data, whether you are using a

publicly available data set or collecting your own. This should also include appropriate visualizations

of your data, in the form of tables and plots.

• An analysis section in which you describe the results of your statistical analysis, including: any associations

relevant to your question that you have observed; summary statistics describing the data that

you have collected/found, with confidence intervals where appropriate; the results of any hypothesis

tests you conducted; any linear or logistic regression models.

• A discussion section in which you state the conclusions that you have reached from your analysis.

Ideally, this will include a clear answer to the question that you posed in the introduction.

The R script accompanying your report should include the code that you used in your analysis. This

includes calculating summary statistics, creating plots, and performing statistical inference (confidence intervals,

hypothesis tests, and regression). Your script should be documented with comments briefly describing

what each piece of your code does.
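
As a rough illustration of how such a script might be organized, the sketch below walks through summary statistics, a plot, and inference, with a comment on each step. It uses R's built-in mtcars data set as a stand-in for your own data, and the research question (whether fuel efficiency differs by transmission type) is hypothetical. The functions shown are standard base R, but they may not match the ones covered in class, so treat this as one possible layout and not a template.

```r
# Hypothetical question: does fuel efficiency (mpg) differ between
# cars with automatic and manual transmissions?
# Assumption: the built-in mtcars data stands in for your own data set.
cars <- mtcars

# Recode the 0/1 transmission indicator as a labeled factor
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("automatic", "manual"))

# Summary statistics: mean and standard deviation of mpg in each group
mpg_means <- tapply(cars$mpg, cars$transmission, mean)
mpg_sds <- tapply(cars$mpg, cars$transmission, sd)
mpg_means
mpg_sds

# Visualization: side-by-side boxplots of mpg by transmission type
boxplot(mpg ~ transmission, data = cars,
        xlab = "Transmission", ylab = "Miles per gallon",
        main = "Fuel efficiency by transmission type")

# Inference: two-sample t-test, which also reports a 95% confidence
# interval for the difference in mean mpg between the groups
mpg_test <- t.test(mpg ~ transmission, data = cars)
mpg_test

# Regression: linear model of mpg as a function of car weight
mpg_model <- lm(mpg ~ wt, data = cars)
summary(mpg_model)
```

Running the script from top to bottom should regenerate every number and plot that appears in the report.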

3 Assessment

Your project will be evaluated based on the following qualities:

• Statistical question: Your project should attempt to answer a research question. This question

should be clearly stated, focused enough to be answered by the data available or the planned study,

and interesting.

• Data sourcing: Your project may use data that you collect by a survey or other type of study, or data

that is publicly available. Several suggested public data sources are listed in Section 1.3.

If you choose to collect data with your own study, your report should address your sampling methods

and your study design, and note possible sources of bias in the data.

If you choose to use publicly available data, you should ensure that your data come from a reputable

source, and consider any limitations in the sampling or data collection process.

• Display and Visualization: Your report should include appropriately chosen, well-labeled, and

accurate visualizations of your data, including tables, plots, and/or graphs.

• Analysis: Your report should include appropriately chosen summary statistics to describe the data

in your data set, and use inference methods such as confidence intervals, hypothesis tests, and linear

or logistic regression models for estimation of population parameters. Conditions should be checked

for all inference procedures, and your report should discuss the extent to which the results may be

generalized beyond the sample.

• Use of R: All of the calculations, summary statistics, plots, and inference described in your report

should be reproducible by the R script that you submit alongside it. This script should use functions

and techniques covered in class and in the lab assignments.

• Discussion, conclusion, and reflection: Your report should include a clear answer to the original

statistical question, consistent with the available data and the results of your visualization and analysis.

If a satisfactory answer to the question cannot be reached with the available data, you should discuss

what additional data might be needed.

If you collected your own data, include a discussion of what went well and what did not in your data

collection process. If you used publicly available data, you should discuss any weaknesses or limitations

of the data set that was available: is it missing important variables? Was the data collected in a

less-than-optimal manner?

Finally, your conclusion should propose some ideas for further study in the same area – this could be

a follow-up question informed by the results of your analysis, or a new study that could address some

of the limitations in the existing data.
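
The Analysis item above asks that conditions be checked for all inference procedures. The R sketch below shows one common way to do this graphically. It again assumes the built-in mtcars data as a stand-in for your own data, and the particular checks shown (a histogram and normal quantile plot for a t-based procedure, a residual plot for linear regression) are illustrative choices, not a complete list of the conditions discussed in class.

```r
# Assumption: the built-in mtcars data stands in for your own data set
cars <- mtcars

# Condition for a t-based interval or test: the data should be roughly
# normal (or the sample should be large). Inspect the shape visually.
hist(cars$mpg, xlab = "Miles per gallon", main = "Distribution of mpg")
qqnorm(cars$mpg)   # points near a straight line suggest approximate normality
qqline(cars$mpg)   # reference line for the normal quantile plot

# Conditions for linear regression: a linear trend and residuals with
# roughly constant spread around zero
mpg_model <- lm(mpg ~ wt, data = cars)
plot(fitted(mpg_model), resid(mpg_model),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted values")
abline(h = 0, lty = 2)   # dashed reference line at zero
```

A curved pattern or a funnel shape in the residual plot would be worth mentioning in the report as a limitation of the regression results.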

4 Dates and Deadlines

To help you make progress on the project, there are two intermediate deadlines. These intermediate

deadlines are each worth 10% of the total project points; if you miss an intermediate deadline, the point

value of your final submission will scale up to replace it. In other words, the final submission counts for

80% of the project points if you meet both intermediate deadlines, 90% if you miss one, and 100% if you

miss both. You should think of the intermediate deadlines as a way both to “lock in” some of the points

for the final project and to get some initial feedback on your planned project.

4.1 Topic Proposal: October 31st

Your topic proposal should be a brief description (1-2 paragraphs) stating your research question and indicating

where you intend to acquire data. If you are choosing publicly available data, you should identify at least

one specific data set that you will use and its source; if you are planning to collect your own data, you should

have a description of how you will select your sample and make your observations. The proposal should

specify the variables of interest, which (if any) are explanatory and response variables, and any associations

you plan to investigate.

4.2 Summary of Methods: November 21st

In your summary of methods, you should submit a brief description of what methods you plan to use for:

• Data visualization: which variables will you plot and how?

• Data summarization: which summary statistics (mean, median, etc.) will you calculate, for which

variables, and why?

• Inference methods: which types of confidence intervals, hypothesis tests, and regression models will

you use, and on which variables?

Your summary of methods does not have to include any of the results of this analysis, just a plan for what

you will do.

4.3 Rough Draft (Optional): December 5th

We encourage you to submit a rough version of your project report ahead of the final deadline to get feedback.

This is not required and will not factor directly into your grade, but if you submit by the rough draft deadline

then we will be able to point out any major aspects of your report that are missing or need revision.

4.4 Final Submission Deadline: December 17th

The final project, including the report and R code, is due by 11:59 PM on Tuesday, December 17 (the date

assigned to ISTA 116 for final exams this semester).
