ISTA 116 Final Project
October 22, 2019
1 Overview
In your final project, you will apply the methods and techniques that you have learned in ISTA 116 to the
analysis of real-world data. The main goal of this project is to formulate a statistically answerable question
and address it through the analysis and/or collection of data, using summary statistics, data visualization,
and statistical inference.
1.1 Public Data Option
One option for completing this project is to analyze publicly available data. Data is available from several
public repositories, some focused toward statistics education and some with a broader scope. By using public
data, you will have a lot of flexibility in your choice of topic, but you should begin your data search early to
ensure that you are able to find data that is useful for answering your question.
1.2 Data Collection Option
If you prefer, you may collect data yourself using a survey or another method of data collection. If you
choose to survey people, you should make an effort to avoid convenience sampling or other potentially biased
methods of sampling.
The data collection option may require less time researching data, but will require you to begin earlier
to ensure you have enough time to collect the data you need.
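If you have a list of the people you could survey (a sampling frame), one way to avoid convenience sampling is to draw a simple random sample from that list. The sketch below assumes a made-up roster of 200 ID numbers and a sample size of 30; both are placeholders for illustration, not requirements of the project.

```r
# Hypothetical sampling frame: a roster of 200 ID numbers (made-up data).
set.seed(116)                      # fix the seed so the draw can be reproduced
roster <- data.frame(id = 1:200)

# Draw a simple random sample of 30 IDs without replacement.
n <- 30
sampled_ids <- sample(roster$id, size = n, replace = FALSE)

# The people to survey are the rows whose ID was drawn.
survey_sample <- roster[roster$id %in% sampled_ids, , drop = FALSE]
nrow(survey_sample)
```

Every person on the roster has the same chance of being selected, which is what separates this from asking whoever happens to be nearby.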
1.3 Suggested data sources
If you choose to use publicly available data, a couple of sources are suggested below. This list may be
updated over time!
• The Data and Story Library (DASL) at dasl.datadescription.com. This is a collection of data
sets intended for statistics students to learn from.
• data.gov offers a large amount of government data. Subjects include economic data, energy and climate
data, etc. There is a huge amount of data here, but it can be somewhat difficult to search through it
all and find a good data set.
These are not the only acceptable sources of data, and many others are available. If you find
another source of data that is more directly applicable to your topic, you are free to use it. However, you
should make sure that the source is reputable, and that you have enough information about how the data
was collected to assess whether good data collection procedures were used.
2 Requirements
Your project submission will consist of two parts: a report and an R script. The report is a full description
of your project, including the question that you set out to answer, the source of data or data collection
methods you used, a visual and quantitative summary of the data you are investigating, and analysis using
inference methods such as confidence intervals, hypothesis tests, and regression models. Your report should
be 3-7 pages long including tables and graphs, and should have the following sections:
• An introduction in which you state the underlying research question and how you hope to answer it.
• A data/methods section in which you describe the source of your data, whether you are using a
publicly available data set or collecting your own. This should also include appropriate visualizations
of your data, in the form of tables and plots.
• An analysis section in which you describe the results of your statistical analysis, including: any associations
relevant to your question that you have observed; summary statistics describing the data that
you have collected/found, with confidence intervals where appropriate; the results of any hypothesis
tests you conducted; and any linear or logistic regression models you fit.
• A discussion section in which you state the conclusions that you have reached from your analysis.
Ideally, this will include a clear answer to the question that you posed in the introduction.
The R script accompanying your report should include the code that you used in your analysis. This
includes calculating summary statistics, creating plots, and performing statistical inference (confidence intervals,
hypothesis tests, and regression). Your script should be documented with comments briefly describing
what each piece of your code does.
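As a rough illustration of what a commented script might look like, here is a minimal sketch. It uses the built-in mtcars data set as a stand-in for your own data, and the question it asks (is weight associated with fuel economy?) is only an example. The object names are placeholders, and your script should use whichever functions fit your own question.

```r
# Illustrative question: is weight (wt) associated with fuel economy (mpg)?
# mtcars ships with R; it stands in for your own data set here.
data(mtcars)

# Summary statistics for the response variable.
mpg_mean <- mean(mtcars$mpg)
mpg_sd   <- sd(mtcars$mpg)
summary(mtcars$mpg)

# Scatterplot of the two variables of interest, with labeled axes.
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy vs. weight")

# 95% confidence interval for the mean mpg (one-sample t interval).
mpg_ci <- t.test(mtcars$mpg, conf.level = 0.95)$conf.int

# Hypothesis test: do automatic (am = 0) and manual (am = 1) cars
# differ in mean mpg? (Welch two-sample t-test.)
am_test <- t.test(mpg ~ am, data = mtcars)

# Simple linear regression of mpg on weight.
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)
```

Each step sits under a one-line comment saying what it does, so a reader can match every number and plot in the report to the code that produced it.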
3 Assessment
Your project will be evaluated based on the following qualities:
• Statistical question: Your project should attempt to answer a research question. This question
should be clearly stated, focused enough to be answered by the data available or the planned study,
and interesting.
• Data sourcing: Your project may use data that you collect by a survey or other type of study, or data
that is publicly available. Several suggested public data sources are listed in Section 1.3.
If you choose to collect data with your own study, your report should address your sampling methods
and your study design, and note possible sources of bias in the data.
If you choose to use publicly available data, you should ensure that your data come from a reputable
source, and consider any limitations in the sampling or data collection process.
• Display and Visualization: Your report should include appropriately chosen, well-labeled, and
accurate visualizations of your data, including tables, plots, and/or graphs.
• Analysis: Your report should include appropriately chosen summary statistics to describe the data
in your data set, and use inference methods such as confidence intervals, hypothesis tests, and linear
or logistic regression models for estimation of population parameters. Conditions should be checked
for all inference procedures, and your report should discuss the extent to which the results may be
generalized beyond the sample.
• Use of R: All of the calculations, summary statistics, plots, and inference described in your report
should be reproducible by the R script that you submit alongside it. This script should use functions
and techniques covered in class and in the lab assignments.
• Discussion, conclusion, and reflection: Your report should include a clear answer to the original
statistical question, consistent with the available data and the results of your visualization and analysis.
If a satisfactory answer to the question cannot be reached with the available data, you should discuss
what additional data might be needed.
If you collected your own data, include a discussion of what went well and what did not in your data
collection process. If you used publicly available data, you should discuss any weaknesses or limitations
of the data set that was available: is it missing important variables? Was the data collected in a
less-than-optimal manner?
Finally, your conclusion should propose some ideas for further study in the same area – this could be
a follow-up question informed by the results of your analysis, or a new study that could address some
of the limitations in the existing data.
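The Analysis item above asks you to check conditions for every inference procedure. The sketch below shows one way this might look for a simple linear regression and a one-sample t procedure, again using the built-in mtcars data as a stand-in. Which conditions apply, and how you check them, depends on the procedures you actually use.

```r
# Stand-in model: regression of mpg on weight from the built-in mtcars data.
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals vs. fitted values: look for curvature (non-linearity)
# or a fan shape (non-constant spread).
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted")
abline(h = 0, lty = 2)

# Normal quantile plot of the residuals: points far from the line
# suggest the residuals are not approximately normal.
qqnorm(resid(fit))
qqline(resid(fit))

# For a t procedure on a single variable: check the sample size and
# look at the shape of the distribution for strong skew or outliers.
n_obs <- length(mtcars$mpg)
hist(mtcars$mpg, xlab = "Miles per gallon", main = "Distribution of mpg")
```

The plots do not give a yes-or-no verdict; your report should say what you saw in them and whether it limits how far the results can be trusted.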
4 Dates and Deadlines
In order to help your progress on the project, there are two intermediate deadlines. These intermediate
deadlines are each worth 10% of the total project points; if you miss an intermediate deadline, the point
value of your final submission will scale up to replace these. So, you should think of the intermediate
deadlines as a way to both “lock in” some of the points for the final project as well as to get some initial
feedback on your planned project.
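One reading of this weighting, written out as arithmetic: each intermediate submission counts for 10% of the project, and the final submission counts for whatever remains. The function below is a hypothetical illustration of that reading, with scores expressed as fractions between 0 and 1 and NA marking a missed deadline; it is not an official grade calculator.

```r
# Hypothetical helper illustrating one reading of the weighting rule.
# Scores are fractions between 0 and 1; NA means the deadline was missed.
project_score <- function(final, proposal = NA, methods = NA) {
  intermediate <- c(proposal, methods)
  submitted <- !is.na(intermediate)

  # Each submitted intermediate item is worth 10% of the project;
  # the final submission takes all of the remaining weight.
  final_weight <- 1 - 0.10 * sum(submitted)

  0.10 * sum(intermediate[submitted]) + final_weight * final
}

# Both intermediate deadlines met with full credit: final counts for 80%.
score_both <- project_score(final = 0.85, proposal = 1, methods = 1)

# Both intermediate deadlines missed: final counts for 100%.
score_none <- project_score(final = 0.85)
```

Under this reading, meeting both intermediate deadlines with full credit raises a final-submission score of 0.85 to an overall 0.88, which is the "lock in" effect described above.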
4.1 Topic Proposal: October 31st
Your topic proposal should be a brief description (1-2 paragraphs) stating your research question and indicating
where you intend to acquire data. If you are choosing publicly available data, you should identify at least
one specific data set that you will use and its source; if you are planning to collect your own data, you should
have a description of how you will select your sample and make your observations. The proposal should
specify the variables of interest, which (if any) are explanatory and response variables, and any associations
you plan to investigate.
4.2 Summary of Methods: November 21st
In your summary of methods, you should submit a brief description of the methods you plan to use for:
• Data visualization: which variables will you plot and how?
• Data summarization: which summary statistics (mean, median, etc.) will you calculate, for which
variables, and why?
• Inference methods: which types of confidence intervals, hypothesis tests, and regression models will
you use, and on which variables?
Your summary of methods does not have to include any of the results of this analysis, just a plan for what
you will do.
4.3 Rough Draft (Optional): December 5th
We encourage you to submit a rough version of your project report ahead of the final deadline to get feedback.
This is not required and will not factor directly into your grade, but if you submit by the rough draft deadline
then we will be able to tell you any major aspects of your report that are missing or need revision.
4.4 Final Submission Deadline: December 17
The final project, including the report and R code, is due by 11:59 PM on Tuesday, December 17 (the date
assigned to ISTA 116 for final exams this semester).

 
