STAT270 Applied Statistics
Assignment, Semester 1. Due: Thursday 23 May 2019, 5:00pm
You are expected to write your assignment using R Markdown (see Lecture 6) or MS Word and submit a PDF.
You are required to write your name, student ID and Unit Code on the first page. You need to submit your
assignment via the provided submission link on iLearn.
You may discuss the assignment in the early stages with your fellow students. However, the assignment
submitted should be your own individual work.
The R Markdown ‘Cheatsheet’ from the RStudio team is given here.
In your answers to the questions below, produce the appropriate R output and/or explanation of the steps and
results. Don’t include any more R output than necessary and include only concise explanations.
Question 1 [28 marks]
Since 1979, satellites have regularly measured the extent of sea ice in the Arctic Ocean. Rapid melting of
Arctic sea ice is seen both as a symptom and a cause of a changing climate. The average September Sea
Ice Extent (in 1,000,000 km²) is recorded for each year and the data are available in the file seaice.dat on
iLearn. The variables are defined below.
Extent    Sea Ice Extent (in 1,000,000 km²)
Year      Calendar Year
Using the data for the years 1979 to 2002 only, answer the following questions.
Hint: dat1 <- subset(dat, dat$Year<=2002) is one way to create a new dataset (dat1) in R that is a subset
of the original dataset (dat) and only contains data up to the year 2002.
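The hint can be tried out on a small stand-in data frame. The sketch below is illustrative only: the values in dat are made up and are not the contents of seaice.dat, and dat1_alt is a hypothetical name used to show an equivalent form with logical indexing. The real file would first have to be read in (for instance with read.table, assuming it has a header row, which is not stated here).

```r
# Toy data frame standing in for the real file; these values are made up.
dat <- data.frame(Year   = 1999:2005,
                  Extent = c(6.2, 6.3, 6.7, 5.9, 6.1, 6.0, 5.6))

# Inside subset() the columns can be referred to by name,
# so dat$Year is not needed in the condition.
dat1 <- subset(dat, Year <= 2002)

# Equivalent logical indexing (dat1_alt is a hypothetical name).
dat1_alt <- dat[dat$Year <= 2002, ]

print(dat1)
```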
a. [3 marks] State the statistical model for a simple linear regression of Extent explained by Year. Carefully
define all the necessary variables and parameters in your answer.
b. [3 marks] A simple linear regression seems appropriate for the 1979-2002 data. Justify the use of a
simple linear regression model.
c. [2 marks] Fit a simple linear regression to the 1979-2002 data. Explain why there is a linear relationship.
d. [2 marks] Is this a strong linear relationship? Explain your answer in the context of this data.
e. [2 marks] Predict the extent of the sea ice (in km²) for the year 2000.
f. [2 marks] Compute a 95% prediction band for the Extent of Sea Ice (in km²) in the year 2000.
g. [2 marks] Compute a 95% confidence band for the Extent of Sea Ice (in km²) in the year 2000.
h. [2 marks] Explain clearly what the prediction band represents and what the confidence band represents.
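As a rough illustration of the mechanics behind parts e to h, the sketch below fits a simple linear regression to synthetic data and asks predict() for both kinds of interval at one new point. Everything here is an assumption made for illustration: the data frame toy, its columns x and y, and the point x = 15 are invented and have nothing to do with seaice.dat.

```r
set.seed(1)

# Synthetic data, NOT the sea ice measurements.
toy   <- data.frame(x = 1:20)
toy$y <- 10 - 0.3 * toy$x + rnorm(20, sd = 0.5)

fit     <- lm(y ~ x, data = toy)
new_obs <- data.frame(x = 15)

# 95% interval for the mean response at x = 15.
conf_int <- predict(fit, newdata = new_obs,
                    interval = "confidence", level = 0.95)

# 95% interval for a single new observation at x = 15.
pred_int <- predict(fit, newdata = new_obs,
                    interval = "prediction", level = 0.95)

print(conf_int)
print(pred_int)
```

Both calls return the same point estimate in the fit column, and the prediction interval is always the wider of the two because it also accounts for the scatter of an individual observation. Since Extent is recorded in units of 1,000,000 km², an answer requested in km² needs the fitted value and interval limits multiplied by 10^6.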
Using all the data for the years 1979 to 2012, answer the following questions.
i. [2 marks] Justify why a simple linear regression is inappropriate for the 1979-2012 data.
j. [3 marks] Fit a second order polynomial regression model to the data and validate the model.
k. [1 mark] Plot the fitted polynomial over your data.
l. [2 marks] Using the second order model you fitted, predict the extent of the sea ice (in km²) for the
year 2000.
m. [2 marks] Compare your answers in part e) and part l). Which prediction value do you recommend and
why?
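One common way to fit and check a second order polynomial in R is sketched below on synthetic data. The data frame toy, its columns and the coefficients used to simulate it are all invented for illustration. The residual plots shown are one typical set of diagnostics, not necessarily the validation steps expected in this unit.

```r
set.seed(2)

# Synthetic data with mild curvature, NOT the sea ice measurements.
toy   <- data.frame(x = 1:30)
toy$y <- 8 - 0.02 * toy$x - 0.004 * toy$x^2 + rnorm(30, sd = 0.3)

fit_lin  <- lm(y ~ x, data = toy)            # straight line
fit_quad <- lm(y ~ x + I(x^2), data = toy)   # second order polynomial

# Residuals vs fitted values: look for leftover curvature or fanning.
plot(fitted(fit_quad), resid(fit_quad),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals.
qqnorm(resid(fit_quad))
qqline(resid(fit_quad))

# Data with the fitted quadratic curve overlaid.
plot(toy$x, toy$y, xlab = "x", ylab = "y")
lines(toy$x, fitted(fit_quad))

# Prediction from the quadratic model at a new x value.
predict(fit_quad, newdata = data.frame(x = 25))
```

The I() wrapper is needed so that ^ is treated as arithmetic rather than as formula syntax.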
Question 2 [25 marks]
A study into the quality of Portuguese Vinho Verde red wine was conducted to examine the possible
relationship between wine quality and the chemical composition of the wine. Overall quality scores were
obtained by combining the scores from several tasters. Information was recorded on wine bottles of Vinho
Verde and is available in the file pwine.dat on iLearn. The variables are defined below.
Quality   Aggregated score across the taste testers
Alcohol   Level of alcohol in percent
Density   Density or specific gravity of the wine
pH        Acidity level of the wine
a. [4 marks] State the statistical model for a multiple regression with Quality as the response using all
other variables as predictors, defining any parameters as necessary.
b. [2 marks] Fit this multiple regression model and write down the fitted model.
c. [4 marks] What are the assumptions required for a multiple regression analysis? If possible, validate
those assumptions for the multiple regression model you fitted in part b.
d. [6 marks] Conduct an F-test for the overall regression, i.e. test whether there is any relationship
between the response and the predictors. Write your answer as a formal hypothesis test and include the
ANOVA table (one combined regression SS source is sufficient).
e. [3 marks] From the analysis in part b, determine the 95% CI for the Alcohol slope parameter and
comment on its meaning in this context.
f. [2 marks] Using the model selection procedures used in this course, find the best multiple regression
model that explains the data giving reasons for your choice(s).
g. [4 marks] State the final fitted regression model and comment on its interpretation.
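The R calls that usually sit behind this kind of multiple regression analysis are sketched below on synthetic data. The data frame toy and the predictors x1, x2 and x3 are invented and do not correspond to the variables in pwine.dat. Dropping a single term with update() is shown only as one possible step, since the model selection procedure taught in the unit is not stated here.

```r
set.seed(3)
n <- 100

# Synthetic data, NOT the wine measurements; x3 has no true effect on y.
toy   <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 5 + 0.8 * toy$x1 - 0.5 * toy$x2 + rnorm(n)

# Full model with all three predictors.
fit <- lm(y ~ x1 + x2 + x3, data = toy)
print(summary(fit))   # coefficient table and overall F statistic

# Overall F-test as a comparison of the intercept-only and full models;
# this pools the regression sum of squares into a single source.
null_fit <- lm(y ~ 1, data = toy)
overall  <- anova(null_fit, fit)
print(overall)

# 95% confidence interval for one slope parameter.
ci_x1 <- confint(fit, "x1", level = 0.95)
print(ci_x1)

# One manual elimination step: refit without x3 and inspect the result.
fit_reduced <- update(fit, . ~ . - x3)
print(summary(fit_reduced))
```

The F statistic in the anova() comparison matches the one reported at the bottom of summary(fit), because both test the null hypothesis that all slope parameters are zero.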