
STAT4038/STAT6038 Assignment 2

 

(STAT4038/STAT6038)
Assignment 2 for Semester 1, 2022
INSTRUCTIONS:
• This assignment is worth 15% of your overall marks for this course.
• You must complete this assignment by yourself. If you copy someone else’s work or
allow your work to be copied, you will receive a mark of zero for the assignment and
risk very severe academic consequences.
• Your report should be submitted to Turnitin on Wattle as a single PDF document (smaller
than 25 MB) including the following:
1. The assignment cover sheet (available to download from Wattle).
2. Your assignment (no more than 10 pages).
3. An appendix including the R code you used. Failure to upload the R code will
result in a penalty.
• Assignments should be typed. Your assignment may include some carefully edited R
output (e.g. graphs, tables) showing the results of your data analysis and a discussion
of these results, as well as some carefully selected code. Please be selective about
what you present and only include as many pages and as much R output as necessary
to justify your solution. Clearly label each part of your report with the part of the
question that it refers to.
• Unless otherwise advised, use a significance level of 5%. Round numeric answers to 4
decimal places (e.g., 0.0012).
• Marks may be deducted if these instructions are not strictly adhered to, and marks
will certainly be deducted if the total report is of an unreasonable length, i.e. more
than 10 pages including graphs and tables. You may include an appendix that is in
addition to the above page limits; however, the appendix will not be assessed. It will
only be checked if there is some question about what you have actually done.
• Name your report “Course code-Uid”, e.g., “STAT4038-u1234567”.
• Try to submit your assignment at least 15 minutes before the deadline in case
something unexpected happens, such as an internet outage.
• Late submissions will NOT be accepted. Extensions will usually be granted on medical or
compassionate grounds on production of appropriate evidence, but require the
lecturer’s permission at least 24 hours before the deadline.
Assignment 2 - Sem 1, 2022 Page 1 of 3
Question 1 [100 Marks]
Moorhens are those blue-purple-red water birds often seen down near Lake Burley Griffin
in Commonwealth Park. They are characterised by large, fleshy red shields that protrude
from their heads. Some scientists have collected various measurements on a group of
43 moorhens in Commonwealth Park in the file “moorhen.csv”, which is available on
Wattle. The scientists have sent the data to you for analysis. The data contain the
following 6 variables: Shield, Weight, Stern, Hb, TandT and Adult.
The e-mail accompanying the data is a little light on the details, but there is a suggestion
that moorhens form a fairly hierarchical society and that shield size is a relevant indicator
of a bird’s status within their group, so the variable of most interest (the response
variable) is the area of each bird’s shield (units not specified, but presumably in mm²).
An alternative explanation might be that a bird’s status is more strongly related to their
overall size (which could be measured by the bird’s weight, presumably in mg) and that
bigger birds simply have larger shields.
In this assignment, we would like to use all available variables including Weight to try
and build a multiple regression model with Shield area as the response variable. The
e-mail from the scientists that came with the data doesn’t really describe the variables
Stern, Hb and TandT, except to say that they are “three lineal measurements” taken on
each bird. Adult is an indicator of whether the bird is a juvenile (0) or adult (1) bird.
Use R to further analyse the “moorhen” data and answer these questions:
(a) [6 marks] Fit a multiple linear regression (MLR) model with Shield as the response
variable and all other numeric variables (excluding Adult) as predictors.
Present the main residual plot of the residuals against the fitted values for this
model. Are there any obvious problems with the underlying assumptions?
(b) [10 marks] Now fit a MLR model with ln(Shield) as the response variable, still
using all the other numeric variables (not log transformed) as explanatory variables.
Again present the main residual plot of the residuals against the fitted values for
this new model. Does the transformation applied to the response variable appear
to have corrected any problems you identified in part (a)? Then, test whether this
model is significant.
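For reference, the overall significance test in part (b) is usually the F-test below. This sketch assumes all 43 birds are complete cases, so n = 43. It also assumes the model has p = 5 parameters: an intercept plus slopes for Weight, Stern, Hb and TandT. Labelling those slopes β1 to β4 is an illustrative assumption, not notation from the assignment.

```latex
F = \frac{\mathrm{SSR}/(p-1)}{\mathrm{SSE}/(n-p)}
  = \frac{\mathrm{SSR}/4}{\mathrm{SSE}/38}
  \sim F_{4,\,38}
  \quad \text{under } H_0:\ \beta_1 = \beta_2 = \beta_3 = \beta_4 = 0
```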
(c) [12 marks] What are the estimated coefficients of the MLR model in part (b) and
the standard errors associated with these coefficients? Interpret the values of each
of the estimated coefficients with regard to the model specification. Construct 95%
Bonferroni joint confidence intervals for all the slope parameters. Comment on the
t-test results in the summary output.
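The Bonferroni joint intervals in part (c) follow the usual textbook construction, sketched below. This assumes n = 43 complete cases, p = 5 parameters, and g = 4 slope parameters sharing the 5% family error rate. In R, the multiplier can be obtained with qt(1 - 0.05/8, df = 38).

```latex
\begin{aligned}
& b_k \pm t_{1-\alpha/(2g);\,n-p}\,\mathrm{se}(b_k), \qquad k = 1, \dots, g,\\
& \alpha = 0.05,\quad g = 4,\quad n - p = 43 - 5 = 38
  \;\Longrightarrow\; t_{1-0.05/8;\,38} = t_{0.99375;\,38}.
\end{aligned}
```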
(d) [12 marks] Produce both a scatterplot matrix and a correlation matrix for the
predictors included in the model and comment on any important relationships
between the variables. Do you see a problem with the MLR model from part (b)?
Conduct a quantitative diagnostic check to determine the severity of this
particular problem. What could be done to solve this problem?
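A common quantitative check for correlated predictors is the variance inflation factor (VIF). Here R_k^2 is the coefficient of determination from regressing predictor k on the other three predictors. Treating values above 10 as serious is a widely used rule of thumb, not a fixed standard; some authors use 5.

```latex
\mathrm{VIF}_k = \frac{1}{1 - R_k^2}, \qquad k \in \{\text{Weight}, \text{Stern}, \text{Hb}, \text{TandT}\}
```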
(e) [12 marks] You have now discussed this problem with the scientists, and they suggest
including Stern and Weight as potential predictors in the model. However,
you doubt the importance of the variable Weight. You are not sure what kind of
marginal relationship exists between Weight and the response ln(Shield), given that
Stern is already included in the model. Generate an appropriate plot to visually
check this relationship and comment on the plot. Then conduct a partial F-test to
determine whether Weight is a significant addition to a model that already includes
Stern.
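One standard choice for the plot in part (e) is an added-variable (partial regression) plot. It plots the residuals of ln(Shield) regressed on Stern against the residuals of Weight regressed on Stern. The least-squares slope through that plot equals the Weight coefficient in the two-predictor model. The partial F statistic is shown below, assuming n = 43 complete cases, so the full model has n - 3 = 40 residual degrees of freedom. Because only one parameter is tested, F equals the square of the t statistic for Weight in the full model. In R, anova(reduced_model, full_model) reports this comparison; both model names here are hypothetical.

```latex
F = \frac{\left[\mathrm{SSE}(\text{Stern}) - \mathrm{SSE}(\text{Stern}, \text{Weight})\right]/1}
         {\mathrm{SSE}(\text{Stern}, \text{Weight})/(n-3)}
  \sim F_{1,\,40} \quad \text{under } H_0:\ \beta_{\text{Weight}} = 0
```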
(f) [8 marks] The scientists remind you that juvenile and adult birds tend to have
different shield sizes. Therefore, you want to know how the variable Adult affects the
response ln(Shield). By fitting a simple linear regression model, test whether adult
birds have larger shields than juvenile birds. Then provide a 95% confidence interval
for the slope coefficient and interpret this interval.
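With a 0/1 indicator as the only predictor, the slope is simply the difference in mean ln(Shield) between adults and juveniles. The sketch below assumes n = 43 complete cases, giving 41 residual degrees of freedom. Note that the p-value printed by summary() in R is two-sided. For the one-sided alternative with b1 > 0, it is halved. Exponentiating the interval endpoints gives an interval for exp(β1), the ratio of adult to juvenile geometric mean shield areas.

```latex
\begin{aligned}
& \ln(\text{Shield}_i) = \beta_0 + \beta_1\,\text{Adult}_i + \varepsilon_i, \\
& H_0:\ \beta_1 = 0 \quad \text{vs} \quad H_1:\ \beta_1 > 0, \qquad
  t = \frac{b_1}{\mathrm{se}(b_1)} \sim t_{41} \text{ under } H_0, \\
& \text{95\% CI: } b_1 \pm t_{0.975;\,41}\,\mathrm{se}(b_1).
\end{aligned}
```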
(g) [6 marks] Finally, given the above findings, you decide to fit an MLR model with
ln(Shield) as the response variable and with Adult and Stern as predictors. Conduct a
t-test for Stern in this model, compare this t-test result with the one for Stern in
part (c), and comment on the reason for any difference.
(h) [16 marks] Using the model in part (g), produce a plot of externally studentized
residuals against fitted values, a normal QQ plot, a leverage plot, a Cook’s distance
plot and DFBETAS plots for all the slope coefficients in your model.
Comment on the model assumptions and unusual points. Do you see any notable feature
in the residual plot? If so, explain it.
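The standard definitions of the diagnostics in part (h) are given below. Here e_i is the i-th residual, h_ii the i-th leverage, MSE_(i) the mean squared error with case i deleted, and b_k(i) the k-th estimate with case i deleted. In R, these correspond to rstudent(), hatvalues(), cooks.distance() and dfbetas(). The numeric cutoffs assume n = 43 and p = 3 parameters (intercept, Adult, Stern). They are common rules of thumb rather than strict thresholds.

```latex
\begin{aligned}
& t_i = \frac{e_i}{\sqrt{\mathrm{MSE}_{(i)}\,(1 - h_{ii})}}, \qquad
  h_{ii} = \left[\mathbf{X}(\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\right]_{ii}, \\
& D_i = \frac{e_i^2}{p\,\mathrm{MSE}} \cdot \frac{h_{ii}}{(1 - h_{ii})^2}, \qquad
  \mathrm{DFBETAS}_{k(i)} = \frac{b_k - b_{k(i)}}{\sqrt{\mathrm{MSE}_{(i)}\,c_{kk}}}, \quad
  c_{kk} = \left[(\mathbf{X}^\top\mathbf{X})^{-1}\right]_{kk}, \\
& \text{high leverage: } h_{ii} > \frac{2p}{n} = \frac{6}{43} \approx 0.1395, \qquad
  \text{influential on } b_k\text{: } \left|\mathrm{DFBETAS}_{k(i)}\right| > \frac{2}{\sqrt{n}} \approx 0.3050.
\end{aligned}
```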
(i) [8 marks] Generate a scatter plot of Shield (in its original scale) against Stern,
using different colors for juvenile and adult birds. Use the model from part (g) to
predict the expected shield area for both juvenile and adult birds over the full range
of possible Stern measurements and include these on your plot as two different
curves (using different colors or line types). Include appropriate titles, axis labels,
a legend and a brief discussion of your plot.
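Because the part (g) model is fitted on the log scale, the curves on the original scale come from exponentiating the fitted values. This produces two exponential curves that differ by a constant multiplicative factor. The sketch assumes the labelling ln(Shield) = β0 + β1 Adult + β2 Stern + ε, which is not notation from the assignment. If the log-scale errors are normal, the simple back-transform estimates the median shield area. The mean would additionally require the factor exp(σ²/2). Whether to apply that adjustment is a modelling choice the assignment does not specify.

```latex
\begin{aligned}
& \text{Juvenile (Adult} = 0\text{):} \quad \widehat{\text{Shield}} = \exp\!\left(b_0 + b_2\,\text{Stern}\right), \\
& \text{Adult (Adult} = 1\text{):} \quad \widehat{\text{Shield}} = \exp\!\left(b_0 + b_1 + b_2\,\text{Stern}\right)
  = e^{b_1} \exp\!\left(b_0 + b_2\,\text{Stern}\right).
\end{aligned}
```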
(j) [10 marks] Starting from the model in part (g), consider adding the interaction term
between Stern and Adult. Generate a scatter plot of ln(Shield) (in log scale) against
Stern, using different colors for juvenile and adult birds. Add fitted lines for juvenile
and adult birds in different colors (or different line types). Comment on
whether the plot shows a visible interaction. Then test whether the interaction
is significant.
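Adding the interaction to the part (g) model lets the two groups have different slopes as well as different intercepts. The coefficient labelling below is an illustrative assumption, not notation from the assignment. The test is on β3 alone, with n - 4 = 39 residual degrees of freedom, assuming 43 complete cases.

```latex
\begin{aligned}
& \ln(\text{Shield}) = \beta_0 + \beta_1\,\text{Adult} + \beta_2\,\text{Stern}
  + \beta_3\,(\text{Adult} \times \text{Stern}) + \varepsilon, \\
& \text{Juvenile: } E[\ln(\text{Shield})] = \beta_0 + \beta_2\,\text{Stern}, \qquad
  \text{Adult: } E[\ln(\text{Shield})] = (\beta_0 + \beta_1) + (\beta_2 + \beta_3)\,\text{Stern}, \\
& H_0:\ \beta_3 = 0, \qquad t = \frac{b_3}{\mathrm{se}(b_3)} \sim t_{39} \text{ under } H_0.
\end{aligned}
```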