(STAT4038/STAT6038)

Assignment 2 for Semester 1, 2022

INSTRUCTIONS:

• This assignment is worth 15% of your overall marks for this course.

• You must complete this assignment by yourself. If you copy someone else’s work or allow your work to be copied, you will receive a mark of zero for the assignment and risk very severe academic consequences.

• Your report should be submitted to Turnitin on Wattle as a single PDF document (less than 25 MB) that includes the following:

1. The assignment cover sheet (available to download from Wattle).

2. Your assignment (no more than 10 pages).

3. An appendix containing the R code you used. Failure to upload the R code will result in a penalty.

• Assignments should be typed. Your assignment may include some carefully edited R output (e.g., graphs and tables) showing the results of your data analysis, a discussion of these results, and some carefully selected code. Please be selective about what you present, and include only as many pages and as much R output as necessary to justify your solution. Clearly label each part of your report with the part of the question it refers to.

• Unless otherwise advised, use a significance level of 5%. Round numeric answers to 4 decimal places (e.g., 0.0012).

• Marks may be deducted if these instructions are not strictly adhered to, and marks will certainly be deducted if the total report is of an unreasonable length, i.e., more than 10 pages including graphs and tables. You may include an appendix in addition to this page limit; however, the appendix will not be assessed. It will only be checked if there is some question about what you have actually done.

• Name your report “Course code-Uid”, e.g., “STAT4038-u1234567”.

• Try to submit your assignment at least 15 minutes before the deadline in case something unexpected happens, for instance an internet outage.

• Late submissions will NOT be accepted. Extensions will usually be granted on medical or compassionate grounds on production of appropriate evidence, but they must be approved by the lecturer at least 24 hours before the deadline.

Assignment 2 - Sem 1, 2022 Page 1 of 3

Question 1 [100 Marks]

Moorhens are those blue-purple-red water birds often seen near Lake Burley Griffin in Commonwealth Park. They are characterised by large, fleshy red shields that protrude from their heads. Some scientists have collected various measurements on a group of 43 moorhens in Commonwealth Park; the data are in the file “moorhen.csv”, which is available on Wattle. The scientists have sent the data to you for analysis. The data contain the following 6 variables: Shield, Weight, Stern, Hb, TandT and Adult.

The e-mail accompanying the data is a little light on details, but it suggests that moorhens form a fairly hierarchical society and that shield size is a relevant indicator of a bird’s status within its group. The variable of most interest (the response variable) is therefore the area of each bird’s shield (units not specified, but presumably in mm²). An alternative explanation might be that a bird’s status is more strongly related to its overall size (which could be measured by the bird’s weight, presumably in mg) and that bigger birds simply have larger shields.

In this assignment, we would like to use all available variables, including Weight, to try to build a multiple regression model with Shield area as the response variable. The e-mail from the scientists that came with the data doesn’t really describe the variables Stern, Hb and TandT, except to say that they are “three lineal measurements” taken on each bird. Adult is an indicator of whether the bird is a juvenile (0) or an adult (1).

Use R to further analyse the “moorhen” data and answer these questions:

(a) [6 marks] Fit a multiple linear regression (MLR) model with Shield as the response variable and all other numeric variables (excluding Adult) as predictors. Present the main residual plot (residuals against fitted values) for this model. Are there any obvious problems with the underlying assumptions?

(b) [10 marks] Now fit an MLR model with ln(Shield) as the response variable, still using all the other numeric variables (not log transformed) as explanatory variables. Again present the main residual plot (residuals against fitted values) for this new model. Does the transformation of the response variable appear to have corrected any problems you identified in part (a)? Then test whether this model is significant.

(c) [12 marks] What are the estimated coefficients of the MLR model in part (b), and what are the standard errors associated with these coefficients? Interpret the value of each estimated coefficient with regard to the model specification. Construct 95% Bonferroni joint confidence intervals for all the slope parameters. Comment on the t-test results in the summary output.

(d) [12 marks] Produce both a scatterplot matrix and a correlation matrix for the predictors included in the model, and comment on any important relationships between the variables. Do you see a problem with the MLR model from part (b)? Conduct a quantitative diagnostic check to determine the severity of this problem. What could be done to solve it?

(e) [12 marks] You have now discussed this problem with the scientists, and they suggest including Stern and Weight as potential predictors in the model. However, you doubt the importance of the variable Weight. You are not sure what kind of marginal relationship exists between Weight and the response ln(Shield), given that Stern is already included in the model. Generate an appropriate plot to check this relationship visually and comment on the plot. Then conduct a partial F-test to determine whether Weight is a significant addition to a model that already includes Stern.

(f) [8 marks] The scientists remind you that juvenile and adult birds tend to have different shield sizes. Therefore, you want to know how the variable Adult affects the response ln(Shield). By fitting a simple linear regression model, test whether adult birds have larger shields than juvenile birds. Then provide a 95% confidence interval for the slope coefficient and interpret this interval.

(g) [6 marks] Finally, given the above findings, you decide to fit an MLR model with ln(Shield) as the response variable and Adult and Stern as predictors. Conduct a t-test for Stern in this model, compare the result with the t-test for Stern in part (c), and comment on the reason for any difference.

(h) [16 marks] Using the model in part (g), produce a plot of externally studentized residuals against fitted values, a normal QQ plot, a leverage plot, a Cook’s distance plot, and DFBETAS plots for all the slope coefficients in your model. Comment on the model assumptions and any unusual points. Do you see any notable feature in the residual plot? If so, explain it.

(i) [8 marks] Generate a scatter plot of Shield (on its original scale) against Stern, using different colors for juvenile and adult birds. Use the model from part (g) to predict the expected shield area for both juvenile and adult birds over the full range of possible Stern measurements, and add these predictions to your plot as two different curves (using different colors or line types). Include an appropriate title, axis labels, a legend and a brief discussion of your plot.

(j) [10 marks] Starting from the model in part (g), consider adding an interaction term between Stern and Adult. Generate a scatter plot of ln(Shield) against Stern, using different colors for juvenile and adult birds. Add separate fitted lines for juvenile and adult birds in different colors (or different line types). Comment on whether the plot shows a visible interaction. Then test whether the interaction is significant.
