

3AS/3AS4: Applied Statistics
Assignment 1 Date: October 5, 2022
To be submitted by 5pm, November 03, 2022
1. In the data set igfdata.csv, measurements on age, sex and insulin-like growth
factor (igf) for a group of people are available. The data set can be downloaded
from canvas. The original source is: J. Clin. Endocrinol. Metab. 78(3): 744–752,
March 1994. Each row in the data set corresponds to one individual. You need to
download the file to a suitable folder of your choice on your computer. Then start
RStudio and set the folder containing the data as your working directory from the
“Session” menu. Finally, import the data into R using the following:
igfdata = read.csv("igfdata.csv", header = TRUE)
For all the following questions, include your R code, plots, and outputs in the
solution.
(a) Make a suitable plot for the distribution of igf and discuss your findings.
(b) Compare the igf for males and females using boxplots. Discuss your findings.
(c) Make a scatterplot of igf against age. Comment on how igf changes with
age.
(d) Fit a simple linear regression to predict igf using age. Is age a significant
variable in this regression? Justify your answer.
(e) Report the mean and standard deviation of igf for males and females
separately.
(f) Using linear regression or otherwise, check whether there is a significant
difference in mean igf for males and females. Use level of significance α = 0.05.
(g) Consider the subset of the data with age less than or equal to 15 years. For
this subset of people, use linear regression with age and sex as predictors to
predict igf. Comment on the significance of the variables age and sex.
(h) Use a residual plot to check whether the linearity assumption is violated and,
if so, use an alternative model to fit the data.
(i) Fit a similar model to predict igf for the subset of people with age greater
than 15 years.
(j) Is there a significant difference in the models for people with age less than or
equal to 15 years and for people with age greater than 15 years?
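
Note: the workflow in parts (a) to (j) can be rehearsed before touching the real data. The following is a minimal sketch only, run on simulated data rather than igfdata.csv. The column names age, sex and igf are assumptions taken from the description above (the actual file may name them differently), the simulated relationship is made up, and the nested-model comparison shown for (j) is just one possible approach.

```r
# Simulated stand-in for igfdata.csv. Column names (age, sex, igf) and all
# numeric values are assumptions for illustration, not the real data.
set.seed(1)
n <- 200
sim <- data.frame(
  age = runif(n, 1, 60),
  sex = factor(sample(c("female", "male"), n, replace = TRUE))
)
# Made-up relationship: igf rises up to age 15, then declines slowly.
sim$igf <- ifelse(sim$age <= 15,
                  100 + 25 * sim$age,
                  475 - 4 * (sim$age - 15)) + rnorm(n, sd = 60)

hist(sim$igf, main = "Distribution of igf", xlab = "igf")  # (a)
boxplot(igf ~ sex, data = sim, ylab = "igf")               # (b)
plot(igf ~ age, data = sim)                                # (c)

fit_age <- lm(igf ~ age, data = sim)                       # (d)
print(summary(fit_age)$coefficients)

# (e) mean and standard deviation of igf by sex
print(aggregate(igf ~ sex, data = sim,
                FUN = function(v) c(mean = mean(v), sd = sd(v))))

# (f) regression on sex, and the equivalent equal-variance two-sample t-test
fit_sex <- lm(igf ~ sex, data = sim)
tt <- t.test(igf ~ sex, data = sim, var.equal = TRUE)
print(summary(fit_sex)$coefficients)
print(tt$p.value)

# (g), (h) subset with age <= 15, then residuals against fitted values
young <- subset(sim, age <= 15)
fit_young <- lm(igf ~ age + sex, data = young)
plot(fitted(fit_young), resid(fit_young),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# (i) same model form for age > 15
old <- subset(sim, age > 15)
fit_old <- lm(igf ~ age + sex, data = old)

# (j) one possible approach: compare a common model with a model that lets
# every coefficient differ between the two age groups
sim$group <- factor(sim$age <= 15, labels = c("over15", "upto15"))
fit_common <- lm(igf ~ age + sex, data = sim)
fit_split  <- lm(igf ~ (age + sex) * group, data = sim)
print(anova(fit_common, fit_split))
```

With the real data, sim would be replaced by igfdata and the column names adjusted to whatever names(igfdata) reports.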
2. I collect a set of data (n = 100 observations) containing a single predictor and a
quantitative response. I then fit a linear regression model to the data, as well as a
separate cubic regression, i.e. $Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3 + \epsilon$.
(a) Suppose that the true relationship between X and Y is linear, i.e.
$Y = \beta_0 + \beta_1 X + \epsilon$. Consider the training residual sum of squares (RSS) for the linear
regression, and also the training RSS for the cubic regression. Would we expect
one to be lower than the other, would we expect them to be the same, or is
there not enough information to tell? Justify your answer.
(b) Answer (a) using test rather than training RSS.
(c) Suppose that the true relationship between X and Y is not linear, but we
don’t know how far it is from linear. Consider the training RSS for the linear
regression, and also the training RSS for the cubic regression. Would we expect
one to be lower than the other, would we expect them to be the same, or is
there not enough information to tell? Justify your answer.
(d) Answer (c) using test rather than training RSS.
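
Note: the quantities in this question can be computed in a small simulation. The sketch below assumes a true linear relationship with arbitrary coefficients, a uniform predictor, and standard normal errors. None of these choices come from the question, and a single simulated data set illustrates how training and test RSS are obtained rather than settling what to expect in general.

```r
# Hypothetical setup: the true model, coefficients, and noise level are
# arbitrary choices made only for illustration.
set.seed(42)
n <- 100
x <- runif(n, -2, 2)
train <- data.frame(x = x, y = 1 + 2 * x + rnorm(n))
x_new <- runif(n, -2, 2)
test <- data.frame(x = x_new, y = 1 + 2 * x_new + rnorm(n))

# Linear fit and cubic fit on the same training data
fit_lin <- lm(y ~ x, data = train)
fit_cub <- lm(y ~ x + I(x^2) + I(x^3), data = train)

# Residual sum of squares of a fitted model on a given data set
rss <- function(fit, data) {
  sum((data$y - predict(fit, newdata = data))^2)
}

print(c(train_linear = rss(fit_lin, train), train_cubic = rss(fit_cub, train)))
print(c(test_linear = rss(fit_lin, test), test_cubic = rss(fit_cub, test)))
```

Changing the line that generates y (for example, adding a term in x^2) and repeating over many seeds gives a feel for parts (c) and (d).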
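3. Consider a simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$ for $i = 1, \dots, n$.
Assume $E(\epsilon_i) = 0$, $\mathrm{Var}(\epsilon_i) = \sigma^2$ and $E(\epsilon_i \epsilon_j) = 0$ for $i \neq j$. Let $\hat{\beta}_0$ and $\hat{\beta}_1$ be the
least squares estimators of $\beta_0$ and $\beta_1$, and let $\hat{y}_0$ be the predicted value of $y$ for a new
observation $x = x_0$.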
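
Note: the objects defined in this setup can be computed directly. The sketch below uses simulated data with arbitrary parameter values (an assumption, not part of the question), computes the least squares estimates from the standard closed-form expressions, forms the prediction at a hypothetical new point x0, and checks both against lm().

```r
# Simulated data; the sample size, coefficients, and x0 are arbitrary
# illustrative choices.
set.seed(7)
n <- 20
x <- runif(n, 0, 10)
y <- 3 + 0.5 * x + rnorm(n)
x0 <- 4

# Closed-form least squares estimates for simple linear regression
Sxx <- sum((x - mean(x))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))
b1 <- Sxy / Sxx
b0 <- mean(y) - b1 * mean(x)

# Predicted value at the new observation x0
y0_hat <- b0 + b1 * x0

# Cross-check against R's built-in fit
fit <- lm(y ~ x)
print(all.equal(unname(coef(fit)), c(b0, b1)))
print(all.equal(unname(predict(fit, newdata = data.frame(x = x0))), y0_hat))
```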
(a) Show that $\hat{y}_0$ is a linear estimator, that is, $\hat{y}_0 = \sum_{i=1}^{n} c_i y_i$ for some constants
$c_i$ depending on $x_1, \dots, x_n$.
(b) Derive the bias and the variance of $\hat{y}_0$.
(c) Compare the bias and the variance of $\hat{y}_0$ if the true model is $y = \beta_0 + \beta_1 x +$
