STAT 3312 (Fall, 2019)
Final exam (take-home)
Name (ID):
Instructions
• This take-home exam is due 3:00PM, December 17, 2019.
• All of your answers and work must be your own.
• You are NOT allowed to discuss any part of this exam with anyone. If you have any questions,
ask me.
• For question #2, R or SAS code along with output must be submitted to support your
answer. Please underline the results in the output that are relevant to your answer.
1. True/False questions (1.5 points each)
(1) The diagnosis of a mental illness (e.g., schizophrenia, neurosis, depression) is an ordinal categorical
variable.
True ( ) False ( )
(2) If the odds of success equal 0.5 in a binary response, then the probability of success is 0.25.
True ( ) False ( )
(3) In a logistic regression model, logit[π(x)] = α + βx, e^α equals the odds of success when x = 1.
True ( ) False ( )
(4) In a logit model logit[π(x)] = α + βx, the probability increases at a rate of 0.16β when π(x) = 0.4.
True ( ) False ( )
(5) Fisher’s exact test can be used to test whether the odds ratio of a 2 × 2 table equals 1 when the
frequency counts are small.
True ( ) False ( )
(6) A classical linear regression model with errors having a normal distribution is a special case of
generalized linear model with the probit link.
True ( ) False ( )
(7) In testing for independence in two-way contingency tables, likelihood ratio tests and Pearson’s
χ² tests are equivalent for small sample sizes.
True ( ) False ( )
(8) In a generalized linear model, the link function is used to connect the values of the random
component and the systematic component.
True ( ) False ( )
(9) When x1 or x2 is the sole predictor for a binary response y, the likelihood ratio test of the effect
has P-value < 0.0001. When both x1 and x2 are in the model, it is possible that the likelihood
ratio tests for H0 : β1 = 0 and for H0 : β2 = 0 could both have P-values larger than 0.05.
True ( ) False ( )
(10) For the logistic regression model with the identity link, the estimated probability of any value
for predictor x could exceed one.
True ( ) False ( )
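As a reminder of the conversions behind items (2) and (4), here is a minimal Python sketch (the exam itself asks for R or SAS; Python is used only for illustration). The function names are hypothetical, and the input values are made up rather than taken from the questions.

```python
def odds_to_prob(odds):
    """Convert odds of success into a probability: p = odds / (1 + odds)."""
    return odds / (1.0 + odds)


def prob_to_odds(p):
    """Convert a probability of success into odds: p / (1 - p)."""
    return p / (1.0 - p)


def logistic_slope(beta, p):
    """Rate of change of pi(x) in logit[pi(x)] = alpha + beta*x, evaluated
    at a point where pi(x) = p: beta * p * (1 - p)."""
    return beta * p * (1.0 - p)


# Hypothetical illustration values (not the numbers used in the questions).
print(odds_to_prob(3.0))         # odds of 3 correspond to probability 0.75
print(logistic_slope(2.0, 0.5))  # at pi = 0.5 the slope is beta / 4 = 0.5
```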
2. The following table is based on an epidemiological survey of 3,000 subjects to investigate snoring
as a possible factor for heart disease. We use scores (0, 2, 3, 5, 6) for x = snoring level.
                         Heart Disease
Snoring                  Yes     No
Never                     24   1355
Sometimes                 35    603
More often than not       21    215
Almost always             30    224
Every night               27    230
(a) Use R or SAS to fit the model with three link functions: the logit, probit, and complementary
Log-Log. Write down the estimated equations for all three models. (12 points)
(b) Find the estimated proportion for the logistic model when the snoring level is 2 and interpret
it in terms of the odds. (4 points)
(c) Use the fitted logistic model to calculate an approximate 97% confidence interval for the odds
ratio of a person in the “sometimes” category compared to a person in the “every night” category.
(5 points)
(d) Find the estimated proportion for the probit model when the snoring level is 3. (4 points)
(e) Find the estimated proportions for the complementary Log-Log model when the snoring levels
are “sometimes” and “almost always”, respectively. Which value is larger? (5 points)
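Parts (b) through (e) all turn a fitted linear predictor into a probability or an odds ratio. The Python sketch below shows the three inverse link functions and a Wald-type interval for an odds ratio between two snoring scores. It assumes a Wald interval is what "approximate" means in part (c). The coefficient and standard-error values are placeholders, not fitted values for the snoring data, and every function name is hypothetical.

```python
import math
from statistics import NormalDist


def inv_logit(eta):
    """Inverse of the logit link: pi = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    """Inverse of the probit link: pi = Phi(eta), the standard normal CDF."""
    return NormalDist().cdf(eta)


def inv_cloglog(eta):
    """Inverse of the complementary log-log link: pi = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


def odds_ratio_ci(beta_hat, se_beta, x_a, x_b, level):
    """Wald interval for the odds ratio comparing score x_a with score x_b
    in a logit model: exp((x_a - x_b) * beta_hat +/- z * |x_a - x_b| * se)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    diff = x_a - x_b
    center = beta_hat * diff
    half_width = z * se_beta * abs(diff)
    return math.exp(center - half_width), math.exp(center + half_width)


# Placeholder estimates (NOT the fitted values for the snoring data).
alpha_hat, beta_hat, se_beta = -3.0, 0.4, 0.05
eta = alpha_hat + beta_hat * 2  # linear predictor at score x = 2
for name, inverse_link in [("logit", inv_logit),
                           ("probit", inv_probit),
                           ("cloglog", inv_cloglog)]:
    print(name, inverse_link(eta))
print(odds_ratio_ci(beta_hat, se_beta, x_a=2, x_b=6, level=0.97))
```

In practice each link has its own fitted intercept and slope, so the same placeholder pair would not be reused across the three models.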
3. Consider the following logistic regression model based on the horseshoe crab data with color and
width predictors:
logit[P(Y = 1)] = α + β1c1 + β2c2 + β3c3 + β4x,
where x denotes width and
c1 = 1 for color = medium light, 0 otherwise
c2 = 1 for color = medium, 0 otherwise
c3 = 1 for color = medium dark, 0 otherwise.
Fitting the model yields the following estimated equation:
logit[P̂(Y = 1)] = −13.015 + 1.097c1 + 1.302c2 + 1.254c3 + 0.458x. (1)
Consider this fit for crabs of width x = 21cm.
(a) Estimate two probabilities for medium-light crabs and for dark crabs, and then calculate the
ratio of these two probabilities. (7 points)
(b) Estimate the odds ratio of a satellite comparing medium-light crabs with dark crabs. Interpret it in
terms of the context. (7 points)
(c) Is there a big difference between the ratio of probabilities in (a) and the odds ratio in (b)? If
not, why does this happen? (5 points)
(d) Verify the value of the odds ratio in part (b) using the parameter estimates in Equation (1). (5
points)
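The relationship among the quantities in parts (a) through (d) can be sketched in Python as follows. With indicator coding, each group's linear predictor is the intercept plus that group's color effect plus the width term, and the odds ratio between two groups at the same width equals exp of the difference in their color effects. All coefficients below are hypothetical placeholders, not the estimates in Equation (1).

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: pi = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def linear_predictor(intercept, color_effect, width_coef, width):
    """Intercept + color effect (0 for the baseline color) + slope * width."""
    return intercept + color_effect + width_coef * width


# Hypothetical coefficients (NOT the estimates from Equation (1)).
intercept, width_coef = -10.0, 0.4
effect_a = 1.0   # color group A, coded with its own indicator
effect_b = 0.0   # color group B, the baseline with all indicators at 0
width = 25.0

p_a = inv_logit(linear_predictor(intercept, effect_a, width_coef, width))
p_b = inv_logit(linear_predictor(intercept, effect_b, width_coef, width))

prob_ratio = p_a / p_b
odds_ratio = (p_a / (1.0 - p_a)) / (p_b / (1.0 - p_b))

print(prob_ratio)                    # ratio of probabilities
print(odds_ratio)                    # odds ratio from the two probabilities
print(math.exp(effect_a - effect_b)) # same odds ratio from the coefficients
```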
4. In order to investigate effects of AZT in slowing the development of AIDS symptoms, a total of
343 veterans whose immune systems were beginning to falter after infection from the AIDS virus
were randomly assigned either to receive AZT immediately or to wait until their T cells showed
severe immune weakness. The following table is a 2 × 2 × 2 cross-classification of the veterans’ race,
whether AZT was given immediately, and whether AIDS symptoms developed during the 3-year
study.
                      Symptoms
Race    AZT use   Yes (Fitted)   No (Fitted)   Row total
Black   Yes       14 (A)         90 (B)        104
        No        28 (C)         85 (D)        113
White   Yes       10 (E)         55 (F)         65
        No        14 (G)         47 (H)         61
Let X = AZT treatment (1 for AZT taken, 0 otherwise), Z = race (1 for blacks, 0 for whites), and
Y = whether AIDS symptoms developed (1 = yes, 0 = no). The ML fit turned out to be
logit(π̂) = −1.1427 − 0.6537x − 0.0037z. (2)
(a) Use Equation (2) to find the fitted values (A) - (H). (8 points)
(b) Perform a goodness of fit test by calculating the Pearson statistic X² based on the observed
and fitted values in the table above. Does the model fit decently well? Justify your answer with
the P-value. (8 points)
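The mechanics of parts (a) and (b) can be sketched in Python as below: fitted counts are the row total times the fitted probability (and its complement), the Pearson statistic sums (observed − fitted)²/fitted over all cells, and the P-value is an upper-tail chi-square probability. The helper names are hypothetical, the chi-square tail is computed with a standard recurrence so that only the standard library is needed, and the demo row uses made-up numbers rather than the AZT data. The residual degrees of freedom are left as an input for the reader to determine.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: pi = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


def fitted_counts(n, p_hat):
    """Fitted (yes, no) counts for a row with total n and fitted probability p_hat."""
    return n * p_hat, n * (1.0 - p_hat)


def pearson_x2(observed, fitted):
    """Pearson statistic: sum over cells of (observed - fitted)^2 / fitted."""
    return sum((o - f) ** 2 / f for o, f in zip(observed, fitted))


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1.
    Starts from the closed forms for df = 1 or df = 2 and steps up by 2 using
    Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)."""
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        k, q = 1, math.erfc(math.sqrt(x / 2.0))
    else:
        k, q = 2, math.exp(-x / 2.0)
    while k < df:
        q += (x / 2.0) ** (k / 2.0) * math.exp(-x / 2.0) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


# Made-up single row (NOT the AZT table): 100 subjects, linear predictor -1.0.
yes_fit, no_fit = fitted_counts(100, inv_logit(-1.0))
x2 = pearson_x2([30, 70], [yes_fit, no_fit])
print(yes_fit, no_fit, x2)
print(chi2_sf(x2, df=1))  # df = 1 is chosen only for this toy example
```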
5. Does job satisfaction depend on one’s income? The 1991 General Social Survey shows the
following results. Note that there are four levels in the job satisfaction categories (dissatisfied,
little, moderate, very) and four levels in the income categories (0-5K, 5K-15K, 15K-25K, >25K).
The income values are in dollars.
Income      Job satisfaction
            Dissatisfied   Little   Moderate   Very
0-5K                   2        4         13      3
5K-15K                 2        6         22      4
15K-25K                0        1         15      8
>25K                   0        3         13      8
Let Y = job satisfaction and let X = income scores (3K, 10K, 20K, 25K). Consider the baseline-category
logit model with “very” as the baseline category:
log(πj/π4) = αj + βjx, j = 1, 2, 3.
The following table shows part of the output with the estimated coefficients for the baseline-category
logit model.
(Intercept):1   (Intercept):2   (Intercept):3
        0.430           0.456           1.704
     Income:1        Income:2        Income:3
       −0.185          −0.054          −0.037
(a) Write down the three predicted equations, log(π̂j/π̂4) for j = 1, 2, 3. (6 points)
(b) Notice that β̂j < 0 for each logit. Interpret the implications in terms of the context. (4 points)
(c) What is the meaning of e^(−0.185) = 0.83? Explain it rigorously in terms of the context. (4 points)
(d) Find the estimated probability of being in the “Moderate” category when a person’s income is 20K. (4
points)
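For reference, the step from baseline-category logits to category probabilities can be sketched in Python. With the last category as baseline, πj = exp(αj + βjx) / (1 + Σ exp(αk + βkx)) for the non-baseline categories, and the baseline probability is 1 / (1 + Σ exp(αk + βkx)). The function name is hypothetical and the coefficients in the demo are made up, not the estimates in the table above.

```python
import math


def baseline_category_probs(alphas, betas, x):
    """Category probabilities for log(pi_j / pi_J) = alpha_j + beta_j * x,
    j = 1, ..., J - 1, with the last category J as the baseline.
    Returns [pi_1, ..., pi_J]."""
    etas = [a + b * x for a, b in zip(alphas, betas)]
    denom = 1.0 + sum(math.exp(eta) for eta in etas)
    probs = [math.exp(eta) / denom for eta in etas]
    probs.append(1.0 / denom)  # baseline category
    return probs


# Made-up coefficients for a four-category response (NOT the exam's estimates).
alphas = [0.5, 1.0, 1.5]
betas = [-0.10, -0.05, -0.02]
probs = baseline_category_probs(alphas, betas, x=10.0)
print(probs)
print(sum(probs))  # the four probabilities add up to 1 (up to rounding)
```

The same function also shows why exp(βj) is read as a multiplicative effect: raising x by one unit multiplies the odds of category j relative to the baseline by exp(βj).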