
STAT 511 Exam2 – Spring 2020

Instructions (Please take a moment to read):
1. Students are expected to work independently on the exam. Do NOT discuss the exam with anyone
else. Do NOT post questions or comments about the exam to Canvas. Do NOT share R code or
notes or email regarding the Final Exam. Please consider fairness to your classmates as a guide of
conduct when working on the exam.
2. Consider using the R Markdown Template in Canvas; it is NOT required. However, please be organized
and concise, which is worth 6 points of the Exam (no need to spend too much time on the “perfect
document”; most submissions will receive 6/6). The template may also be a useful guide for
organization even if you copy/paste into a Word document or knit to Word.
3. When including figures and tables please make them clear and concise. No need to go overboard on
detail, but correct formatting and essential labeling should be included.
4. Please make an effort to write clear, concise, and coherent responses to written questions.
5. For any “hand” calculation questions, show your work in order to be eligible for partial credit. As a
general rule, round answers to 4 decimal places.
6. Use α = 0.05 for all questions (unless specified otherwise).
7. You may use any software, reference, or on-line resource that you find helpful.
8. If you have a specific question regarding content of the exam such as: interpretation of a question on
the exam, requirements for a response to a question, software issues with Rstudio, or R function that
continues to give errors, please send an email directly to me (). I will try to
respond in a reasonable time frame. Also, please make sure your Canvas settings allow
notifications when there are announcements on Canvas in case I need to clarify something on the
Final. But, I will likely not be responding to any email after the due date (for a while anyway).
9. The Exam must be submitted to Canvas in pdf format by 11:59 pm Wednesday 4/14/2020 using the

Please include your name on your submitted document as “signature” for Honor Pledge below.
Honor Pledge: I have not given, received, or used any unauthorized assistance on this exam.

Exam Parts:
Multiple Choice (32 pts)
True/False questions are 2 pts each; the remaining questions are 3 pts each, as before.
Matching Question (14 pts)
Chapter 6 Problem: Sleep Data (12 pts)
Chapter 10 problem: Lefties (12 pts)
Chapter 8/9 Problem: Cuckoo Bird Eggs (24 pts)
Organization/Clarity (6 pts)

1. Multiple Choice (32 pts)
For each numbered question in this section, note the best answer choice in your R Markdown or
submitted document.

Questions 1 through 7 (True or False): For each question, just note in your submitted document the
question number and True or False. No need to justify. (Each True/False question is 2 pts,
remaining multiple choice are 3 pts).

1. Managing experiment-wise error rate is especially important when comparing means associated with
a very large number of treatment levels.
2. The LSD (unadjusted) pairwise comparison method helps control the experiment-wise error rate.
3. The HSD (Tukey) pairwise comparison method has lower power than the LSD (unadjusted) method.
4. For many cases, multiple comparison tests using Bonferroni’s adjustment can be considered too
conservative.
5. Dunnett’s method can be used to test all pairwise comparisons from a one-way ANOVA.
6. For a one-way ANOVA, subsequent multiple comparison adjustment methods are only viable when
the response variable is normally distributed across all treatment groups.
7. A survey yielded an estimated proportion of 0.11 based on a sample of size n=55. The large sample
normal approximation is adequate for this scenario. Use the criteria based on 3xSE.

8. In R, the function lm() performs which of the following?
(A) An ANOVA of specified data
(B) A linear model of specified response and predictor variables.
(C) A list of means for a specified response variable
(D) A likelihood maximized estimate for a specified response variable.

9. The multiple testing problem is best described by which of the following:
(A) Testing a hypothesized mean before a treatment, and then testing the mean again after a treatment.
(B) Having a large number of potential Type II errors when comparing many pairs of means between
treatment groups.
(C) Having a large number of potential Type I errors when comparing many pairs of means between
treatment groups.
(D) When performing an ANOVA, the degrees of freedom for the residuals (within) is considered to be
too large.

10. As a variable, the number of CSU graduate students who voted in the 2020 primary election is
best described by which of the following.
(A) Qualitative and Discrete
(B) Quantitative and Discrete
(C) Qualitative and Continuous
(D) Quantitative and Continuous

Suppose you collect data from four different populations and have the following summary statistics.
Use the table below to answer questions 11 - 12.

           N    Mean     SD      SE
Group A   45   76.54   19.45    2.90
Group B   44   78.45   32.01    4.83
Group C   43   79.65   57.21    8.72
Group D   42   81.32   84.43   13.03

11. If you performed an ANOVA using the data that generated the summary statistics above, which of
the following outcomes would you expect?
(A) A small F statistic and a small p-value
(B) A small F statistic and a large p-value
(C) A large F statistic and a small p-value
(D) A large F statistic and a large p-value

12. If you performed diagnostics for a fitted model in order to do an ANOVA using the data that
generated the summary statistics above, which of the following would you expect?
(A) There would be no need to perform diagnostics since the ANOVA assumptions are violated with
unequal sample sizes.
(B) The p-value from a Levene’s test is likely to be relatively small.
(C) There are certainly going to be problems with the data when plotted on a qqplot.
(D) The means are likely to be significantly different

13. A pharmaceutical company's allergy medication is known to provide relief to 75% of the people
who use it. The company wants to see if a new, improved version of the medication works even
better. In a test of the hypotheses H0: p = .75 versus HA: p > .75, the p-value is .32. Which of the
following gives the best interpretation of this p-value?
(A) There is a 32% chance that the new medication is more effective than the old medication.
(B) There is a 32% chance that the new medication and old medication are equally effective.
(C) If the new medication is more effective than the old medication (if HA is true), there is a 32%
chance of obtaining the observed sample proportion or something greater due to natural sampling
variation.
(D) If the new medication and old medication are equally effective (if H0 is true), there is a 32% chance
of obtaining the observed sample proportion or something greater due to natural sampling
variation.


Matching Problem (14 pts)
Below are statistical tests covered in Chapter 6 up to Chapter 10. Match each named test to the most
appropriate scenario below. Assume all data are collected through random sampling methods. Please
list the scenarios in your submitted work and match the corresponding letter. Each letter used once.
No explanation is necessary.
A. Levene’s Test
B. Welch-Satterthwaite Test
C. χ²-Test
D. ANOVA
E. Paired T-test
F. Kruskal-Wallis Test
G. Tukey’s Method
Scenario 1: A physical assessment called VO2-max measures fitness levels by determining the
volume of oxygen a person can use in respiration during physical activity. A researcher wants to
see if VO2-max on average is different for those who live at high altitude versus those who live at
lower elevations. A random sample of 45 active people between the ages of 30 and 40 is selected
from residents in the mountains of Colorado (above 9,000 ft). A random sample of 53
similarly active people is selected from those who live in coastal areas of California. On inspection
of the boxplot, it appears that the variances of VO2-max for the two groups are quite different.
Scenario 2: Using the same data as in Scenario 1, the researcher would like to use a test and p-value
to strengthen the evidence that the variances of the two groups are really different.
Scenario 3: A researcher wishes to compare means for 6 groups for which the standard deviations
within each group appear very similar. She would like all pairwise comparisons to be based on
honestly significant differences. She finds that the fitted model of response versus group levels has
residuals which are distributed approximately Normal(0, σ²).
Scenario 4: A researcher would like to determine if 3 treatments of sample size 9 have the same
central value. The fitted model of response versus group levels has residuals that do not appear to
be normally distributed. But, it does appear that the variances for each group are very similar.
Scenario 5: A researcher would like to determine if caffeine helps sprinters run faster times.
Twelve runners are selected to run one lap as fast as they can, and their time is recorded. Each
runner then drinks 3 double espressos. Thirty minutes after drinking the coffee, each runner then
runs one lap once again as fast as they can, and their time is recorded. Assume the differences
in lap time for each runner before and after being caffeinated are normally distributed.
Scenario 6: A researcher would like to determine if 3 treatments of sample size 13 have the same
central value. The fitted model of response versus group levels has residuals that appear to be
normally distributed and the variances for each group appear to be very similar.
Scenario 7: For quality control, a machine must manufacture a drug within a range for a certain
amount of active ingredient. A random sample of 50 tablets is measured to see if the standard
deviation of the amount of the active ingredient is below a certain value.

There is an R Markdown template available for these questions.

Sleep Data in R (12 pts, 3 pts/part)
1. There are many “built-in” datasets available in Base R. You may have found some that are used
when searching for help on the internet (the iris dataset for example is popular with graphing).
Other packages that you install into R likely have their own example data as well.

For this question we use the sleep data. It is important to first read the help for any dataset you
use! This help is rather limited, but does offer a bit of background as well as references.
To analyze, there is no data to load. For example, view the structure of these data with just:
str(sleep). After checking out this very old data set about a sleep medication, submit answers to the
following questions while including appropriate R code and output. But please make sure the
answer is clear in some narrative form as well (not just a list of output).
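As a minimal sketch of the "read the help, then look at the structure" step described above (it only inspects the data and does not perform any part of the analysis):

```r
# Read the help page first: background, variable descriptions, references.
help("sleep")   # same as typing ?sleep at the console

# No loading step is needed: sleep ships with base R (datasets package).
str(sleep)      # variable names and types
head(sleep)     # first few rows
```

The help page is the place to check how the measurements were taken before deciding on hypotheses.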

(a) Define the parameter(s) of interest. Use appropriate symbol(s) (or at least names of Greek
letters, no mark-up required) to write the hypotheses that were used in this study. (Read the
help carefully)

(b) Provide a boxplot by group. Also, briefly explain (1 or 2 sentences) why this boxplot on its
own might be misleading to a reader. (This question is meant to help you think carefully about the
hypotheses above. Studies of this type sometimes still use boxplots, but they clarify what the
plot shows.)

(c) Provide output for the appropriate test. Also, please note the value of the test statistic and
the p-value for the test you defined in part (a).

(d) State the conclusion of the test in terms of the context of the study.

Left-Handed, no data to load here either (12 pts, 3 pts/part)
2. Before the 1980s, school children were encouraged (and sometimes even forced) to write with their
right hand as opposed to their left. As a result, only about 8% of Americans in the 1980s claimed to
be left-handed. Over time, the stigma associated with being left-handed and social pressure against
it have relaxed. To investigate whether the proportion of the population that is left-handed has
increased since the 1980s, a psychologist surveys a random sample of 150 Americans of whom 18
claim to be left-handed. At the 5% significance level, is this evidence that the proportion of
Americans that are left-handed is higher today than in the 1980s?

(a) Define the parameter of interest and state the null and alternative hypotheses.

(b) Find an appropriate test statistic.

(c) Determine an appropriate p-value.

(d) Write a conclusion for this hypothesis test in the context of the study.

Egg sizes of Cuckoo birds.
3. The European common cuckoo bird is known for laying eggs in nests of other bird species (in
terms of data, these are HostSpecies, though not all are actual bird species descriptions). Nest
categories include meadowlarks, trees, hedges, robins, wagtails, and wrens. Researchers measured
the size of the cuckoo eggs (in mm) relative to the different types of nests in which cuckoos lay
eggs. They hope to see if there are possible differences in size according to host species. Suppose
that measurements taken in this study represent random and independent samples. (total pts = 24)

This dataset is provided on the Exam2 page. Note that these data need to be put in long form. The
sample sizes are not equal, so you’ll need to deal with missing data when loading. Help for this is
provided in the template.
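For readers who want a picture of what "long form" means here, the following is a rough sketch using base R only. The data frame, its column names (GroupA, GroupB, GroupC), the values, and the variable name Length are all made up for illustration; they are not the exam data, and the template's own approach may differ.

```r
# Hypothetical wide-format data: one column per group, with shorter
# columns padded by NA because the sample sizes are unequal.
wide <- data.frame(
  GroupA = c(20.1, 21.3, 22.0, 19.8),
  GroupB = c(23.5, 22.9, NA, NA),
  GroupC = c(21.0, 20.4, 21.7, NA)
)

# stack() puts every value in one column ("values") and records the
# source column name in a factor ("ind").
long <- stack(wide)
names(long) <- c("Length", "HostSpecies")

# Drop the NA padding rows so each remaining row is one real measurement.
long <- na.omit(long)

str(long)
table(long$HostSpecies)   # per-group sample sizes after removing NAs
```

If the file is read with read.csv(), blank cells in numeric columns are normally read in as NA, so the same stack() and na.omit() steps should apply; check the result with str() before fitting any model.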

(a) Provide an appropriate summary plot.

(b) Provide appropriate summary statistics.

(c) No need to include them, but check diagnostics for assumptions of this type of analysis. In
a sentence or three note any concerns that might be troubling and support your answers.

(d) Regardless of model assumption concerns, provide an ANOVA table.

(e) State a conclusion to the F-test in context of the study.

(f) Provide some type of compact display comparing the different host species and in a sentence
or two summarize the findings.

(g) Suppose that cuckoo clutch size (number of eggs laid) in nests of wrens and meadowlarks are
similar, and cuckoo clutch size in nests of robins and wagtails are similar. Researchers are
interested in seeing if the size of the host bird is also related to the average size of cuckoo eggs.
They wish to compare egg size differences between (wren vs meadowlarks) vs (robins vs
wagtails), which can also be thought of as an interaction contrast. In other words, wrens are
smaller than meadowlarks, but robins and wagtails are very similar in size. A significant
difference in these comparisons would suggest bird size may be related to cuckoo egg size while
controlling for clutch size. (6 pts, 3 each for following)

(i) Consider the appropriate contrast coefficients to write the appropriate null hypothesis using
the following parameters (μ_wren, μ_meadow, μ_robin, μ_wagtail).

(ii) Provide an estimate for this contrast. And provide a p-value for this contrast.