ANLY-511 Homework 7 Problems
Submit problems 73, 75, 78, 80, 84, 89, and 90. Explain your work and give concise
reasoning. Attach R code with comments if applicable. Using Markdown
is the best way to do this. Do not print out any data or any detailed results of
simulations.
73. (2 points) The built-in data set "chickwts" is a data frame with weight
gains in grams for 71 chickens that were each given one of six different treatments
(feed types). It may be loaded with the R command data(chickwts).
Load the dataset and make a single graph with side by side box plots of weight gains
for the six feed types. Which feed type appears to result in the largest weight gain?
The smallest weight gain? Are there feed types whose weight gains are possibly
the same? Do not do any statistical tests to answer these questions. Support your
answers with the graph.
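A minimal sketch of the graph for problem 73, assuming the column names weight and feed that ship with the built-in data set. The axis labels, the title, and the sorted table of medians are illustrative additions, not requirements of the problem.

```r
# Load the built-in data set (columns: weight, feed).
data(chickwts)

# One graph with one box per feed type. boxplot() invisibly returns the
# summary statistics it draws, which are kept here for later inspection.
bp <- boxplot(weight ~ feed, data = chickwts,
              xlab = "Feed type", ylab = "Weight (g)",
              main = "chickwts: weight by feed type")

# Medians by feed type, sorted in increasing order, as a numeric
# companion to the graph (not a statistical test).
feed_medians <- sort(tapply(chickwts$weight, chickwts$feed, median))
```

Overlapping boxes in the graph are the visual cue for feed types whose weights may be similar.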
74. (2 points) Import the data in the file GSS2006.csv (available in the zip
file with Chihara/Hesterberg data in Blackboard) as a data frame. Make suitable
two-way tables to explore the following questions. Do not carry out any statistical
tests.
a) Does it appear that the distribution of views towards the death penalty is the
same in all regions of the country?
b) Does it appear that the distribution of views towards the legalization of marijuana
is the same in all regions of the country?
c) Does it appear that views towards the death penalty and towards the legalization
of marijuana are related?
In the next three problems, distribution A is a standard normal distribution and
distribution B is a N(1, 2²) distribution. Generate 20 random numbers from distribution
A and 30 random numbers from distribution B and record these in a suitable
data frame. Use these data for all three problems. Be sure to fix a suitable random
seed before simulating your data.
75. (2 points) Examine the null hypothesis that the means of A and B are the
same against the alternative that the mean of B is larger, using a permutation test.
Report the p-value and state your conclusion.
76. (2 points) Examine the null hypothesis that the variances of A and B are
the same against the alternative that the variance of B is larger, using a permutation
test. Report the p-value and state your conclusion.
77. (2 points) Examine the null hypothesis that the 75th percentiles of A and
B are the same against the alternative that the 75th percentile of B is larger, using
a permutation test. Report the p-value and state your conclusion.
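One possible setup for problems 75 to 77 is sketched below. It rests on several assumptions: the seed 511 is arbitrary, N(1, 2²) is read as mean 1 and standard deviation 2, the helper name perm_test_greater is hypothetical, 9999 resamples is a common but arbitrary choice, and the test statistic is the difference stat(B) - stat(A) (for variances a ratio would be an equally reasonable choice).

```r
set.seed(511)  # assumed seed; any fixed value works

# 20 draws from A = N(0, 1) and 30 draws from B = N(1, 2^2), i.e. sd = 2.
sim <- data.frame(
  value = c(rnorm(20, mean = 0, sd = 1), rnorm(30, mean = 1, sd = 2)),
  group = factor(rep(c("A", "B"), times = c(20, 30)))
)

# One-sided permutation test of H0: stat(A) = stat(B) against
# the alternative stat(B) > stat(A).
perm_test_greater <- function(value, group, stat, n_perm = 9999) {
  observed <- stat(value[group == "B"]) - stat(value[group == "A"])
  n_b <- sum(group == "B")
  perm <- replicate(n_perm, {
    idx <- sample(length(value), n_b)   # random relabelling of group B
    stat(value[idx]) - stat(value[-idx])
  })
  # Add 1 to numerator and denominator so the observed split is counted.
  p_value <- (sum(perm >= observed) + 1) / (n_perm + 1)
  list(observed = observed, p_value = p_value)
}

res_mean <- perm_test_greater(sim$value, sim$group, mean)
res_var  <- perm_test_greater(sim$value, sim$group, var)
res_q75  <- perm_test_greater(sim$value, sim$group,
                              function(x) unname(quantile(x, 0.75)))
```

Passing the statistic as a function lets the same resampling loop serve all three problems; only the stat argument changes.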
For the next three problems, assume that data are in a random sample of size n
from a normal N(µ, 1) distribution with unknown µ. The null hypothesis is always
H0 : µ = 0. The first step for each problem is to find the sampling distribution of
the sample mean x̄.
78. (2 points) Suppose the alternative is that µa > 0. You are going to use
the sample mean x̄ as the test statistic. You plan to conduct the test by rejecting
H0 if x̄ is sufficiently large, i.e. x̄ > x0 for some x0, and you have already decided
that you will always reject the null hypothesis if the p-value of the test is < 0.05.
Use R to compute x0 as a function of n for n = 5, 10, 20, 50, 100. The interval
[x0, ∞) is called the rejection region.
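The computation in problem 78 might be sketched as follows, assuming the standard result that the mean of n independent N(µ, 1) observations is N(µ, 1/n), so that under H0 the sample mean has standard deviation 1/sqrt(n). The variable names are illustrative.

```r
n <- c(5, 10, 20, 50, 100)
alpha <- 0.05

# Under H0 the sample mean is N(0, 1/n). The cutoff x0 is the value whose
# upper-tail probability under H0 equals alpha.
x0 <- qnorm(1 - alpha, mean = 0, sd = 1 / sqrt(n))

# Cutoffs collected by sample size.
critical_values <- data.frame(n = n, x0 = x0)
```

Because the standard deviation shrinks like 1/sqrt(n), the cutoff moves toward 0 as n grows.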
79. (2 points) Suppose the alternative is again that µa > 0. A different
approach consists of rejecting the null hypothesis if x̄ ≥ x0 for some predetermined
x0 > 0.
a) Suppose you do this with x0 = 0. What is the largest possible p-value for which
you would still reject?
b) Suppose x0 = .4 and n = 20. What is the largest possible p-value for which you
would still reject?
c) Suppose x0 < 0. Explain why in this case you might reject H0 even if the p-value
is larger than 0.5.
80. (2 points) As in the previous problem, you have decided that you will
reject the null hypothesis if x̄ ≥ x0. You have chosen x0 = .4 and the sample size
is n = 205. Suppose now that the alternative hypothesis is actually true and that
in fact µ = 0.5. You don’t know this, of course. Compute the probability that
you will reject the null hypothesis, i.e. that you make the correct decision, using
R. This probability is called the power of the test. It depends on µ, n, x0, among
other things.
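The power calculation can be written as a small function of µ, n, and x0. This is a sketch under the assumption that x̄ is N(µ, 1/n); the function name power_upper is hypothetical, and the sample sizes in the example are illustrative values rather than the one stated in the problem.

```r
# Probability of rejecting H0 (that is, P(xbar >= x0)) when the true mean is mu.
power_upper <- function(mu, n, x0, sigma = 1) {
  pnorm(x0, mean = mu, sd = sigma / sqrt(n), lower.tail = FALSE)
}

# Illustrative sample sizes, with mu = 0.5 and x0 = 0.4 as in the problem.
n_values <- c(5, 10, 20, 50, 100)
power_by_n <- power_upper(mu = 0.5, n = n_values, x0 = 0.4)

# Setting mu = 0 in the same function gives the type I error probability.
size_by_n <- power_upper(mu = 0, n = n_values, x0 = 0.4)
```

Evaluating the same function at µ = 0 shows how a fixed cutoff trades type I error against power as n changes.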
81. (5 points) Distribution of p-values. In this exercise you will gain
insight into the behavior of p-values if the null hypothesis is true. Consider data
coming from a certain N(µ, σ²) distribution. The null hypothesis is that µ = 1, σ² = 8.
The alternative is that µ > 1, σ² = 8. We use the sample mean x̄ of a random
sample of size n = 15 as test statistic.
a) Find the exact sampling distribution of x̄, assuming the null hypothesis is true.
b) Since each observed x̄ results in a p-value, we can regard the p-value as a random
variable. And since the exact null distribution of x̄ is known from part a), one can
compute this p-value, using the cdf of this distribution. Use R to simulate
10,000 sample means x̄ and find all p-values. Make a histogram and plot the ecdf.
What is the distribution of the p-values? Can you explain this?
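The simulation in part b) might look like the sketch below. It assumes that the null distribution of x̄ is N(1, 8/15), which follows from the mean of 15 independent N(1, 8) observations, and it uses an arbitrary seed. The dashed diagonal is an added reference line for comparing the ecdf with the Uniform(0, 1) cdf.

```r
set.seed(511)  # assumed seed

mu0 <- 1
sigma2 <- 8
n <- 15
se <- sqrt(sigma2 / n)   # standard deviation of the sample mean under H0

# 10,000 sample means, each from a fresh sample of size n drawn under H0.
xbar <- replicate(10000, mean(rnorm(n, mean = mu0, sd = sqrt(sigma2))))

# One-sided p-values for the alternative mu > 1: upper-tail probabilities.
p_values <- pnorm(xbar, mean = mu0, sd = se, lower.tail = FALSE)

hist(p_values, breaks = 20, main = "p-values under H0", xlab = "p-value")
plot(ecdf(p_values), main = "ecdf of p-values under H0")
abline(0, 1, lty = 2)    # reference: cdf of Uniform(0, 1)
```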
82. (2 points) Problem 3.9 #12abc in Chihara/Hesterberg.
83. (2 points) Problem 3.9 #14 in Chihara/Hesterberg.
84. (2 points) Problem 3.9 #25 in Chihara/Hesterberg. Import the dataset
Lottery.csv and conduct a test of the null hypothesis that the data in the file come
from a multinomial distribution on {1, ..., 39} with all p_i = 1/39, using a suitable
built-in procedure. Report the p-value and state your conclusion. This is similar
to the question whether birth dates of soccer players follow a uniform distribution.
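A goodness-of-fit test of this kind can be run with chisq.test(), one candidate for the built-in procedure mentioned above. The sketch uses simulated stand-in draws rather than Lottery.csv, because the file's column name is not given here; replace draws with the imported column when working on the actual problem.

```r
set.seed(511)  # assumed seed

# Stand-in data, NOT the lottery file: 500 draws from {1, ..., 39}.
draws <- sample(1:39, size = 500, replace = TRUE)

# Count every value, keeping zero counts for numbers never drawn.
counts <- table(factor(draws, levels = 1:39))

# Chi-square goodness-of-fit test against equal probabilities 1/39.
gof <- chisq.test(counts, p = rep(1 / 39, 39))
gof$p.value
```

Converting to a factor with all 39 levels matters: without it, a number that never appears would be dropped from the table and the test would compare the wrong number of cells.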
85. (2 points) Problem 3.9 #30 in Chihara/Hesterberg.
86. (2 points) Consider the following pairs of attributes from the GSS2002
dataset. Associations between all these pairs could be examined with a χ² test.
Which of these would be questions about homogeneity of distributions across several
populations, which would be questions about independence of attributes? Explain
each answer in one sentence. Do not carry out any tests for this problem.
• Gender and education
• Race and education
• Happiness and political party
• Gender and views of death penalty
• Views of gun laws and race
87. (2 points) Import the GSS2002 data set.
Use a χ² test to determine if the following attributes are independent. Explain to
yourself why the number of degrees of freedom is correct in each case.
• Gender and education
• Happiness and political party
88. (2 points) Consider the data frame Problem58 in the R workspace hw7.RData.
Explain why a χ² test should not be used to investigate the question whether the
variables X and Y are independent. Then use a permutation test to study this
question.
89. (5 points) Import the data set Titanic.csv which contains survival data
(0 = death, 1 = survival) and ages of 658 passengers of the Titanic, which sank on
April 15, 1912. Examine the null hypothesis that the mean ages of survivors and
of victims are the same against the alternative that these mean ages are different,
using a permutation test. Compute the p-value and state your conclusion. This is
a two-sided test. How should the p-value be computed in this case?
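Two common conventions for a two-sided permutation p-value are sketched below on simulated stand-in data. Nothing here comes from Titanic.csv: the data frame toy, its column names Survived and Age, the group sizes, and the seed are all hypothetical. Which convention the course expects is not stated in the problem.

```r
set.seed(511)  # assumed seed

# Hypothetical stand-in data, NOT the Titanic file.
toy <- data.frame(
  Survived = rep(c(0, 1), times = c(40, 25)),
  Age = c(rnorm(40, mean = 32, sd = 12), rnorm(25, mean = 29, sd = 12))
)

observed <- mean(toy$Age[toy$Survived == 1]) - mean(toy$Age[toy$Survived == 0])

n_perm <- 9999
n_surv <- sum(toy$Survived == 1)
perm <- replicate(n_perm, {
  idx <- sample(nrow(toy), n_surv)        # random relabelling of survivors
  mean(toy$Age[idx]) - mean(toy$Age[-idx])
})

# Convention 1: twice the smaller one-sided p-value, capped at 1.
p_upper <- (sum(perm >= observed) + 1) / (n_perm + 1)
p_lower <- (sum(perm <= observed) + 1) / (n_perm + 1)
p_two_sided <- min(1, 2 * min(p_upper, p_lower))

# Convention 2: proportion of resamples at least as extreme in absolute value.
p_abs <- (sum(abs(perm) >= abs(observed)) + 1) / (n_perm + 1)
```

The two conventions agree closely when the permutation distribution is roughly symmetric and can differ when it is skewed.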
90. (5 points) The dataset NCBirths2004.csv contains data from over 1000
births in the state of North Carolina. One of the columns contains the weight of
the newborn baby in grams. Another column tells you whether the mother was
a smoker (Yes or No). We want to determine whether the data contain evidence
that babies born to mothers who smoke weigh less on average than babies born to
non-smoking mothers.
Import the dataset, make side by side boxplots of birth weights for smoking and
non-smoking mothers, formulate suitable hypotheses, carry out a permutation test,
and state your conclusion.