Department of Statistics
STATS 331, Second Semester 2019 Assignment 2 (5%) Due: 2pm Wednesday 2019-09-18
Please answer all three questions. In this assignment you will:
• Do Bayesian parameter estimation analytically
• Do Bayesian hypothesis testing
• Summarise posterior distributions
• Predict the future
1. [16 mark(s)] Please use R to do this question, and include your code along with your answers.
An Olympic diver is practicing a difficult dive. A year ago, the diver achieved a 70% success rate.
To measure his ability this year, called θ, he performs N = 20 attempts and counts the number of
successes x. From the structure of the experiment, we know the sampling distribution is
x|θ ∼ Binomial (N, θ). (1)
Consider the following hypotheses about θ:
H0 : θ = 0.7 (2)
H1 : θ ≠ 0.7 (3)
(a) [2 mark(s)] Use R to construct a vector of possible θ values from 0 to 1 in steps of 0.01.
(b) [2 mark(s)] Construct a vector of prior probabilities that assigns 50% prior probability to H0,
and spreads the other 50% uniformly among the other possibilities.
(c) [4 mark(s)] The observed number of successes is x = 18. Calculate the posterior distribution
for θ, and hence the posterior probability of H0.
(d) [2 mark(s)] The diver wants to use this analysis to predict whether he will succeed in doing this
dive when he attempts it in a competition the next day. However, he doesn’t know whether
his success probability will be affected by nerves. Let M0 be the hypothesis that nerves don’t
matter, so his success rate of θ (measured while practicing) still applies in the competition.
Calculate the posterior predictive probability of success for the dive in the competition, given
M0.
(e) [2 mark(s)] Consider another hypothesis M1, which states that if the diver’s success rate is θ
in practice sessions, it will be θ² in the competition, due to nerves¹. Calculate the predictive
probability of success on the dive in the competition, given M1.
(f) [4 mark(s)] In the competition, the dive is a failure. Calculate the Bayes Factor for M1 over
M0, and the posterior odds ratio for M1 over M0, assuming a prior odds ratio of 1.
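The steps in parts (a) to (f) can be sketched in R as a Bayes Box calculation. This is a minimal sketch, not a model answer: the variable names (`theta`, `is_h0`, `prior`, `post`, `pred_m0`, `pred_m1`, `bf`) are illustrative choices, and the tolerance test used to locate θ = 0.7 on the grid is an assumption made to avoid floating-point equality problems.

```r
# (a) Grid of candidate theta values from 0 to 1 in steps of 0.01
theta <- seq(0, 1, by = 0.01)

# (b) Prior: 50% on theta = 0.7 (H0), the other 50% shared equally
#     among the remaining grid points (H1)
is_h0 <- abs(theta - 0.7) < 1e-9          # tolerance instead of == (assumption)
prior <- ifelse(is_h0, 0.5, 0.5 / (length(theta) - 1))

# (c) Binomial likelihood for x = 18 successes in N = 20 attempts,
#     then normalise prior * likelihood to get the posterior
N <- 20
x <- 18
lik  <- dbinom(x, size = N, prob = theta)
h    <- prior * lik
post <- h / sum(h)
p_h0 <- post[is_h0]                       # posterior probability of H0

# (d) Under M0 the competition success probability is theta
pred_m0 <- sum(post * theta)

# (e) Under M1 the competition success probability is theta^2
pred_m1 <- sum(post * theta^2)

# (f) The competition dive fails, so each model's likelihood is its
#     predictive probability of failure
bf        <- (1 - pred_m1) / (1 - pred_m0)  # Bayes factor, M1 over M0
post_odds <- 1 * bf                         # prior odds ratio of 1
```

Printing `p_h0`, `pred_m0`, `pred_m1`, `bf` and `post_odds` gives the quantities asked for. Because θ² is never larger than θ on this grid, the sketch will always report a lower predictive success probability under M1 than under M0.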
2. [14 marks] The following histogram shows the ‘inter-arrival times’ t1, t2, ..., tN of births over a 24
hour period at a major Brisbane hospital². According to theory, the distribution of these times
(the gap between one birth and the next) should follow an exponential distribution. The number
of data points is N = 43 and the average of the data (the ‘sample mean’) is
t̄ = (1/N) Σ_{i=1}^{N} t_i = 0.554 hours.
¹ Since θ is between 0 and 1, squaring it makes the success probability smaller.
² Real data!
[Figure: Histogram of inter-arrival times of babies. Horizontal axis: t (hours), from 0.0 to 2.5. Vertical axis: Frequency.]
For a single variable t, an exponential probability density function is
p(t|λ) = λ e^{−λt} (4)
where t > 0 and λ is a parameter which controls the width of the exponential distribution. In fact,
the expected value of t is 1/λ, so λ is the expected arrival rate of babies. In this question you’ll ‘fit
an exponential distribution to the data’, i.e., find the posterior distribution for λ given the ts.
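The claim that the expected value of t is 1/λ follows from equation (4) by integration by parts. The short derivation below is included only as a reminder of that step; it uses no symbols beyond those already defined.

```latex
\begin{align*}
E[t \mid \lambda]
  &= \int_0^\infty t \, \lambda e^{-\lambda t} \, dt \\
  &= \left[ -t e^{-\lambda t} \right]_0^\infty + \int_0^\infty e^{-\lambda t} \, dt \\
  &= 0 + \frac{1}{\lambda} = \frac{1}{\lambda}.
\end{align*}
```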
(a) (3 marks) Write down the expression for the likelihood function p(t1, t2, ..., tN |λ). Simplify and
rearrange the result so that the average of the data values, t̄, appears in it.
(b) (3 marks) Assuming a log-uniform prior for λ, find the posterior distribution p(λ|t1, ..., tN ) and
write the result in “∼” notation.
(c) (2 marks) Re-do (b) with an ‘informative’ Gamma(24, 0.04) prior.
(d) (4 marks) Find the posterior mean and mode, which can be used as point estimates of λ, for
the two posteriors from parts (b) and (c). Present these in a table where columns refer to the
priors used and rows refer to the choice of point estimate.
(e) (2 marks) Find the posterior standard deviations for the two posteriors.
(f) (3 marks) When choosing a point estimate for λ, suppose the loss function is quadratic. Choose
the best point estimate in this situation and find (i) the value of the point estimate using the
two posteriors; and (ii) the value of the posterior expected loss using the two posteriors.
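Once a posterior has been identified as a Gamma distribution, the summaries asked for in parts (d) to (f) follow from standard formulas. The sketch below is illustrative rather than a model answer. It assumes the shape/rate parameterisation, in which a Gamma(α, β) density is proportional to λ^(α−1) e^(−βλ); if the course treats the second parameter as a scale, convert with rate = 1/scale first. The helper names `gamma_summary` and `update_gamma` are hypothetical.

```r
# Data summaries given in the question
N     <- 43
t_bar <- 0.554

# Mean, mode and standard deviation of a Gamma(shape, rate) distribution
# (the mode formula holds for shape >= 1)
gamma_summary <- function(shape, rate) {
  c(mean = shape / rate,
    mode = (shape - 1) / rate,
    sd   = sqrt(shape) / rate)
}

# Conjugate update for exponential data with a Gamma(alpha, beta) prior,
# where beta is a RATE (assumption): posterior is Gamma(alpha + N, beta + N * t_bar)
update_gamma <- function(alpha, beta, N, t_bar) {
  c(shape = alpha + N, rate = beta + N * t_bar)
}

# Log-uniform prior, p(lambda) proportional to 1/lambda: posterior is Gamma(N, N * t_bar)
post_loguniform <- gamma_summary(shape = N, rate = N * t_bar)

# Gamma(24, 0.04) prior, reading 0.04 as a rate (assumption, see above)
post_c   <- update_gamma(24, 0.04, N, t_bar)
post_inf <- gamma_summary(post_c[["shape"]], post_c[["rate"]])

# Under quadratic loss the best point estimate is the posterior mean,
# and the posterior expected loss is the posterior variance
best_estimate <- post_loguniform[["mean"]]
expected_loss <- post_loguniform[["sd"]]^2
```

With the log-uniform prior the posterior mean equals 1/t̄, the reciprocal of the sample mean. This makes a quick sanity check on the algebra in part (b).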
3. [11 mark(s)] A paleontologist is trying to measure the age a (in millions of years) of a dinosaur
skeleton. She constructs a large Bayes Box to work out the posterior distribution for a given some
data. The Bayes Box columns are calculated by some R code contained in a file dinosaur.R which
you can download from Canvas. The code also plots the posterior distribution.
(a) [4 mark(s)] Find a 95% credible interval for a (include your code).
(b) [2 mark(s)] Find the posterior median for a (include your code).
(c) [1 mark(s)] The paleontologist is going to be interviewed on television about the dinosaur, and
she needs to have a point estimate â of the age so she can say it quickly rather than having
to talk about the posterior distribution. She decides that the loss associated with stating â on
TV when the true value is a is L = |â − a|. Find the value of the best point estimate â.
(d) [4 mark(s)] Later, the paleontologist develops a new technique that works out the true value a
with no uncertainty. Before applying it to this skeleton, she decides to throw a party if the true
value is between 0.9 times and 1.1 times the point estimate from (c). Calculate the probability
that she throws a party.
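The posterior summaries in parts (a) to (d) can all be read off a Bayes Box using cumulative sums. The sketch below shows one way to do so. The contents of dinosaur.R are not reproduced here, so the grid `a` and posterior `post` are hypothetical stand-ins, and the variable names in the real file may differ. The interval computed is a central 95% interval, which is only one of several valid choices of credible interval.

```r
# HYPOTHETICAL stand-in for the Bayes Box columns produced by dinosaur.R:
# a grid of ages (millions of years) and a normalised posterior over it
a    <- seq(50, 150, by = 0.1)
post <- dnorm(a, mean = 100, sd = 8)   # made-up shape, for illustration only
post <- post / sum(post)

# Posterior cumulative distribution over the grid
cdf <- cumsum(post)

# (a) Central 95% credible interval: first grid points at which the
#     cumulative probability reaches 2.5% and 97.5%
lower <- a[which(cdf >= 0.025)[1]]
upper <- a[which(cdf >= 0.975)[1]]

# (b) Posterior median: first grid point at which the CDF reaches 50%
a_median <- a[which(cdf >= 0.5)[1]]

# (c) Under absolute-error loss L = |a_hat - a|, the best point
#     estimate is the posterior median
a_hat <- a_median

# (d) Probability that the true age lies within 0.9 to 1.1 times a_hat
p_party <- sum(post[a >= 0.9 * a_hat & a <= 1.1 * a_hat])
```

Replacing the first three assignments with the grid and posterior computed by the downloaded file leaves the rest of the calculation unchanged.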