
Tutoring: A1 8, R program and R design explained

Instructions 
Answer each question in the space provided. You can write in pen or pencil. Marks are 
indicated next to each question. The total mark for the exam is 100. 
Part A (45 marks in total) 
Question A.1 (1+1+1+1+2+1+1=8 marks) 
Consider the following set of numbers: -25, 2, 3, 8, 10, 14, 18, 21, 32. For each of the questions 
below, state your answer, showing working if necessary. 
(a) What is the median? 
(b) What is the 1st quartile? 
(c) What is the 3rd quartile? 
(d) What is the interquartile range?
(e) Hence sketch a box-plot. Lay it out horizontally below. Be sure to mark the values of 
the various parts. 
Marks / 6 Page 3 of 43 
(f) You are told the mean of the numbers is 9.222 and the mean of their square is 309.666. 
What is the sample standard deviation? 
(g) If you only knew the mean and sample standard deviation of the sample, what does 
Chebyshev’s inequality tell you? 
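The arithmetic behind parts (a) to (g) can be checked with a short script. This is only a sketch: it assumes the median-of-halves convention for quartiles (other conventions give slightly different values), and the helper names `five_number_summary`, `sample_sd` and `chebyshev_bound` are illustrative, not part of the exam.

```python
import math
from statistics import median

data = [-25, 2, 3, 8, 10, 14, 18, 21, 32]

def five_number_summary(values):
    """Min, Q1, median, Q3, max (ASSUMED convention: median of each half)."""
    xs = sorted(values)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                              # values below the median
    upper = xs[half + 1:] if n % 2 else xs[half:]  # values above the median
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

def sample_sd(mean, mean_sq, n):
    """Sample standard deviation from the mean and the mean of the squares."""
    population_var = mean_sq - mean ** 2
    return math.sqrt(population_var * n / (n - 1))  # n - 1 divisor

def chebyshev_bound(k):
    """Chebyshev: at most 1/k^2 of the values lie k or more SDs from the mean."""
    return min(1.0, 1.0 / k ** 2)

lo, q1, med, q3, hi = five_number_summary(data)
iqr = q3 - q1
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # common 1.5 * IQR outlier rule
sd = sample_sd(9.222, 309.666, len(data))
print(q1, med, q3, iqr, fences, round(sd, 2), chebyshev_bound(2))
```

Under the 1.5 × IQR rule used here, the value -25 falls outside the lower fence and would be drawn as an outlier on the box-plot.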
Question A.2 (4+2+2+4=12 marks) 
Throughout this question, show your working and leave your answer in a clear form. Of those 
reporting to a medical clinic, 2% have medical condition Z. It is assumed that this figure of 
2% is also the base rate across the population. There is a test for condition Z such that, for 
those patients who have condition Z, 85% will test positive; and for those patients who do 
not have condition Z, 25% will test positive. 
(a) If a patient tests positive, what is the probability that the patient has condition Z? 
After some consideration, it is decided that the test gives too many false positives, and it 
is decided to modify the test as follows. The new test is simply to administer the original 
test twice, where it is assumed that these two tests give results that are independent of one 
another. A patient will be considered to have tested positive on the new test precisely in 
those cases where both tests on the original test return a positive result. 
(b) If a patient has condition Z, what is the probability that the patient will test positive 
on the new test? 
(c) If a patient does not have condition Z, what is the probability that the patient will test 
positive on the new test? 
(d) If a patient returns a positive result on this new test, what is the probability that the 
patient has condition Z? 
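Parts (a) to (d) all apply Bayes' rule to the figures given in the question (2% base rate, 85% and 25% positive rates). A minimal sketch follows; the function name `posterior` is an illustrative choice, and the doubled test relies on the independence assumption stated in the question.

```python
def posterior(prior, p_pos_given_z, p_pos_given_not_z):
    """P(Z | positive) by Bayes' rule."""
    joint_z = prior * p_pos_given_z                # P(Z and positive)
    joint_not_z = (1 - prior) * p_pos_given_not_z  # P(not Z and positive)
    return joint_z / (joint_z + joint_not_z)

prior = 0.02            # base rate of condition Z
sens, fpr = 0.85, 0.25  # P(positive | Z), P(positive | not Z)

single = posterior(prior, sens, fpr)

# New test: positive only if two independent runs are both positive.
sens2 = sens ** 2
fpr2 = fpr ** 2
double = posterior(prior, sens2, fpr2)

print(round(single, 4), round(sens2, 4), round(fpr2, 4), round(double, 4))
```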
Question A.3 (2+3+3+2=10 marks) 
Consider the probability density function given at the right, defined by 
(c) Calculate V [(X + 1)(Y + 1)]. 
Question A.5 (3+3+3=9 marks) 
Consider the probability density function given by a mixture of two Gaussians with identical 
standard deviation σ, as 
p(x | ρ, µ_1, µ_2, σ) = ρ N(x | µ_1, σ) + (1 − ρ) N(x | µ_2, σ) 
where N(·|·) is the probability density function of a Gaussian. Thus the expected value of a 
function f(x) under this distribution is given by 
E_{ρ,µ_1,µ_2,σ}[f(x)] = ρ E_{N(µ_1,σ)}[f(x)] + (1 − ρ) E_{N(µ_2,σ)}[f(x)] 
where the two expected values on the right-hand side are taken under Gaussian distributions. 
(a) What is the mean of x for the mixture of two Gaussians? 
(b) What is the mean of x^2 for the mixture of two Gaussians? 
(c) What is the variance for the mixture of two Gaussians? 
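One way to sanity-check answers to (a) to (c) is to apply the expectation identity above to f(x) = x and f(x) = x^2, using the standard Gaussian facts E[x] = µ and E[x^2] = µ^2 + σ^2. The sketch below does this numerically; `mixture_moments` is a hypothetical helper name and the parameter values are arbitrary test inputs, not taken from the exam.

```python
def mixture_moments(rho, mu1, mu2, sigma):
    """Mean, second moment and variance of a two-Gaussian mixture
    whose components share the standard deviation sigma."""
    mean = rho * mu1 + (1 - rho) * mu2
    # Each Gaussian component has E[x^2] = mu^2 + sigma^2.
    second = rho * (mu1 ** 2 + sigma ** 2) + (1 - rho) * (mu2 ** 2 + sigma ** 2)
    variance = second - mean ** 2
    return mean, second, variance

# Arbitrary illustrative parameters.
rho, mu1, mu2, sigma = 0.3, 0.0, 2.0, 1.0
mean, second, variance = mixture_moments(rho, mu1, mu2, sigma)

# Expanding second - mean^2 by hand gives sigma^2 + rho(1 - rho)(mu1 - mu2)^2.
closed_form = sigma ** 2 + rho * (1 - rho) * (mu1 - mu2) ** 2
print(mean, second, variance, closed_form)
```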
Part B (25 marks in total) 
Question B.1 (3+2+3=8 marks) 
You have data x distributed as Poisson with rate λ = 16, so x ∼ Pois(16). 
(a) Show how to use the central limit theorem to get an approximate value for p(10 ≤ x ≤ 
20). Compute the approximate value, noting that the Z tables are only accurate to 2 decimal 
places. 
(b) You have a sample of 10 values from this distribution and compute its mean x̄. What 
is an approximate distribution for x̄? 
(c) What is the 95% confidence interval for the mean x̄, according to this approximation? 
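The normal approximation in this question can be sketched with Python's `statistics.NormalDist`, using the fact that a Poisson variable with rate 16 has mean 16 and variance 16. Whether a continuity correction is expected is an assumption here, so both versions are computed; the variable names are illustrative.

```python
import math
from statistics import NormalDist

lam = 16  # Poisson rate: mean and variance are both 16

# (a) Approximate Pois(16) by N(16, sd = 4).
approx = NormalDist(mu=lam, sigma=math.sqrt(lam))
p_plain = approx.cdf(20) - approx.cdf(10)   # no continuity correction
p_cc = approx.cdf(20.5) - approx.cdf(9.5)   # with continuity correction

# (b) Mean of n = 10 values: approximately N(16, sd = sqrt(16 / 10)).
n = 10
xbar_dist = NormalDist(mu=lam, sigma=math.sqrt(lam / n))

# (c) Two-sided 95% interval for the sample mean under this approximation.
z = NormalDist().inv_cdf(0.975)
ci = (lam - z * xbar_dist.stdev, lam + z * xbar_dist.stdev)

print(round(p_plain, 4), round(p_cc, 4), tuple(round(v, 2) for v in ci))
```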
Question B.2 (2+5=7 marks) 
IQ is considered to have a mean of 100 and a standard deviation of 15. You expect 
students in your masters class to have a higher mean. 
(a) Given a sample of size 10, compute a one-sided 95% confidence interval in the form 
(−∞, I] for where the measured mean should lie. 
(b) You get data from 10 students with the form [104, 120, 100, 112, 133, 138, 111, 118, 114, 118]. 
Note that the mean of the sample is 116.8 and the mean of the squares of the sample is 13765.8. 
Test the null hypothesis that the students’ IQ has mean 100. Without assuming you know 
the standard deviation, give the test statistic and the p-value for this data. Note the tables 
of statistics given at the back of the exam will not allow you to look up the p-value precisely. 
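The two calculations can be sketched as follows. Part (a) is read here as a z-interval with the known standard deviation 15, and part (b) as a one-sample t-test with 9 degrees of freedom; both readings are assumptions. The standard library has no t-distribution, so the p-value is only bracketed against a tabulated critical value.

```python
import math
from statistics import NormalDist, mean, stdev

iq = [104, 120, 100, 112, 133, 138, 111, 118, 114, 118]
n = len(iq)

# (a) Upper limit I of the one-sided 95% interval (-inf, I] for the sample
#     mean, ASSUMING mean 100 and known standard deviation 15.
upper = 100 + NormalDist().inv_cdf(0.95) * 15 / math.sqrt(n)

# (b) One-sample t statistic against the null mean of 100, with the
#     standard deviation estimated from the sample (n - 1 divisor).
s = stdev(iq)
t_stat = (mean(iq) - 100) / (s / math.sqrt(n))
df = n - 1

# Tabulated t value for 9 degrees of freedom, upper-tail area 0.005.
T_CRIT_0005 = 3.250
print(round(upper, 2), round(s, 3), round(t_stat, 3), df, t_stat > T_CRIT_0005)
```

Because the statistic exceeds 3.250, the one-sided p-value is below 0.005, which is as precise as a typical table allows.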
Question B.3 (2+2+4+2=10 marks) 
You obtain paired data (X, Y) with values x = [4.59, 4.60, 6.32, 4.85, 3.27, 5.92, 1.92, 6.90, 4.82, 5.39] 
and y = [2.89, 2.46, 3.28, 2.34, 2.11, 3.56, 1.77, 3.29, 2.46, 2.60]. The various sample means (using 
the above data) are: 
mean of x = 4.859 
mean of y = 2.677 
mean of x^2 = 25.516 
mean of y^2 = 7.460 
mean of xy = 13.670 
(a) What is the correlation co-efficient between X and Y ? What does this tell you about 
X and Y ? 
(b) Fit a simple linear model to this data in the form 
Ŷ = β0 + β1 X 
What are your estimates for β0 and β1? 
(c) What are the standard errors for β0 and β1? 
(d) Test the hypothesis that β1 = 0. What is your test statistic and its p-value? What is 
the outcome of the test? 
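Parts (a) to (d) can all be worked from the five sample means given in the question. The sketch below uses the usual least-squares formulas for simple linear regression; the variable names are illustrative, and because the quoted means are rounded, the results may differ slightly from values computed on the raw data.

```python
import math

n = 10
# Sample means quoted in the question.
mx, my, mxx, myy, mxy = 4.859, 2.677, 25.516, 7.460, 13.670

# Centred sums of squares and cross-products.
sxx = n * (mxx - mx ** 2)
syy = n * (myy - my ** 2)
sxy = n * (mxy - mx * my)

r = sxy / math.sqrt(sxx * syy)   # (a) correlation coefficient
b1 = sxy / sxx                   # (b) slope estimate
b0 = my - b1 * mx                # (b) intercept estimate

sse = syy - b1 * sxy             # residual sum of squares
s = math.sqrt(sse / (n - 2))     # residual standard error, n - 2 df
se_b1 = s / math.sqrt(sxx)                    # (c) standard error of slope
se_b0 = s * math.sqrt(1 / n + mx ** 2 / sxx)  # (c) standard error of intercept

t_b1 = b1 / se_b1                # (d) t statistic for slope = 0, 8 df
T_CRIT_0005 = 3.355              # tabulated t, 8 df, upper-tail area 0.005
print(round(r, 3), round(b0, 3), round(b1, 3),
      round(se_b0, 3), round(se_b1, 4), round(t_b1, 2), t_b1 > T_CRIT_0005)
```

Since the statistic exceeds 3.355, the two-sided p-value is below 0.01.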
Part C (30 marks in total) 
Question C.1 (2+2+2=6 marks) 
You have a data set supplied as real-valued pairs (X,Y ) and you wish to regress X onto Y . 
You have 2 models: 
A: a degree-4 polynomial 
ŷ = Σ_{i=0}^{4} a_i x^i 
B: a degree-20 polynomial 
ŷ = Σ_{i=0}^{20} a_i x^i 
(a) Describe how the bias of models A and B differ. 
(b) Describe how the variance of models A and B differ. 
(c) If you had 100 data points in your sample, which of the two models would you 
recommend? Justify your answer. 
Question C.2 (5+3+2+2=12 marks) 
(a) You wish to build a naïve Bayes classifier regressing Booleans A, B and C onto the 
Boolean X. Someone has already counted the data for you to create the frequency tables below: 
       A=0  A=1  B=0  B=1  C=0  C=1 
X=0     10   40   30   20   15   35 
X=1     30   20    5   45   40   10 
Construct probability tables as needed to specify the estimated naïve Bayes classifier for the 
task. Then give the formula for the classifier and describe how it would be used. 
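A minimal sketch of how the frequency tables turn into a classifier is given below. It assumes plain maximum-likelihood estimates with no smoothing, and the function names `fit`, `posterior` and `predict` are illustrative only. The classifier picks the value of X that maximises p(X) p(A|X) p(B|X) p(C|X).

```python
# Frequency counts from the question: counts[x][feature] = (count at 0, count at 1).
counts = {
    0: {"A": (10, 40), "B": (30, 20), "C": (15, 35)},
    1: {"A": (30, 20), "B": (5, 45), "C": (40, 10)},
}

def fit(counts):
    """Estimate p(X) and p(feature | X) by relative frequency (no smoothing)."""
    totals = {x: sum(feats["A"]) for x, feats in counts.items()}
    n = sum(totals.values())
    prior = {x: totals[x] / n for x in counts}
    cond = {x: {f: tuple(c / totals[x] for c in cs) for f, cs in feats.items()}
            for x, feats in counts.items()}
    return prior, cond

def posterior(prior, cond, a, b, c):
    """p(X | A=a, B=b, C=c) under the naive independence assumption."""
    score = {x: prior[x] * cond[x]["A"][a] * cond[x]["B"][b] * cond[x]["C"][c]
             for x in prior}
    z = sum(score.values())
    return {x: s / z for x, s in score.items()}

def predict(prior, cond, a, b, c):
    """Return the most probable value of X."""
    post = posterior(prior, cond, a, b, c)
    return max(post, key=post.get)

prior, cond = fit(counts)
print(prior, cond[0], cond[1])
print(predict(prior, cond, 0, 1, 0), posterior(prior, cond, 0, 1, 0))
```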
(b) Consider the probabilities p(A=0|X=0) and p(B=0|X=1). Compute their standard 
errors, making any assumptions as needed. What can you say about the resulting estimates? 
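One common reading of part (b), taken here as an assumption, is to treat each count as binomial within its class of 50 cases, so the standard error of a proportion p is sqrt(p(1 − p)/n). The helper name `proportion_se` is illustrative.

```python
import math

def proportion_se(successes, n):
    """Estimated proportion and its binomial standard error."""
    p = successes / n
    return p, math.sqrt(p * (1 - p) / n)

# p(A=0 | X=0): 10 of the 50 cases with X=0.
p_a, se_a = proportion_se(10, 50)
# p(B=0 | X=1): 5 of the 50 cases with X=1.
p_b, se_b = proportion_se(5, 50)

print(p_a, round(se_a, 4), p_b, round(se_b, 4))
```

With only 5 cases behind the second estimate, its standard error is a large fraction of the estimate itself.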
(c) Which would be better, the naïve Bayesian classifier or the logistic regression classifier 
for this data set? Justify your answer. 
(d) The first step of the k-means algorithm is to initialise the centroids. Describe a way 
this could be done, and why it is OK to use it. 
Question C.3 (6=6 marks) 
Consider the probability density function given below, defined by 
This is two semicircles side by side, each of radius 1/2, scaled by 4/π to give a PDF. 
(a) Devise pseudo-code for a rejection sampler for this distribution. Note the maximum 
value is marked at 2/π. 
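A rejection sampler for this shape might look like the sketch below. The figure is not available, so the support [0, 2] and the semicircle centres 0.5 and 1.5 are assumptions made purely for illustration. The proposal is uniform over the assumed support, with a flat envelope at the peak height 2/π.

```python
import math
import random

PEAK = 2 / math.pi  # maximum height of the density, as stated in the question

def pdf(x):
    """Two semicircles of radius 1/2 scaled by 4/pi.
    ASSUMPTION: support [0, 2] with centres at 0.5 and 1.5."""
    if not 0.0 <= x <= 2.0:
        return 0.0
    centre = 0.5 if x < 1.0 else 1.5
    return (4 / math.pi) * math.sqrt(max(0.0, 0.25 - (x - centre) ** 2))

def rejection_sample(n, rng):
    """Draw n samples: propose x uniformly, accept if a uniform height
    under the flat envelope falls below the density at x."""
    samples = []
    while len(samples) < n:
        x = rng.uniform(0.0, 2.0)   # proposal over the assumed support
        u = rng.uniform(0.0, PEAK)  # height under the envelope
        if u <= pdf(x):
            samples.append(x)
    return samples

rng = random.Random(0)  # fixed seed for repeatability
draws = rejection_sample(1000, rng)
print(len(draws), min(draws), max(draws))
```

The envelope has area 2 × 2/π = 4/π, so roughly π/4 (about 79%) of proposals are accepted.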
Question C.4 (5+1=6 marks) 
You wish to build a decision tree to predict a three-valued variable X. The first two features to 
test are Booleans A and B. Someone has already counted the data for you to create frequency 
tables below: 
       A=0  A=1  B=0  B=1 
X=0     10   40   30   20 
X=1     30   20    5   45 
X=2     30   20   45    5 
(a) Compute and report the quality measure for the attributes A and B using the 
information gain metric. 
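The information gain calculation can be sketched as below, assuming entropy in bits (base-2 logarithms) and taking gain as the entropy of X minus the weighted entropy of X within each branch. The function names are illustrative.

```python
import math

# Class counts (X=0, X=1, X=2) within each branch, read from the table.
counts = {
    "A": {0: (10, 30, 30), 1: (40, 20, 20)},
    "B": {0: (30, 5, 45), 1: (20, 45, 5)},
}

def entropy(class_counts):
    """Shannon entropy in bits of a tuple of class counts."""
    total = sum(class_counts)
    return -sum(c / total * math.log2(c / total) for c in class_counts if c > 0)

def information_gain(branches):
    """Entropy of X before the split minus the weighted entropy after it."""
    class_totals = [sum(col) for col in zip(*branches.values())]
    n = sum(class_totals)
    remainder = sum(sum(cs) / n * entropy(cs) for cs in branches.values())
    return entropy(class_totals) - remainder

gain = {attr: information_gain(branches) for attr, branches in counts.items()}
best = max(gain, key=gain.get)
print({k: round(v, 3) for k, v in gain.items()}, best)
```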
(b) Hence state which attribute is recommended for use at the root of the tree. 