STAT7002 Examination 2018 Page 1
Answer ALL questions. Section A carries 40% of the total marks and Section B carries 60% of
the total marks. The relative weights attached to each question are as follows: A1 (9), A2 (8),
A3 (8), A4 (9), A5 (6); B1 (20), B2 (20), B3 (20). The numbers in square brackets indicate
the relative weight attached to each part question. An appendix containing some formulae from
the STAT7002 course is provided at the end of this examination paper.
Section A
A1. A travel company uses the following questions on a questionnaire given to tourists as part
of a visit to London. For each question, briefly identify a potential problem that might
lead to bias. Explain your reasoning.
(a) ‘The Phantom of the Opera has played continuously at Her Majesty’s Theatre since
1986, winning over 70 major theatre awards and receiving much critical acclaim.
Did you see it during your stay?’
[3]
(b) ‘Do you think that public transport in London is easy to use and reasonably priced?
Please circle a response.’
YES NO
[3]
(c) ‘In the box below, please write down the amount that you spent during your stay
in London (not including money spent on accommodation and travel).’
£
[3]
A2. A British town contains 25000 eligible voters. Before an election, a political researcher
performs a simple random sample of 400 eligible voters. Each sampled voter is asked if
they intend to vote for the Labour party candidate, with two possible responses: ‘Yes’
or ‘No’. Of the sampled voters, 118 answered ‘Yes’ to this question.
(a) Define the term simple random sample.
[2]
(b) Assuming that there was no non-response, calculate an estimate and an associated
95% confidence interval for the proportion of Labour voters in the town. You should
define any notation that you introduce and show your working clearly.
[6]
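As an illustration of the kind of calculation part (b) calls for, the following Python sketch computes the sample proportion and a normal-approximation 95% confidence interval. It assumes the variance estimator (1 - n/N) p(1 - p)/(n - 1), which includes a finite population correction; the course notes may use a slightly different form, and the variable names are illustrative only.

```python
import math

N = 25000      # eligible voters in the town
n = 400        # sample size
yes = 118      # sampled voters answering 'Yes'

# Sample proportion of intended Labour voters.
p_hat = yes / n

# Assumed estimator: finite population correction times p(1 - p)/(n - 1).
var_hat = (1 - n / N) * p_hat * (1 - p_hat) / (n - 1)
se = math.sqrt(var_hat)

z = 1.96       # approximate 97.5% point of the standard normal distribution
lower, upper = p_hat - z * se, p_hat + z * se
print(f"estimate = {p_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```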
A3. A property investor owns four houses in London. The values of the houses are shown in
the table below, with letters (A–D) to denote the different houses.
House    Value (£)
A        499,950
B        525,000
C        610,000
D        774,950
(a) A prospective buyer wishes to view two of the property investor’s houses. The
buyer will choose which houses to view using a simple random sample. Assuming
this sampling approach, derive the sampling distribution of the sample mean house
value. You should define any notation or terms that you introduce.
[5]
(b) Using your answer to (a), calculate the expectation of the sample mean.
[3]
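One way to approach A3 is to enumerate every possible sample. The Python sketch below assumes that the simple random sample of two houses is drawn without replacement, so that each of the six unordered pairs is equally likely; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

values = {"A": 499_950, "B": 525_000, "C": 610_000, "D": 774_950}

# Sampling without replacement: every unordered pair of houses is equally likely.
pairs = list(combinations(values, 2))
prob = Fraction(1, len(pairs))

# Map each possible sample to its sample mean house value.
sample_means = {
    pair: Fraction(values[pair[0]] + values[pair[1]], 2) for pair in pairs
}

# Expectation of the sample mean over the sampling distribution.
expectation = sum(prob * m for m in sample_means.values())

for pair, m in sample_means.items():
    print("".join(pair), float(m), prob)
print("E[sample mean] =", expectation)
```

Under these assumptions the expectation coincides with the mean of the four house values, which is consistent with the sample mean being unbiased under simple random sampling.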
A4. The descriptions (a)–(c) outline sampling schemes. For each of (a)–(c), identify the type
of sampling scheme used and describe a potential problem with the proposed sampling
scheme. Justify your answers.
(a) A researcher wants to know about the experience of passengers who use the London
Underground. A questionnaire is devised and the researcher stands outside Holborn
Station between 0900 and 1100 on a given Monday morning, asking potential
respondents who pass by to participate in a survey.
[3]
(b) An investigative reporter is interested in finding out about the living conditions of
illegal workers. The reporter knows three illegal workers, who agree to participate
in a study. These illegal workers are asked to invite any other illegal workers, whom
they know, to participate in the same study.
[3]
(c) A high school contains 1200 pupils aged 11–16. The list of pupils in the school is
ordered by date of birth (youngest to oldest) and every tenth pupil on the ordered
list is selected to participate in a school sports event, until an overall sample size of
50 pupils is reached.
[3]
A5. Over a 30 month period, 60 obese males participated in a weight loss study. The weight of
each study participant was recorded at several time points. To show the change in mean
weight of the participants over time, the study research team produced the following
visual display of their data.
(a) Identify two distinct problematic features of this visual display. Justify your
answer.
[2]
(b) Identify the scale type used for each of the following study variables. You should
justify your answer in each case.
(i) A participant’s weight (in pounds).
(ii) The number of visits to the gym that a participant makes.
[4]
Section B
B1. Researchers from a town’s council want to measure residents’ attitudes concerning the
living environment in their town. Below are two statements that the researchers aim to
present to a sample of residents as part of a questionnaire.
‘There is too much litter around the town centre.’
‘Public spaces and gardens within our town are well maintained.’
(a) Using these statements as examples, describe how a Likert scale could be
constructed in this questionnaire to measure the attitude of residents concerning the
living environment in the town. Your answer should include a description of polarity
and a definition of the polarity of each of the above statements.
[9]
(b) Describe how Likert scale responses for a single item (such as either of those above)
could be summarised and presented for a sample of residents who complete the
questionnaire.
[3]
(c) Explain what is meant by the reliability of a measurement instrument and describe
how the reliability of a questionnaire, in which several responses are used to measure
the same attitude with a Likert scale, may be assessed.
[4]
The council decide that they will sample 200 of the town’s residents; the target population
for the study is all adult residents of the town (20000 people). Sampling will be done
by randomly selecting e-mail addresses of people who have paid Council Tax using the
council’s online payment system, with a link to an online questionnaire sent to each
selected e-mail address.
(d) Is this proposed sampling scheme satisfactory? Justify your answer.
[4]
B2. Suppose that $Y_1, \ldots, Y_N$ are binary variables in a population of size $N \in \mathbb{N}$ with $N > 2$.
The population proportion is given by
$$P = \frac{1}{N} \sum_{i=1}^{N} Y_i.$$
A researcher wants to draw a simple random sample of size $n$ from this population (where
$n < N$), in which the sampled variables are denoted $y_1, \ldots, y_n$.
(a) Denoting $\hat{P}$ as the sample mean of the $n$ sampled binary variables, show that
$$\operatorname{Var}(\hat{P}) = \frac{P(1 - P)(N - n)}{n(N - 1)}.$$
You may use the following without proof:
$$\operatorname{Cov}(y_j, y_k) = \frac{-P(1 - P)}{N - 1} \quad \text{for } j \neq k.$$
[6]
(b) The researcher wants to sample enough binary variables so that the standard error
of $\hat{P}$ is less than some pre-specified positive constant $c$. Show that the number of
sampled variables, $n$, should satisfy
$$n > \left[ \frac{4(N - 1)c^2}{N} + \frac{1}{N} \right]^{-1}.$$
[5]
A high school, in which the number of registered pupils is 900, wishes to perform a
simple random sample of pupils. Each sampled pupil will be sent a postal questionnaire
on various aspects of school life. One of the questions will ask ‘Overall, are you satisfied
with the standard of teaching at school?’ with respondents given two answer options of
‘Yes’ or ‘No’. It is assumed that the proportion of pupils who would not answer this
question is 10%.
(c) Calculate the number of pupils that should be sampled so that the proportion of
pupils who are satisfied with the school’s standard of teaching can be estimated with
a standard error no larger than 0.03. You should show your working clearly and
define carefully any assumptions that you make.
[5]
(d) The school’s headteacher assumes that pupils who are not satisfied with the standard
of teaching at the school are less likely to answer the question on the standard of
teaching than other pupils in the school, leading to missing data for some of the
answers to this question. Describe this missing data assumption, using words and
appropriate mathematical notation.
[4]
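The sample size calculation in part (c) can be sketched numerically as follows. This Python sketch applies the bound stated in part (b), which corresponds to the worst case P = 1/2, and then inflates the result for the assumed 10% non-response. Treating non-response as unrelated to a pupil's answer is an assumption made here for illustration, and the variable names are hypothetical.

```python
import math

N = 900              # registered pupils
c = 0.03             # target standard error
response_rate = 0.9  # 10% of pupils assumed not to answer the question

# Bound from part (b), which takes the worst case P = 1/2.
bound = 1 / (4 * (N - 1) * c**2 / N + 1 / N)
n_respondents = math.ceil(bound)

# Inflate for non-response, assuming it is unrelated to the answer given.
n_to_sample = math.ceil(n_respondents / response_rate)
print(n_respondents, n_to_sample)
```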
B3. A town contains two medical centres (labelled A and B). Centre A has 2000 registered
adult patients and Centre B has 3000 registered adult patients. A medical researcher
carries out a stratified random sample of adult patients from these medical centres, with
stratification done by the centre at which a patient is registered. A total of 400 adult
patients are sampled (150 registered at Centre A and 250 registered at Centre B) and the
body mass index (BMI) is recorded for each sampled patient. For patients sampled from
Centre A, the sample mean and sample standard deviation BMI are 25.2 kg/m² and 3.8
kg/m², respectively. For patients sampled from Centre B, the sample mean and sample
standard deviation BMI are 28.1 kg/m² and 4.1 kg/m², respectively.
(a) Define the term stratified random sample.
[3]
(b) Calculate an estimate of the mean BMI of adults in the town and an associated
95% confidence interval. You should show your working and define any notation or
terms that you introduce.
[8]
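The calculation in part (b) can be illustrated with a short Python sketch. It assumes that the registered adult patients of the two centres make up the town's adult population, that strata are weighted by their population shares, and that the variance estimator includes a finite population correction within each stratum; the course notes may present the formula differently, and the variable names are illustrative only.

```python
import math

# Stratum data: population size, sample size, sample mean and sample SD of BMI.
strata = {
    "A": {"N": 2000, "n": 150, "mean": 25.2, "sd": 3.8},
    "B": {"N": 3000, "n": 250, "mean": 28.1, "sd": 4.1},
}
N_total = sum(s["N"] for s in strata.values())

estimate = 0.0
variance = 0.0
for s in strata.values():
    w = s["N"] / N_total                      # stratum weight N_h / N
    estimate += w * s["mean"]                 # weighted sum of stratum means
    fpc = 1 - s["n"] / s["N"]                 # finite population correction
    variance += w**2 * fpc * s["sd"]**2 / s["n"]

se = math.sqrt(variance)
z = 1.96  # approximate 97.5% point of the standard normal distribution
lower, upper = estimate - z * se, estimate + z * se
print(f"estimate = {estimate:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```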
Another researcher plans to sample 400 of the town’s households at random and collect
data on the BMI of adult occupants of each sampled home.
(c) Identify the sampling approach that this researcher has proposed. Justify your
answer.
[3]
(d) Assuming that this sampling approach is used, write down an appropriate statistical
model for BMI that accounts for variability between adults within households and
for variability between households. You should define any notation or terms that
you introduce.
[6]
STAT7002 Social Statistics: Some formulae
Below are some formulae from the STAT7002 course notes. Note that these formulae are simply
copied: the notation is not properly introduced and no explanation is given for each formula.
The same symbol may mean different things in different formulae and may not necessarily
apply to any examination question where the same symbol is used.
There is no guarantee that any of these formulae is needed in the examination.
In addition, there is no guarantee that all formulae required for this examination are listed