S1.21 STATS101/101G/108
BLOCK 1
These questions are worth one mark each.
1. Pick the option that correctly completes the statement.
Data from a categorical variable are:
2. Pick the option that correctly completes the statement.
A study which observes the same group of individuals or units over a long period of time is called
a:
3. Pick the option that correctly completes the statement.
Consider a well-designed experiment involving a group of volunteers.
A tail proportion of less than 5% in the randomisation test allows us to make:
4. Pick the option that correctly completes the statement.
Using random sampling:
5. Pick the option that correctly completes the statement.
A bootstrap confidence interval may be interpreted as an interval:
group or category names for each entity.
measurements or counts taken on each entity.
cross-sectional study.
longitudinal study.
sample-to-population inference.
experiment-to-causation inference.
allows for the calculation of the likely size of sampling errors.
will guarantee representative samples.
of plausible values for the parameter.
within which the parameter is certain to lie.
6. Pick the option that correctly completes the statement.
All other things being equal, bigger sample sizes give:
7. Pick the option that correctly completes the statement.
The null hypothesis, H0, is the:
8. Pick the option that correctly completes the statement.
When conducting a t-test a plot of the sample data is used to check for evidence of:
9. Pick the option that correctly completes the statement.
For a Chi-square test for independence, there will be evidence against the null hypothesis if there
are relatively:
10. Pick the option that correctly completes the statement.
The sign (+ or -) of the sample correlation coefficient, r, is:
Note: For Questions 11 to 20 be careful which option you choose because the order of the
True/False options may change from question to question.
11. Decide whether this statement is True or False.
wider confidence intervals.
narrower confidence intervals.
hypothesis we test.
research hypothesis.
non-Normal features.
independence.
small differences between the observed and expected counts in one or more cells.
large differences between the observed and expected counts in one or more cells.
not necessarily the same as the sign of the slope of the least squares regression line.
always the same as the sign of the slope of the least squares regression line.
For highly skewed data the sample median is a more sensible measure of the centre than the
sample mean.
12. Decide whether this statement is True or False.
An observational study can be used to reliably establish the cause of an effect.
13. Decide whether this statement is True or False.
Under chance alone, when comparing two groups, the difference we observe would purely and
simply be due to which units just happened to have ended up in which group and nothing else.
14. Decide whether this statement is True or False.
Taking larger samples will not reduce the effects of selection bias and other nonsampling errors.
15. Decide whether this statement is True or False.
We can be certain that the true value of a population parameter is somewhere in a bootstrap
confidence interval for that parameter.
16. Decide whether this statement is True or False.
The level of confidence is the long-run success rate for a method which aims at producing
confidence intervals which contain the unknown value of the parameter.
False
True
True
False
True
False
True
False
False
True
17. Decide whether this statement is True or False.
Statistical significance implies practical significance.
18. Decide whether this statement is True or False.
If the P-value for an F-test for one-way analysis of variance is large then the differences we see
between the sample means could be due to chance alone.
19. Decide whether this statement is True or False.
The greater the value of the Chi-square test statistic, the weaker the evidence against the null
hypothesis.
20. Decide whether this statement is True or False.
The correlation coefficient measures the strength and the direction of a linear relationship between
two numeric variables.
False
True
False
True
True
False
False
True
False
True
Maximum marks: 20
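As an illustration of the bootstrap confidence interval referred to in Questions 5 and 15, here is a minimal Python sketch of a percentile bootstrap interval for a mean. The data values, the function name bootstrap_ci, the number of resamples and the seed are all assumptions made for illustration; none of them come from this paper.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for a statistic (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(sample)
    # Resample WITH replacement, same size as the original sample,
    # and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    tail = (1 - level) / 2
    lower = estimates[round(tail * n_resamples)]
    upper = estimates[round((1 - tail) * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements, not data from this paper.
sample = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]
lower, upper = bootstrap_ci(sample)
print(f"sample mean = {statistics.mean(sample):.2f}")
print(f"95% bootstrap interval: ({lower:.2f}, {upper:.2f})")
```

The interval is the central 95% of the resampled means, so it describes a range of plausible values for the parameter rather than a range that is guaranteed to contain it.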
2 Block 2: Questions 21 to 24
These questions are worth two marks each.
Questions 21 to 24 refer to the information in Appendix A.
21. Which one of the following statements about the study is false?
22. Refer to Figure 2.
Which one of the following statements could be false?
23. Refer to Figure 3.
Which one of the following statements is false?
This study is an experiment because the participants were randomly allocated to either the
TimeRestriction group or the NoTimeRestriction group.
The response variable was ReportedNumber.
The researchers were blinded because they did not know what number the participants
actually rolled.
The NoTimeRestriction group was the control group.
This study had a completely randomised design.
There were more participants in the NoTimeRestriction group who actually rolled a 1 than
participants in the TimeRestriction group who actually rolled a 1.
The standard deviation of the ReportedNumber for the NoTimeRestriction group is higher
than that of the TimeRestriction group.
The median ReportedNumber for the NoTimeRestriction group is less than that of the
TimeRestriction group.
Numbers less than 3 were reported less often by participants in the TimeRestriction group
than were reported by those in the NoTimeRestriction group.
Participants in the TimeRestriction group tended to report a higher number than participants
in the NoTimeRestriction group reported.
24. Suppose that the researchers were also interested in seeing if the underlying mean time taken
to report their number by those in the NoTimeRestriction group was different to the underlying
mean time taken to report their number by those in the TimeRestriction group.
Let μNTR − μTR be the difference between the underlying mean time taken to report their number
by those in the NoTimeRestriction group and the underlying mean time taken to report their
number by those in the TimeRestriction group.
Which one of the following is a correct pair of hypotheses for this test?
We have evidence that the time restriction caused the participants in the
TimeRestriction group to roll higher numbers.
The P-value for this randomisation test is less than 5%.
We have evidence that chance was not acting alone in the actual study.
We may claim that the time restriction had an effect on the mean ReportedNumber.
We have evidence that Group together with chance produced the observed result.
μNTR − μTR
H0 : x̄NTR − x̄TR ≠ 8
H1 : x̄NTR − x̄TR = 8
H0 : μNTR − μTR = 0
H1 : μNTR − μTR ≠ 0
H0 : x̄NTR − x̄TR = 0
H1 : x̄NTR − x̄TR ≠ 0
H0 : μNTR − μTR ≠ 0
H1 : μNTR − μTR = 0
H0 : μNTR − μTR = 8
H1 : μNTR − μTR ≠ 8
Maximum marks: 8
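As an illustration of the randomisation test referred to in Question 23, here is a minimal Python sketch that re-randomises group labels and reports a tail proportion for a difference in means. The data values, the function name randomisation_test, the number of re-randomisations and the choice of a one-tailed count in the direction of the observed difference are all assumptions made for illustration; they are not taken from Appendix A.

```python
import random
import statistics


def randomisation_test(group_a, group_b, n_shuffles=5000, seed=1):
    """Tail proportion for mean(group_a) - mean(group_b) under chance alone."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = statistics.mean(group_a) - statistics.mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_shuffles):
        # Under chance alone, group labels are arbitrary: reallocate at random.
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:])
        # Count re-randomised differences at least as far out as the observed
        # one, in the same direction (one-tailed count: an assumption here).
        if (observed >= 0 and diff >= observed) or (observed < 0 and diff <= observed):
            count += 1
    return observed, count / n_shuffles


# Hypothetical reported numbers for two groups, not data from this paper.
group_a = [6, 5, 6, 4, 6, 5, 3, 6, 5, 4]
group_b = [3, 4, 2, 5, 1, 3, 4, 2, 6, 3]
observed, tail = randomisation_test(group_a, group_b)
print(f"observed difference = {observed:.2f}, tail proportion = {tail:.4f}")
```

A small tail proportion means that a difference as large as the observed one would rarely arise from the random allocation alone.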
3 BLOCK 3: Questions 25 to 30
These questions are worth two marks each.
Questions 25 to 30 refer to the information in Appendix B.
25. Which one of the following could not be present in the data collected?
Questions 26 and 27 refer to Figure 4 and the accompanying information.
26. Which one of the following statements is false?
27. Suppose that it was decided to use t-procedures to calculate a 95% confidence interval for the
difference between the proportion of those interested in politics who said that they had voted and
the proportion of those not interested in politics who said that they had voted.
The sampling situation for calculating the standard error of the estimate is:
Nonresponse bias
Interviewer effects
Question effects
Behavioural considerations
Sampling error
The bootstrap confidence interval includes the difference in the sample proportions.
The smallest sample proportion is the proportion of those not interested in politics who said
that they did not vote.
It's a fairly safe bet that the proportion of those interested in politics who said that they had
voted is somewhere between 11 and 19 percentage points higher than the proportion of
those not interested in politics who said that they had voted.
The majority of respondents said that they had voted.
In every resample the difference in percentage points was more than five.
Questions 28 to 30 refer to Figure 5 and the accompanying information.
28. The test statistic, t0, for this t-test is approximately:
29. Which one of the following statements is not a correct interpretation of the P-value for this
t-test?
30. A 95% confidence interval for pN − pL is (0.06, 0.12). Suppose that we wish to calculate a
90% confidence interval using the same data.
Which one of the following statements is false?
one sample of size 3207, several response categories.
one sample of size 3412, several response categories.
two independent samples of sizes 3207 and 205.
one sample of size 3412, many yes/no items.
two independent samples of sizes 385 and 3027.
t0
0.09
0.03
4.32
5.59
1.96
At the 5% level of significance we can reject the null hypothesis.
At the 10% level of significance we can reject the alternative hypothesis.
At the 5% level of significance we can claim that pN is greater than pL.
At the 1% level of significance we can claim that pN is greater than pL.
The observed difference is a statistically significant result (at the 5% level).
pN − pL
The 90% confidence interval will:
be calculated using the same t-multiplier.
be narrower than the 95% confidence interval.
not include zero.
be calculated using the same standard error.
have a smaller margin of error.
Maximum marks: 12
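As an illustration of how the estimate, standard error, test statistic and confidence interval fit together for a difference between two proportions, here is a minimal Python sketch. It assumes two independent samples, which is only one of the possible sampling situations, and it uses a Normal multiplier as a large-sample stand-in for the t-multiplier. The counts and the function name difference_in_proportions are hypothetical and are not taken from Appendix B.

```python
import math
from statistics import NormalDist


def difference_in_proportions(x1, n1, x2, n2, level=0.95):
    """Estimate, standard error, test statistic and CI for p1 - p2.

    Assumes two independent samples (an assumption of this sketch).
    """
    p1, p2 = x1 / n1, x2 / n2
    estimate = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    t0 = (estimate - 0) / se  # hypothesised difference of 0
    # Normal multiplier used in place of a t-multiplier (large-sample assumption).
    multiplier = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = multiplier * se
    return estimate, se, t0, (estimate - margin, estimate + margin)


# Hypothetical counts: 120 of 400 versus 90 of 450.
for level in (0.95, 0.90):
    est, se, t0, (lo, hi) = difference_in_proportions(120, 400, 90, 450, level)
    print(f"{level:.0%}: estimate={est:.3f} se={se:.4f} t0={t0:.2f} CI=({lo:.3f}, {hi:.3f})")
```

Changing the confidence level changes only the multiplier: the estimate and standard error stay the same, so a lower confidence level gives a smaller margin of error.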
4 BLOCK 4
These questions are worth two marks each.
Questions 31 to 42 refer to the information in Appendix C.
31. Refer to Figure 6.
Which one of the following statements is false?
32. Which one of the following is a correct pair of hypotheses for this t-test?
33. Refer to the test output in Table 1.
Which one of the following statements is false?
If the mean were shown on the plot it would be below the median.
Approximately half of the movies in this dataset grossed more in the US than in the rest of
the world.
This data is positively (right) skewed.
There are no gross outliers in this data.
We could use this plot to check the assumption of Normality for a paired data t-test on this
data.
H0 : μDiff ≠ 0
H1 : μDiff = 0
H0 : μDiff = 0
H1 : μDiff ≠ 0
H0 : x̄Diff = 0
H1 : x̄Diff ≠ 0
H0 : x̄Diff ≠ 0
H1 : x̄Diff = 0
H0 : μDiff = 0
H1 : μDiff > 0
34. Based on the results of this t-test, which one of the following is a correct statement?
Questions 35 to 40 refer to Tables 2 & 3, Figures 7 & 8 and the information that goes with them.
35. Refer to Figure 7.
Which one of the following statements is false?
With 95% confidence, we estimate that the underlying mean
difference in gross income is somewhere between US$5.9 million
and US$14.3 million.
The confidence interval is narrow compared to the range of the
sample data because it is calculated using a relatively large dataset.
We cannot be certain that μDiff is somewhere between
US$5.9 million and US$14.3 million.
x̄Diff is in the middle of the 95% confidence interval for μDiff .
The margin of error for the 95% confidence interval for μDiff
is US$2.131 million.
We may claim that, on average, movies' gross income from the rest of the world is higher
than it is from the US.
We have very strong evidence that a movie's gross income from the rest of the world is
higher than it is from the US.
We may claim that a movie makes more of its gross income in the rest of the world than in
the US.
We have very strong evidence that, on average, movies' gross income is more in the US
than in the rest of the world.
It is not plausible that, on average, movies make US$10 million more in the rest of the world
than in the US.
Questions 36 to 39 assume that a simple linear regression is appropriate.
36. The equation for the least squares regression line for this analysis is:
37. One of the movies that had a budget of US$40 million had a total gross income of US$315
million. Under this regression analysis, the residual for this movie is approximately:
38. For movies like those in this dataset, which one of the following statements is true?
With 95% confidence, we estimate (to 1 decimal place) that, on average, an increase of US$10
million in the budget is associated with:
The movie with the highest budget had the highest total gross income.
Only two movies had a total gross income of more than US$1250 million.
As Budget increases the variability in Total tends to increase.
The maximum budget for any of these movies is about US$300 million.
It looks like a lot of movies have a gross income less than US$250 million.
Predicted Total = 8.996 + 3.035 x Budget
Predicted Total = -8.996 + 3.035 x Budget
Predicted Total = 3.035 + 0.117 x Budget
Predicted Total = 3.035 - 8.996 x Budget
Predicted Total = 0.117 + 3.035 x Budget
48
203
−275
−203
−183
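As an illustration of how a residual is obtained from a least squares regression line, here is a minimal Python sketch. The intercept, slope and data point are hypothetical round numbers chosen for illustration, and the function names predict and residual are assumptions; none of the values come from Table 2.

```python
def predict(intercept, slope, x):
    """Predicted response from a fitted line: intercept + slope * x."""
    return intercept + slope * x


def residual(observed_y, intercept, slope, x):
    """Residual = observed value minus predicted value."""
    return observed_y - predict(intercept, slope, x)


# Hypothetical fitted line (intercept 5.0, slope 2.0) and point (x=10, y=30).
print(predict(5.0, 2.0, 10.0))         # 5 + 2 * 10 = 25.0
print(residual(30.0, 5.0, 2.0, 10.0))  # 30 - 25 = 5.0
```

A positive residual means the observed value lies above the fitted line, and a negative residual means it lies below.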
39. Refer to Table 3.
Which one of the following statements is false?
For movies like those in the dataset, with 95% confidence we estimate that:
40. Which one of the following statements is false?
We should be wary of using the results of this regression analysis to predict the total gross income
of a movie released in 2020, based on its budget of US$600 million because:
a decrease in the total gross income of somewhere between US$7.34 million and US$25.5
million.
a decrease in the total gross income of US$90.0 million.
an increase in the total gross income of somewhere between US$28.1 million and US$32.7
million.
an increase in the total gross income of US$30.4 million.
an increase in the total gross income of somewhere between US$7.34 million and US$25.5
million.
movies with a budget of US$82.5 million will have an underlying mean total income of
between US$231.0 and US$251.8 million.
movies with a budget of US$190.0 million will have an underlying mean total income of
between US$266.3 and US$869.1 million.
a movie with a budget of US$175.0 million will have a total income of between US$221.1
and US$823.2 million.
movies with a budget of US$40.0 million will have an underlying mean total income of
between US$102.4 and US$122.4 million.
a movie with a budget of US$225.0 million will have a total income of between US$371.6
and US$976.2 million.
41. Suppose we wish to investigate whether, on average, some distributors have a higher total
gross income from their movies than others.
Given that the underlying assumptions are satisfied, which form of analysis, using the variables
Total and Distributor, is most appropriate?