MATH223 (Statistics)
Assignment
Due: 11:55 pm on 5 June 2020
This assignment must be lodged as a single PDF document (with each page numbered) via
“Assignment 2 - submit here” on the Moodle site on or before the due date.
Penalties for late submission are given in the subject outline.
The assignment has 3 parts, and it will be assessed for completeness, accuracy, and
Important: Computer output must not form part of your submitted assignment unless it
has been properly annotated. It is your job to say what each piece of output means, and
not the marker’s job to guess. Be selective about what you include. The output should be
pasted into the relevant part of your submission, and not be left as an Appendix unless
specified otherwise.
This assignment must not exceed seven A4 pages, excluding the appendix. An “A4 page” is
one side of a piece of A4 paper.
Part I: Calculation-based questions (2+2+2+2)
1. Let X be a random variable with the probability function f(x) given in the following
table:
    x      1     2     3     4     5
    f(x)   0.1   0.5   0.2   0.1   0.1
Find E(X) and Var(X). (2 marks)
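As a reminder of the definitions involved, the following is a minimal R sketch of computing an expected value and a variance from a probability function. The values of x and p are made up for illustration and are not the table in Question 1.

```r
# Hypothetical probability function (illustrative values only,
# NOT the table given in Question 1)
x <- c(0, 1, 2)
p <- c(0.25, 0.50, 0.25)

# A valid probability function must sum to 1
stopifnot(isTRUE(all.equal(sum(p), 1)))

ex  <- sum(x * p)      # E(X)   = sum of x * f(x)
ex2 <- sum(x^2 * p)    # E(X^2) = sum of x^2 * f(x)
vx  <- ex2 - ex^2      # Var(X) = E(X^2) - [E(X)]^2

ex   # 1
vx   # 0.5
```

The same three lines apply to any finite table of values and probabilities once x and p are replaced.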
2. A random variable X has probability density function f(x) defined as follows:
    f(x) = 0.5x,       0 < x < 1,
           1 − 0.5x,   1 ≤ x < 2,
           0.5x − 1,   2 ≤ x < 4.
(a) Find P(0 ≤ X ≤ 0.5).
(b) Find P(0.5 ≤ X ≤ 1.2).
(2 marks)
3. Consider an investment whose return is normally distributed with a mean of 5%
and a standard deviation of 10%.
(a) Determine the probability of losing money.
(b) Find the probability of losing money when the standard deviation is equal to
20%.
(2 marks)
4. The weight of a typical roll of toilet paper is normally distributed with a mean of
230g and a standard deviation of 15g. Use this information to answer the following:
(a) In some stores toilet rolls are available individually. What is the probability
that a consumer buys a roll of toilet paper that weighs at least 250g?
(b) A standard pack of toilet paper contains 10 rolls. What is the probability that
the average weight of the rolls in a pack is less than 227g?
(2 marks)
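For readers who want to check normal-distribution calculations such as those in Questions 3 and 4 against R, the sketch below shows the relevant pnorm() calls. The parameters mu, sigma and n are hypothetical and are not the values given in the questions.

```r
# Hypothetical parameters (illustrative only, NOT the values in Questions 3 and 4)
mu    <- 100
sigma <- 20
n     <- 16

# P(X < 90) for a single observation: lower tail
p_lower <- pnorm(90, mean = mu, sd = sigma)                       # about 0.309

# P(X >= 120) for a single observation: upper tail
p_upper <- pnorm(120, mean = mu, sd = sigma, lower.tail = FALSE)  # about 0.159

# P(sample mean < 90): the mean of n independent observations is normal
# with the same mean and standard deviation sigma / sqrt(n)
p_mean <- pnorm(90, mean = mu, sd = sigma / sqrt(n))              # about 0.023
```

The only difference between the single-observation and sample-mean cases is the standard deviation passed to pnorm().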
Part II: R analysis questions (5+9+8)
We will perform analysis on data from a study of online storage usage run by a
new cloud storage company. Researchers collected information from 25 users on the
following aspects of interest:
gender: gender of the account holder. Male is coded 0 and female 1.
OS: operating system used by the account holder
size: cloud storage used, in GB
month: number of months since account activation
For the following questions in this section, each student must obtain their own
dataset. Instructions for obtaining your dataset are as follows:
1. Run the following code in R to open an HTML page:
install.packages("shiny")   # one-off installation of the shiny package
library(shiny)              # load the package
shiny::runGist("16a860341eafa12d2be4f2f1ac4b0ef5")   # run the app hosted in this gist
Instructions for questions in Part II:
• You must include your dataset as an Appendix.
• Use R to answer the questions in Part II unless specified otherwise.
• Any R output must be accompanied by relevant R code.
If you fail to follow the instructions given above, you will receive no marks
for the questions in Part II.
5. Summary statistics. Make sure to show both R code and the output when necessary.
(a) What is the variable type for size? What is the variable type for month? (1
mark)
(b) Produce a frequency table for OS. What is the relative frequency for “Microsoft”?
(1 mark)
(c) Show the boxplot for month. From the boxplot (no extra coding is needed),
what are the approximate values of the five-number summary? Are any
outliers shown in the boxplot? Explain your reasoning or indicate them on
the graph. (1.5 marks)
(d) Show the histogram for size. Describe the distribution for size. (1.5 marks)
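The summary tools asked for in Question 5 could be called along the following lines. This is only a sketch: the data frame dat and every value in it are made up for illustration, and the column names are assumed to match the variable names listed above.

```r
# Hypothetical data (made up for illustration, NOT an assignment dataset);
# column names are assumed to match the variable list in Part II
dat <- data.frame(
  OS    = c("Apple", "Microsoft", "Microsoft", "Linux", "Apple", "Microsoft"),
  size  = c(95, 130, 110, 150, 125, 105),
  month = c(3, 8, 5, 12, 7, 4)
)

str(dat)                      # how R stores each variable

freq <- table(dat$OS)         # frequency table for a categorical variable
rel  <- prop.table(freq)      # relative frequencies (they sum to 1)
rel

# Boxplot and histogram for the two numeric variables
boxplot(dat$month, main = "Months since activation", ylab = "month")
hist(dat$size, main = "Cloud storage used", xlab = "size (GB)")
```

With your own dataset, replace dat with the data frame you obtained and keep the code next to the output it produces.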
6. The company estimated that the average cloud storage usage across the server is 120 GB.
Answer the following questions. Make sure to show both R code and the output
when necessary.
(a) Use R to find the mean and standard deviation of size. (1 mark)
(b) To test the company's estimate, which statistical test should we conduct? (1 mark)
(c) Define the null and alternative hypotheses. (1 mark)
(d) Calculate by hand the test statistic. (1 mark)
(e) Calculate by hand the degrees of freedom for this test. (1 mark)
(f) Given a significance level of 0.05, find the rejection region using the
t-distribution table. (1 mark)
(g) Based on the rejection region, draw a conclusion about the
company’s estimate. (2 marks)
(h) Use R to perform a test for this question. Paste your R output. (1 mark)
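To show how a hand calculation can be lined up against R output, here is a minimal sketch of a one-sample t test. The vector size below is made up for illustration, and the two-sided alternative is an assumption of the sketch, not something stated in the question.

```r
# Hypothetical sample (made up for illustration, NOT an assignment dataset)
size <- c(95, 130, 110, 150, 125, 105, 140, 115)
mu0  <- 120                      # hypothesised mean, as in Question 6

n    <- length(size)
xbar <- mean(size)               # sample mean
s    <- sd(size)                 # sample standard deviation (n - 1 divisor)

# "By hand" quantities
t_stat <- (xbar - mu0) / (s / sqrt(n))   # test statistic
df     <- n - 1                           # degrees of freedom
t_crit <- qt(0.975, df)                   # critical value, assuming a two-sided
                                          # test at the 0.05 level

# The same test done by R (two-sided by default)
res <- t.test(size, mu = mu0)
res
```

The statistic and degrees of freedom printed by t.test() should agree with t_stat and df, which is a useful check on the hand calculation.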
7. The analytic team is interested in the relationship between size and month. The
team hypothesised that the longer a user is with the company, the larger their storage
usage will be. The analytic team requests you to provide answers to the following
questions. Make sure to show both R code and the output when necessary.
(a) Show a scatterplot of month and size. Explain your reason(s) for choosing
the variable on each axis. What is the relationship between month and size shown
in this scatterplot? (1.5 marks)
(b) Use R to find the correlation coefficient between month and size. (0.5 marks)
(c) Use R to fit a least-squares line to the data. What is the regression equation?
(1 mark)
(d) For month = 6, calculate by hand the predicted value for storage size. (1 mark)
(e) Using the R output from (c), what is the standard error for size? (1 mark)
(f) Calculate by hand the 95% confidence interval for size. (1 mark)
(g) To test whether month has any effect on size, define the null and alternative
hypotheses. (1 mark)
(h) Using the p-value from the R output from (c), draw a conclusion for the test in (g).
(1 mark)
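The R calls behind Question 7 might look like the sketch below. The data are made up for illustration, and the sketch assumes month is treated as the explanatory variable and size as the response.

```r
# Hypothetical data (made up for illustration, NOT an assignment dataset)
month <- c(3, 8, 5, 12, 7, 4, 10, 6)
size  <- c(95, 130, 110, 150, 125, 105, 140, 115)

# Scatterplot, assuming month is the explanatory variable (x axis)
plot(month, size, xlab = "month", ylab = "size (GB)")

r <- cor(month, size)            # correlation coefficient

fit <- lm(size ~ month)          # least-squares line: size = b0 + b1 * month
b   <- coef(fit)

# Prediction at month = 6, first from the coefficients, then with predict()
pred_hand <- b[["(Intercept)"]] + b[["month"]] * 6
pred_r    <- predict(fit, newdata = data.frame(month = 6))

summary(fit)   # coefficient table with standard errors and p-values
```

The coefficient table from summary(fit) is where the estimates, their standard errors and the p-value for the slope are reported.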
Part III: Statistical report critique (3+5)
8. A researcher wrote a report on a study of renal transplant patients. The report is
named “stats report” and is available on Moodle under the Assignment 2 Section.
Based on the guidelines for writing a statistical report in the Section 1 lecture notes, you will
(a) comment on the existing “stats report” (no more than one page); (3 marks)
and
(b) using the statistical information given in the existing report, write a standard
statistical report appropriate for a general audience (no more than one page in
length). (5 marks)