Statistical Methods for Data Science
DATA7202
Semester 1, 2022
Assignment 4 (Weight: 25%)
Assignment 4 is due on 23 June 2022 at 16:00.
Please answer the questions below. For theoretical questions, you should present
rigorous proofs and appropriate explanations. Your report should be visually appealing
and all questions should be answered in the order of their appearance. For programming
questions, present your analysis of the data, carried out in Python, Matlab, or R, as a
short report that clearly addresses the objectives, justifies the modeling (and hence
statistical analysis) choices you make, and discusses your conclusions. Do not
include excessive amounts of output in your report. Copy all the code into the
appendix; in addition, package the source files separately and submit them on
Blackboard in a zipped folder with the name:
"student_last_name.student_first_name.student_id.zip".
For example, if the student's name is John Smith and the student ID is
123456789, the zipped file name will be John.Smith.123456789.zip.
1. [20 Marks] Consider the following example from Efron and Tibshirani (1993).
When a drug company introduces a new medication, it is sometimes required
to show “bioequivalence”, that is, to demonstrate that the new drug is
not substantially different from the current treatment.
The table below shows data for eight subjects who used medical patches to infuse a certain
hormone into the blood. Each subject received three treatments: placebo, old patch, and new patch.
Subject   Placebo   Old       New       Old - Placebo   New - Old
1         9243      17649     16449     8406            -1200
2         9671      12013     14614     2342            2601
3         11792     19979     17274     8187            -2705
4         13357     21816     23798     8459            1982
5         9055      13850     12560     4795            -1290
6         6290      9806      10157     3516            351
7         12412     17208     16570     4796            -638
8         18806     29044     26325     10238           -2719
Let
• Z = old − placebo, and
• Y = new − old.
The Food and Drug Administration (FDA) requirement for bioequivalence is that
$$|\theta| \le 0.20, \quad \text{where } \theta = \frac{\mathbb{E}[Y]}{\mathbb{E}[Z]}.$$
Write a program that performs the following calculations, setting the random number
generator seed to 12345.
(a) Calculate the plug-in estimate of $\theta$, namely $\hat{\theta} = \bar{Y}/\bar{Z}$, the ratio of the sample means. [7 Marks]
(b) Using the bootstrap method with B = 1000 replications, calculate a 95%
confidence interval for $\theta$. Compare the obtained interval with the FDA requirement
$|\theta| \le 0.20$. What is your conclusion? [13 Marks]
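A minimal Python sketch of parts (a) and (b), assuming NumPy is available. It recomputes Y and Z from the raw columns (which also checks the table), forms the plug-in ratio of sample means, and resamples whole subjects so that each (Y, Z) pair stays together. The helper name `bootstrap_ratio_ci` is hypothetical, and the percentile interval is an assumed choice: the question does not say which bootstrap interval to use, and BCa or bootstrap-t intervals are alternatives. Note also that `np.random.default_rng(12345)` produces a different random stream from `np.random.seed(12345)` or from seeding Matlab or R, so the exact interval depends on the generator.

```python
import numpy as np

# Raw measurements from the table (subjects 1-8)
placebo = np.array([9243, 9671, 11792, 13357, 9055, 6290, 12412, 18806], dtype=float)
old = np.array([17649, 12013, 19979, 21816, 13850, 9806, 17208, 29044], dtype=float)
new = np.array([16449, 14614, 17274, 23798, 12560, 10157, 16570, 26325], dtype=float)

z = old - placebo  # Z = old - placebo
y = new - old      # Y = new - old

# (a) Plug-in estimate: replace E[Y] and E[Z] by the sample means
theta_hat = y.mean() / z.mean()
print(f"theta_hat = {theta_hat:.4f}")  # prints theta_hat = -0.0713


def bootstrap_ratio_ci(y, z, B=1000, alpha=0.05, seed=12345):
    """Percentile bootstrap CI for theta = E[Y] / E[Z] (hypothetical helper).

    Whole subjects are resampled with replacement, so each (Y, Z) pair
    stays together (both differences come from the same person).
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    idx = rng.integers(0, n, size=(B, n))  # B resamples of subject indices
    theta_star = y[idx].mean(axis=1) / z[idx].mean(axis=1)
    lo, hi = np.quantile(theta_star, [alpha / 2, 1 - alpha / 2])
    return theta_star, lo, hi


# (b) B = 1000 replications, 95% percentile interval
theta_star, lo, hi = bootstrap_ratio_ci(y, z)
print(f"95% percentile CI: [{lo:.4f}, {hi:.4f}]")
print("Interval inside [-0.20, 0.20]:", bool(-0.20 <= lo and hi <= 0.20))
```

Resampling subjects rather than Y and Z separately preserves the dependence between the two differences, which share the "old" measurement for each subject.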
2. [80 Marks (see details below)] Regular and priority packages arrive at a post
office according to a Poisson process with rate λ = 4 (4 packages per unit time
on average). Each arriving package is regular with probability 0.9 and priority
with probability 0.1. There are 9 post-office workers, and each package must be
processed by one of them. Processing times are exponentially distributed with
mean 2; that is, each worker processes 0.5 packages per unit time on average. If
all workers are busy when a package arrives, a regular (or priority) package joins
the regular (or priority) waiting queue. As soon as a worker finishes processing a
package, she either takes a new package from the waiting queues or remains idle if
no packages are waiting. If packages are waiting in both the regular and the
priority queues, the worker takes a package from the priority queue.
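A possible analytical sanity check for the simulation, not required by the question: since regular and priority packages have the same exponential processing time and a worker never stays idle while packages are waiting, the total number of packages in the post office evolves like an M/M/9 queue with offered load a = λ/μ = 4/0.5 = 8 and utilisation ρ = 8/9. Its long-run fraction of time with all workers busy is given by the Erlang C formula. The sketch below assumes only the standard library; `erlang_c` is a hypothetical helper name. The value is a steady-state limit, so the proportion over [0, 2000] starting from an empty system may differ somewhat.

```python
import math


def erlang_c(lam, mu, c):
    """Long-run fraction of time all c servers are busy in an M/M/c queue.

    By PASTA this also equals the probability that an arrival has to wait.
    """
    a = lam / mu   # offered load
    rho = a / c    # per-server utilisation, must be < 1 for stability
    if rho >= 1:
        raise ValueError("unstable system: lam >= c * mu")
    top = a**c / math.factorial(c) / (1 - rho)
    bottom = sum(a**k / math.factorial(k) for k in range(c)) + top
    return top / bottom


# Parameters from the question: lam = 4, mean service 2 (mu = 0.5), c = 9 workers
p_all_busy = erlang_c(4.0, 0.5, 9)
print(f"rho = {4.0 / (9 * 0.5):.3f}, P(all busy) = {p_all_busy:.3f}")
# prints rho = 0.889, P(all busy) = 0.653
```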
Perform a Discrete-Event Simulation study in Matlab, Python, or R to answer the
following question.
Assuming that the queues are empty at time 0, we would like to find the proportion of time during which all workers are busy over the day from t = 0 to T = 2000.
That is, we would like to simulate the process for 2000 time units. Run the
simulation and report this proportion along with a 95% confidence interval.
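A minimal event-scheduling sketch in Python, assuming NumPy is available. Each replication starts with an empty system, keeps arrivals and departures on a min-heap event list (`heapq`), and accumulates the time during which all 9 workers are busy; a freed worker serves the priority queue before the regular one. The function names, the default of 30 independent replications, the seed 12345, and the normal-approximation interval are illustrative assumptions, not part of the assignment. Because both classes share the same exponential processing time, the all-busy proportion does not depend on the priority rule; the two queues matter only for class-specific measures such as waiting times.

```python
import heapq
import math
from collections import deque
from statistics import NormalDist

import numpy as np


def simulate_once(rng, lam=4.0, p_regular=0.9, c=9, mean_service=2.0, t_end=2000.0):
    """One replication starting from an empty system (hypothetical helper).

    Returns the fraction of [0, t_end] during which all c workers are busy.
    """
    # Event list: min-heap of (event_time, event_type)
    events = [(rng.exponential(1.0 / lam), "arrival")]
    queues = {"priority": deque(), "regular": deque()}  # arrival times of waiting packages
    busy = 0             # number of busy workers
    t_prev = 0.0         # time of the previous event
    all_busy_time = 0.0  # accumulated time with busy == c

    while events:
        t, kind = heapq.heappop(events)
        # Credit the interval since the last event (clipped at t_end) if all were busy
        if busy == c:
            all_busy_time += min(t, t_end) - t_prev
        if t >= t_end:
            break
        t_prev = t

        if kind == "arrival":
            cls = "regular" if rng.random() < p_regular else "priority"
            if busy < c:
                busy += 1  # an idle worker starts processing immediately
                heapq.heappush(events, (t + rng.exponential(mean_service), "departure"))
            else:
                queues[cls].append(t)  # wait in the class-specific queue
            # Schedule the next arrival of the Poisson process
            heapq.heappush(events, (t + rng.exponential(1.0 / lam), "arrival"))
        else:
            # Departure: the freed worker serves the priority queue first
            if queues["priority"]:
                queues["priority"].popleft()
            elif queues["regular"]:
                queues["regular"].popleft()
            else:
                busy -= 1  # nothing waiting: the worker becomes idle
                continue
            heapq.heappush(events, (t + rng.exponential(mean_service), "departure"))

    return all_busy_time / t_end


def replicate(n_reps=30, seed=12345, level=0.95):
    """Independent replications; returns (mean, ci_low, ci_high)."""
    rng = np.random.default_rng(seed)
    props = np.array([simulate_once(rng) for _ in range(n_reps)])
    mean = props.mean()
    # Normal-approximation interval; a Student-t quantile is slightly wider
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * props.std(ddof=1) / math.sqrt(n_reps)
    return mean, mean - half, mean + half
```

For example, `replicate(n_reps=100)` returns the estimated proportion together with the lower and upper limits of an approximate 95% interval. Each replication contributes one proportion, so the interval is computed across independent replications rather than from the correlated event-level data within one run.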
Present your study in the following parts.
(a) [10 Marks] Summarize the problem and describe the project objective.
(b) [20 Marks] Give a specification of the variables used in the simulation study. In
addition, show a diagram that describes the project dynamics.
(c) [25 Marks] Results and Analysis. Using tables and figures, clearly present the
outcome of your study, together with the corresponding confidence intervals.
(d) [15 Marks] Formulate your conclusions.
(e) [10 Marks] Appendix. Include all code files used. Explain how they interact,
and provide clear, well-commented code.