Statistical Methods for Data Science
DATA7202
Semester 1, 2020
Assignment 4 (Weight: 30%)
Assignment 4 is due on June 27 2020, 2:00pm.
There are two questions below. For questions 1 and 2, you should present your analysis
of data using Python, Matlab, or R, as a short report, clearly answering the objectives
and justifying the modeling (and hence statistical analysis) choices you make, as well as
discussing your conclusions.
1. (20%) Consider the following example from Efron and Tibshirani (1993). When a
drug company introduces a new medication, it is sometimes required to show
“bioequivalence”, that is, to demonstrate that the new drug is not
substantially different from the current treatment.
The table below shows eight subjects who used medical patches to infuse a certain
hormone into the blood. Each subject received three treatments: placebo,
old-patch, and new-patch.
subject  placebo    old    new  old - placebo  new - old
      1     9243  17649  16449           8406      -1200
      2     9671  12013  14614           2342       2601
      3    11792  19979  17274           8187      -2705
      4    13357  21816  23798           8459       1982
      5     9055  13850  12560           4795      -1290
      6     6290   9806  10157           3516        351
      7    12412  17208  16570           4796       -638
      8    18806  29044  26325          10238      -2719
Let
• Z = old - placebo, and
• Y = new - old.
The Food and Drug Administration (FDA) requirement for bioequivalence is that
|θ| ≤ 0.20, where θ = E[Y]/E[Z].
Write a program that performs the following calculations; set the generator seed to
be 12345.
(a) Calculate the plug-in estimate of θ, which is equal to θ̂ = Ȳ/Z̄, where Ȳ and Z̄
are the sample means of Y and Z.
(b) Using the bootstrap method with B = 1000 replications, calculate the 95%
confidence interval for θ. Compare the obtained interval with the desired quantity:
|θ| ≤ 0.20. What is your conclusion?
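As a rough illustration of parts (a) and (b), the following Python sketch computes the plug-in estimate and one possible bootstrap interval. It assumes the percentile method, resampling of subjects as (Y, Z) pairs, and NumPy's default_rng generator; the question does not say which bootstrap interval or generator to use, and the function names are hypothetical.

```python
import numpy as np

# Differences taken from the table above (eight subjects).
z = np.array([8406, 2342, 8187, 8459, 4795, 3516, 4796, 10238], dtype=float)    # old - placebo
y = np.array([-1200, 2601, -2705, 1982, -1290, 351, -638, -2719], dtype=float)  # new - old


def plug_in_theta(y, z):
    """Plug-in estimate: ratio of the two sample means."""
    return y.mean() / z.mean()


def bootstrap_percentile_ci(y, z, n_boot=1000, alpha=0.05, seed=12345):
    """Percentile bootstrap interval for theta (assumed method, see lead-in)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    # Each row holds the subject indices of one bootstrap sample, drawn
    # with replacement; y and z are indexed together to keep pairs intact.
    idx = rng.integers(0, n, size=(n_boot, n))
    thetas = y[idx].mean(axis=1) / z[idx].mean(axis=1)
    lower, upper = np.quantile(thetas, [alpha / 2, 1 - alpha / 2])
    return lower, upper, thetas


theta_hat = plug_in_theta(y, z)
ci_lower, ci_upper, boot_thetas = bootstrap_percentile_ci(y, z)
print(f"plug-in estimate: {theta_hat:.4f}")
print(f"95% percentile interval: ({ci_lower:.4f}, {ci_upper:.4f})")
# One way to read the result: bioequivalence is supported only if the
# whole interval lies inside [-0.20, 0.20].
print("interval inside [-0.20, 0.20]:", -0.20 <= ci_lower and ci_upper <= 0.20)
```

Resampling pairs rather than Y and Z separately keeps the within-subject dependence between the two differences, which is why a single index array is used for both.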
2. (80%) Air Secure wishes to open a number of new service desks, guaranteeing that
in the long run 90% of their customers do not have to wait longer than 8 minutes in
a waiting queue before they are served. Preliminary research by Air Secure showed
that on arrival customers always choose the smallest queue and remain there until
served. This research also investigated the passengers' inter-arrival times (in minutes)
and service times. The results are summarized in data.csv. The data for the
first four passengers are provided below.
inter_arrival_time   service_time
2.1230325064814      3.83455057136373
0.304277254841897    3.07898542818172
0.162593146778897    3.87336623034977
0.183088166798198    8.55428148088529
Perform a Discrete-Event Simulation study in Matlab, Python, or R, to answer the
following question.
How many service desks should be minimally available to meet the service
requirements? Namely, how many service desks should be available such that,
with probability 0.9, a customer does not have to wait longer than 8 minutes in
a waiting queue before being served? Run the simulation for T = 3000 units
of time.
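One possible shape for such a study is sketched below in Python. It is only an illustration under several assumptions that the question leaves open: inter-arrival and service times are resampled with replacement from the observed data (rather than from a fitted distribution), the "smallest queue" counts the customer in service, no warm-up period is discarded, and a normal-approximation interval over independent replications is used. The four rows shown above stand in for data.csv, and all function names are hypothetical.

```python
import heapq
from collections import deque

import numpy as np

# The four rows shown above, used only as a tiny stand-in for data.csv.
INTER_ARRIVALS = np.array([2.1230325064814, 0.304277254841897,
                           0.162593146778897, 0.183088166798198])
SERVICES = np.array([3.83455057136373, 3.07898542818172,
                     3.87336623034977, 8.55428148088529])


def simulate(n_desks, inter_arrivals, services, horizon=3000.0, seed=0):
    """Event-list simulation; returns waits of customers who began service."""
    rng = np.random.default_rng(seed)
    draw = lambda sample: sample[rng.integers(len(sample))]  # empirical resampling
    queues = [deque() for _ in range(n_desks)]  # arrival times of waiting customers
    busy = [False] * n_desks
    waits, events, seq = [], [], 0  # events hold (time, tie-breaker, kind, desk)
    heapq.heappush(events, (draw(inter_arrivals), seq, "arrival", -1))
    while events:
        t, _, kind, desk = heapq.heappop(events)
        if t > horizon:
            break
        if kind == "arrival":
            seq += 1
            heapq.heappush(events, (t + draw(inter_arrivals), seq, "arrival", -1))
            # Join the desk with the fewest customers (waiting plus in service).
            desk = min(range(n_desks), key=lambda d: len(queues[d]) + busy[d])
            if busy[desk]:
                queues[desk].append(t)
                continue
            busy[desk] = True
            waits.append(0.0)
        elif queues[desk]:
            waits.append(t - queues[desk].popleft())  # next customer starts service
        else:
            busy[desk] = False
            continue
        seq += 1
        heapq.heappush(events, (t + draw(services), seq, "departure", desk))
    # Customers still waiting at the horizon are not counted (a simplification).
    return np.array(waits)


def estimate_fraction(n_desks, inter_arrivals, services, limit=8.0,
                      n_reps=5, horizon=3000.0):
    """Mean and 95% half-width of the fraction of waits <= limit."""
    fractions = np.array([
        np.mean(simulate(n_desks, inter_arrivals, services, horizon, seed=r) <= limit)
        for r in range(n_reps)
    ])
    half_width = 1.96 * fractions.std(ddof=1) / np.sqrt(n_reps)
    return fractions.mean(), half_width


def minimal_desks(inter_arrivals, services, target=0.9, max_desks=20):
    """Smallest desk count whose lower confidence bound reaches the target."""
    for c in range(1, max_desks + 1):
        mean, half_width = estimate_fraction(c, inter_arrivals, services)
        if mean - half_width >= target:
            return c, mean, half_width
    return None


print(minimal_desks(INTER_ARRIVALS, SERVICES))
```

With only a handful of replications, a Student t quantile would be more appropriate than 1.96, and a real study would load the full data.csv and justify the input model instead of resampling four rows.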
Your report on the Discrete-Event Simulation study should contain the following.
(a) Give the problem summary and describe the project objective.
(b) Give a specification of variables used in the simulation study. In addition,
show a diagram that describes the project dynamics.
(c) Results and Analysis. Using tables and figures, present a clear outcome of
your study. Present the corresponding confidence intervals.
(d) Formulate your conclusions.
(e) Appendix. Include all code files used. Explain how they interact and provide
clear, well-commented code.