首页 >
> 详细

Assignment 3 - Part 2

Version 2.1 with some typos corrected. Use this version of the assignment.

STA238

Winter 2020

Suppose Joe owns a pizza shop. We know it’s the favourite shop of at least one STA238 student, so this

question is of serious practical importance. Joe is concerned with the number of customers he serves during

the lunch hour. To study this, one day his trusted assistant records the time between successive customers.

Let Xi be the number of minutes until the i

th customer enters Joe’s shop, measured from when the previous

customer entered.

We model X1, . . . , Xn as independent and identically distributed random variables from the Exponential(λ0)

distribution, which is sometimes an appropriate model for waiting times. (Note that we are using here

an alternative parametrization of the Exponential distribution, which might be different than what you’ve

seen before; throughout this assignment work with the Exponential density given below.) The parameter λ

represents the average number of minutes Joe has to wait until the next customer enters his shop, measured

from the time the previous customer entered. λ = 2 would be 2 minutes per customer, λ = 0.1 would be 10

customers per minute, and so on. Joe wants to estimate the true value of λ, λ0. The exponential density is

fλ(xi) = λ−1exp(−xi/λ).

1. Show that the Maximum Likelihood Estimator (MLE) for λ in this model is λb = X¯.

2. The waiting times data are posted on the assignment page on Quercus in the file assignment3-waiting.csv.

Consider the following code. Describe in words, in full detail, the bootstrap algorithm that is implemented

in the code. Indicate the distribution that is being estimated by the bootstrap

distribution.

set.seed(7886)

waiting <- readr::read_csv(

# Include the path to where you saved the data on your computer.

file = "assignment3-waiting.csv",

col_names = TRUE,

col_types = "n"

)

glimpse(waiting)

## Observations: 30

## Variables: 1

## $ waitingtime

n <- nrow(waiting)

B <- 1000 # Number of bootstrap samples to do

bootmle <- numeric(B)

for (b in 1:B) {

samp <- sample(waiting$waitingtime,n,replace = TRUE)

bootmle[b] <- mean(samp) - mean(waiting$waitingtime)

}

tibble(x = bootmle) %>%

ggplot(aes(x = x)) +

theme_minimal() +

geom_histogram(aes(y = ..density..),bins = 50,colour = "black",fill = "transparent")

3. Joe comes in, angry. His assistant recorded the times between successive customers, when really, what

he wanted was the number of customers that arrived each minute! Joe is furious at his hapless assistant.

Luckily his assistant is a statistics major, and knows that if X1, . . . , Xn are independent Exponential(λ)

waiting times in minutes, then the number of customers per minute Y1, . . . , Ym is an i.i.d. sample from a

Poissonλ−1

distribution. So the parameter representing the physical quantity of interest now is µ = λ−1.

µ is the average number of customers per minute. So µ = 1/2 would be one customer every two minutes and

µ = 2 would be one customer every 30 seconds, and so on.

Follow the steps below which will allow you to find an estimate of µ. First, you will need to convert the

observed waiting times into a dataset containing the count of the number of customers arriving each minute.

To do this, follow the following steps:

Step 1: Create a new dataframe with two new variables: totaltime, computed as the total time (continuous,

fractional number of minutes) that has passed when each successive customer enters the pizza

shop; and numberofminutes, which is the number of integer minutes that have passed (plus one) when

each customer enters the shop. You can use totaltime = cumsum(waitingtime), numberofminutes =

ceiling(totaltime), and remember you can create new variables in a dataframe with the mutate function

in the dplyr package.

Your new dataframe should look like this (the numbers should match!):

## # A tibble: 30 x 3

## waitingtime totaltime numberofminutes

## # ... with 20 more rows

Step 2: Now, add up the number of customers that arrived in each integer minute. You can

group_by(numberofminutes) and then count the number of customers using summarize(numberofcustomers

= n()). Your new dataframe should be called customers and should look like this (the numbers should

match!):

customers1

## # A tibble: 18 x 2

## numberofminutes numberofcustomers

Step 3: Finally, you have to add zeroes for minutes in which customers didn’t arrive. Here is the code to this.

Run it and make sure you have created the data frame below.

zeroes <- tibble(numberofminutes = 1:max(customers1$numberofminutes))

customers <- zeroes %>%

left_join(customers1,by = "numberofminutes") %>%

replace_na(list(numberofcustomers = 0))

customers

## # A tibble: 26 x 2

## numberofminutes numberofcustomers

##

## # ... with 16 more rows

The column numberofcustomers now represents independent realizations of random variables Y1, . . . , Ym

from the Poissonλ−10

distribution, where λ0 is the same λ0 from question 1.

4. (a) The density of a Poissonλ−1

random variable is

fλ(yj ) = λ−yj e−1/λyj !

Use this density to find the MLE for λ and the MLE for µ = λ−1. Hint: you might want to find the MLE for

µ first, and then apply the principle of invariance to find the MLE for λ.

(b) Find the MLE for µ = λ

−1 using the original sample of waiting times and the principle of invariance of

the MLE. Do you expect your answer to be exactly the same as your answer to part (a)? Why or why

not?

5. Use your answer from question 4 to create a similar plot to that given in question 2. That is, construct

a bootstrap estimate of the sampling distribution of λ−1 − λ−10

and plot a histogram of this distribution.

You might wish to rely heavily on the code given in question 2.

6. (a) For the random sample X1, . . . , Xn, it can be shown thatλˆ − λ0λˆ∼ N0,1n

Compute a 95% confidence interval for λ using this asymptotic distribution, using the observed values of the

random sample X1, . . . , Xn.

(b) For the random sample Y1, . . . , Ym, it can be shown that

µˆ − µ0√µˆ∼ N0,1m

Compute a 95% confidence interval for µ using this asymptotic distribution, using the observed values of the

random sample Y1, . . . , Ym.

(c) Compare your confidence intervals in part (a) part (b). Interpret both of them: that is, “we are 95%

confident that ________ is between ___ and ___.” Remember that µ0 = λ−10.

7. (a) Compute the likelihood ratio Λ based on the sample Y1, . . . , Ym and evaluate the claim that µ = 2,

that is, two customers enter the shop per minute. Is this claim supported by the data?

(b) Compute the likelihood ratio Λ based on the sample X1, . . . , Xn and evaluate the claim that λ = 1/2,

that is, one customer enters the shop every 30 seconds. Is this claim supported by the data?

8. Use a bootstrap procedure to estimate a p-value corresponding to the claim that µ = 2 based on

Y1, . . . , Ym. Is your value different than what you got in question 7 (a)? Say why you would expect

them to be the same OR explain why they are different. (You may wish to use the code from the

lectures for week 11 for this question.)

9. (a) Consider the observed values of the random sample of customers per minute, y1, . . . , ym. Suppose

we put a Gamma(a, b) prior distribution on µ = λ−1 with density

Derive the posterior distribution for µ|y1, . . . , ym. State the name of the distribution and give

expressions for its parameters.

(b) Compute a 95% posterior credible interval for µ. You may wish to use the pgamma function in R to

calculate cumulative probability from the Gamma distribution. Interpret your interval and compare it

to the confidence interval for µ that you obtained in question 6 (b). Comment on whether they are

different. Compare their interpretations, and state whether you would expect them to be the same and

why/why not. Use values of a = 2 and b = 1 for the prior parameters (this gives a prior with mean 2).

联系我们

- QQ：99515681
- 邮箱：99515681@qq.com
- 工作时间：8:00-23:00
- 微信：codinghelp2

- Tsp课程作业代写、代做algorithms留学生作业、代做java，C/C 2020-06-23
- Kit107留学生作业代做、C++编程语言作业调试、Data课程作业代写、代 2020-06-23
- Sta302h1f作业代做、代写r课程设计作业、代写r编程语言作业、代做da 2020-06-22
- 代写seng 474作业、代做data Mining作业、Python，Ja 2020-06-22
- Cmpsci 187 Binary Search Trees 2020-06-21
- Comp226 Assignment 2: Strategy 2020-06-21
- Math 504 Homework 12 2020-06-21
- Math4007 Assessed Coursework 2 2020-06-21
- Optimization In Machine Learning Assig... 2020-06-21
- Homework 1 – Math 104B 2020-06-20
- Comp1000 Unix And C Programming 2020-06-20
- General Specifications Use Python In T... 2020-06-20
- Comp-206 Mini Assignment 6 2020-06-20
- Aps 105 Lab 9: Search And Link 2020-06-20
- Aps 105 Lab 9: Search And Link 2020-06-20
- Mech 203 – End-Of-Semester Project 2020-06-20
- Ms980 Business Analytics 2020-06-20
- Cs952 Database And Web Systems Develop... 2020-06-20
- Homework 4 Using Data From The China H... 2020-06-20
- Assignment 1 Build A Shopping Cart 2020-06-20