Assignment 3

Instructions:

• Answer all three (3) questions of this assignment. Show all your work.

• Compile your solutions using LaTeX, R Markdown, or Word as your base. Submit your assignment parts to Crowdmark. Crowdmark accepts PDF, JPG, and PNG files. Crowdmark will allow group submission for Part 1 only.

• Presentation of solutions is important. Assignments should be word-processed and presented neatly. Use proper statistical terminology and proper English. Supporting output, such as unrequested R code and extraneous output, is optional and will not be graded. However, if you choose to include it, please place it in a separate appendix at the end of your assignment.

Grading: The grand total is 45 marks, which includes 6 marks for excellent presentation. Best answers will receive the best marks. A general grading rubric is given below.

Per Question Part

• 3 points: Complete, correct, and clearly written answers. Answers model individual preparation and academic honesty (where applicable).

• 2 points: Good answers that are unclear, contain a few mistakes, or have missing components. Answers demonstrate some individual preparation and some academic honesty (where applicable).

• 1 point: Poor answers or many missing components. Most answers do not demonstrate individual preparation or academic honesty (where applicable).

• 0 points: Missing or incomprehensible answers. Answers lack academic integrity.

Presentation

• 3 points: Well presented, easy to read, proper English used, R code shown only where required.

• 2 points: Good presentation, some unnecessary R code and unformatted output.

• 1 point: Poor presentation, handwritten, hand-drawn diagrams, unnecessary R code and unformatted output.

• 0 points: Illegible, missing, or unclear presentation.


1. (Adapted from Scheaffer et al.) A manufacturer of band saws wants to estimate the average repair cost per month for the saws he has sold to certain industries. He cannot obtain a repair cost for each saw, but he can obtain the total amount spent for saw repairs and the number of saws owned by each industry. Thus, he decides to use cluster sampling, with each industry as a cluster. The manufacturer selects a simple random sample of n = 20 from the N = 96 industries he services. The data on total cost of repairs per industry and number of saws per industry are given in the table below.

Industry   Number of saws   Total repair cost for past month ($)
1          3                50
2          7                110
3          11               230
4          9                140
5          2                60
6          12               280
7          14               240
8          3                45
9          5                60
10         9                230
11         8                140
12         6                130
13         3                70
14         2                50
15         1                10
16         4                60
17         12               280
18         6                150
19         5                110
20         8                120

(a) [3 marks] Estimate the average repair cost per saw for the past month and place a bound on the error of estimation.

(b) [3 marks] To estimate the average repair cost per saw for the past month, how many clusters (n) should the manufacturer select for his sample if he wants the bound (B) on the error of estimation to be less than $2?

(c) [3 marks] Compare your results from part (a) with those from part (b), and comment on the relationship between n and B.
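Note: the calculations in Question 1 can be sketched in code. The block below is only an illustration in Python (the assignment itself asks for R). It assumes the usual ratio-type estimator for cluster sampling and the two-standard-error bound B = 2·sqrt(V̂) from Scheaffer et al. The function names `cluster_ratio_estimate` and `clusters_needed` are hypothetical helpers, not part of any library.

```python
import math

# Data from the Question 1 table: saws per industry (m_i) and total repair cost (y_i).
saws = [3, 7, 11, 9, 2, 12, 14, 3, 5, 9, 8, 6, 3, 2, 1, 4, 12, 6, 5, 8]
cost = [50, 110, 230, 140, 60, 280, 240, 45, 60, 230,
        140, 130, 70, 50, 10, 60, 280, 150, 110, 120]
N = 96  # number of industries (clusters) in the population


def cluster_ratio_estimate(m, y, N):
    """Ratio estimate of the mean per element and its bound (assumed B = 2*sqrt(V))."""
    n = len(m)
    y_bar = sum(y) / sum(m)          # estimated mean cost per saw
    m_bar = sum(m) / n               # average cluster size, estimated from the sample
    # Variation of the cluster totals about the fitted ratio line y = y_bar * m.
    s2_r = sum((yi - y_bar * mi) ** 2 for mi, yi in zip(m, y)) / (n - 1)
    var_hat = (1 - n / N) * s2_r / (n * m_bar ** 2)
    return y_bar, 2 * math.sqrt(var_hat), s2_r, m_bar


def clusters_needed(s2_r, m_bar, N, B):
    """Approximate number of clusters for a target bound B, using s2_r as a planning value."""
    D = (B * m_bar) ** 2 / 4
    return math.ceil(N * s2_r / (N * D + s2_r))


y_bar, bound, s2_r, m_bar = cluster_ratio_estimate(saws, cost, N)
print(f"mean cost per saw: {y_bar:.2f}, bound: {bound:.2f}")
print("clusters needed for B = 2:", clusters_needed(s2_r, m_bar, N, B=2))
```

The sample-size helper treats the sample value of s_r² and the sample average cluster size as stand-ins for the unknown population quantities, which is an assumption of this sketch.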

2. (Adapted from Scheaffer et al.) A market research firm constructed a sampling plan to estimate the weekly sales of brand A cereal in a certain geographic area. The firm decided to sample cities within the area and then to sample supermarkets within cities. The number of boxes of brand A cereal sold in a specified week is the measurement of interest. Five cities are sampled from the 20 in the area. Using the data given in the accompanying table, answer the following:

(a) [3 marks] Estimate the average sales for the week for all supermarkets in the area. Place a bound on the error of estimation. Is the estimator you used unbiased?

(b) [3 marks] Do you have enough information to estimate the total number of boxes of cereal sold by all supermarkets in the area during the week? If so, explain how you would estimate this total, and place a bound on the error of estimation.


City   Number of supermarkets   Number of supermarkets sampled   ȳ_i   s_i²
1      45                       9                                102   20
2      36                       7                                90    16
3      20                       4                                76    22
4      18                       4                                94    26
5      28                       6                                120   12
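Note: one way to organize the arithmetic for Question 2(a) is sketched below in Python (illustrative only; the assignment asks for R). The sketch assumes the two-stage ratio-type estimator of the mean from Scheaffer et al., which is the usual choice when the total number of supermarkets in the area is not given, and the bound B = 2·sqrt(V̂). The function name `two_stage_ratio_mean` is a hypothetical helper.

```python
import math

# Data from the Question 2 table.
N = 20                              # cities in the area
M = [45, 36, 20, 18, 28]            # supermarkets in each sampled city (M_i)
m = [9, 7, 4, 4, 6]                 # supermarkets sampled in each city (m_i)
y_bar = [102, 90, 76, 94, 120]      # sample mean sales per city
s2 = [20, 16, 22, 26, 12]           # sample variance of sales per city


def two_stage_ratio_mean(M, m, y_bar, s2, N):
    """Ratio-type estimate of the mean per element under two-stage cluster sampling."""
    n = len(M)
    M_bar = sum(M) / n              # average cluster size, estimated from the sample
    mu_hat = sum(Mi * yi for Mi, yi in zip(M, y_bar)) / sum(M)
    # Between-cluster component.
    s2_r = sum((Mi * (yi - mu_hat)) ** 2 for Mi, yi in zip(M, y_bar)) / (n - 1)
    between = (1 - n / N) * s2_r / (n * M_bar ** 2)
    # Within-cluster component, with a finite population correction in each city.
    within = sum(Mi ** 2 * (1 - mi / Mi) * s2i / mi
                 for Mi, mi, s2i in zip(M, m, s2)) / (n * N * M_bar ** 2)
    return mu_hat, 2 * math.sqrt(between + within)


mu_hat, bound = two_stage_ratio_mean(M, m, y_bar, s2, N)
print(f"estimated mean weekly sales per supermarket: {mu_hat:.2f}, bound: {bound:.2f}")
```

In this sketch the between-city term dominates the variance, so the bound is driven mostly by how much the city means differ rather than by the within-city sampling.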

3. Use the population data set, hhw21.csv, with N = 210 pairs of measurements of handspan, x, and height, y, from our class, mainly to compare regression and ratio estimation for estimating the population mean height µy, using information from a sample of size n = 10. Set the seed of your randomization to the digits of your student number.

(a) [3 marks] Obtain a simple random sample of the data and display it in a table. Include the ‘id’ numbers in your table. Show the R code used to obtain your answers to this part, and use the sample obtained here to answer the remaining parts of this question.

(b) [3 marks] Using an SRS estimator, estimate µy and place a bound on the error of estimation.

(c) [3 marks] Using a ratio estimator, estimate µy and place a bound on the error of estimation.

(d) [3 marks] Using a regression estimator, estimate µy and place a bound on the error of estimation.

(e) [3 marks] Using a difference estimator, estimate µy and place a bound on the error of estimation.

(f) [3 marks] Find the error of estimation, |µ̂y − µy|, for each of the four estimators in parts (b) to (e) and compare them.

(g) [3 marks] Which of the three estimators of parts (c) to (e) would you recommend? Explain.

(h) [3 marks] Do you recommend the SRS estimator over the other three estimators? Explain.
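Note: the four estimators in Question 3 can be sketched as small functions. The block below is an illustration in Python, not the R code the question requires, and Python's random number generator will not reproduce a sample drawn with R's `set.seed`. It assumes the textbook variance formulas from Scheaffer et al. with the bound B = 2·sqrt(V̂), and it assumes the ‘id’ numbers run from 1 to N. All function names are hypothetical, and the small x and y lists at the end are made-up toy values, not data from hhw21.csv.

```python
import math
import random


def bound(var_term, n, N):
    """Two-standard-error bound with finite population correction (assumed convention)."""
    return 2 * math.sqrt((1 - n / N) * var_term / n)


def draw_srs_ids(N, n, seed):
    """Simple random sample of n ids from 1..N without replacement (ids 1..N assumed)."""
    return sorted(random.Random(seed).sample(range(1, N + 1), n))


def srs_estimate(y, N):
    n, y_bar = len(y), sum(y) / len(y)
    s2 = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    return y_bar, bound(s2, n, N)


def ratio_estimate(x, y, mu_x, N):
    n, r = len(y), sum(y) / sum(x)
    s2_r = sum((yi - r * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    return r * mu_x, bound(s2_r, n, N)


def regression_estimate(x, y, mu_x, N):
    n, x_bar, y_bar = len(y), sum(x) / len(x), sum(y) / len(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    mse = sum((yi - y_bar - b * (xi - x_bar)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return y_bar + b * (mu_x - x_bar), bound(mse, n, N)


def difference_estimate(x, y, mu_x, N):
    n = len(y)
    d = [yi - xi for xi, yi in zip(x, y)]
    d_bar = sum(d) / n
    s2_d = sum((di - d_bar) ** 2 for di in d) / (n - 1)
    return mu_x + d_bar, bound(s2_d, n, N)


# Toy values only (hypothetical, NOT taken from hhw21.csv), used to exercise the functions.
x = [18, 20, 22, 24]
y = [36, 40, 44, 48]
for f in (ratio_estimate, regression_estimate, difference_estimate):
    print(f.__name__, f(x, y, mu_x=21.5, N=100))
print("srs_estimate", srs_estimate(y, N=100))
```

Each function returns the estimate and its bound as a pair, so the four results can be laid side by side when comparing the estimators. The ratio, regression, and difference estimators all need the population mean handspan µx, which would be computed from the full data set.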

