Assignment 2
Due Monday, October 24th
You are required to submit two files through the course website: your answer file (as a PDF) and your coding script (as a .R file). Naming conventions are as follows:
PDF File: “Last Name, First Name – Assignment 2.pdf”
R File: “Last Name, First Name – Assignment 2.R”
1. For each of the following scenarios, identify the target population, the sampling frame and the sample used. Identify any potential weaknesses in this sample. For the variables identified, determine whether they are continuous or discrete. For continuous variables, define the hard and soft boundaries; for discrete variables, define the categories and state whether the variable is ordinal or categorical. (20 points)
(a) A health region was concerned about the accuracy of their clinical documentation, so they have finally implemented a district-wide upgrade to their electronic health records, and are interested in the effect of the new system. They select 1000 employees at random from the health region (using the human resources payroll information) and ask them a series of questions about their opinions of the system, along with demographic information about age, gender, location, years working in the district and professional title.
(b) The city of Halifax is funding the installation of several thousand solar-powered hot-water heaters in the city. To select homes for installation, they asked for volunteers, who had to provide information about the amount of water they used, the type of hot water heating system they have, the number of people living in the home, total household income and location in the city. The city is now going to use this survey to provide a report on the overall water usage amongst Halifax residents.
2. The following table presents the results of a treatment to prevent cancer in patients diagnosed with HPV. (20 points)
                  Cancer   No Cancer   Total
Treatment  Yes      15        185       200
           No       35        165       200
TOTAL               50        350       400
(a) Provide the null and alternative hypotheses for evaluating this treatment (5)
(b) Calculate the Odds Ratio (OR) for this problem (5)
(c) Calculate the confidence interval for the OR (5)
(d) Interpret the OR and the CI, and evaluate the hypotheses from part (a). (5)
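As a reminder of the mechanics only, the following is a minimal R sketch of an odds ratio with a Wald confidence interval computed on the log scale. The counts are hypothetical and are not the table above, the helper name odds_ratio_wald is made up for this illustration, and the Wald interval is assumed here as one common choice; use whichever interval was covered in class.

```r
# Wald confidence interval for an odds ratio from a 2x2 table.
# The counts below are hypothetical and are NOT the assignment data.
tab <- matrix(c(20, 80,
                40, 60),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group = c("Exposed", "Unexposed"),
                              Outcome = c("Event", "No event")))

# odds_ratio_wald() is a helper written for this sketch, not a library function.
odds_ratio_wald <- function(tab, conf_level = 0.95) {
  n11 <- tab[1, 1]; n12 <- tab[1, 2]
  n21 <- tab[2, 1]; n22 <- tab[2, 2]

  # Cross-product ratio: odds of the event in row 1 over odds in row 2.
  or <- (n11 * n22) / (n12 * n21)

  # Standard error of log(OR), then back-transform the interval limits.
  se_log_or <- sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)
  z <- qnorm(1 - (1 - conf_level) / 2)
  ci <- exp(log(or) + c(-1, 1) * z * se_log_or)

  list(or = or, lower = ci[1], upper = ci[2])
}

res <- odds_ratio_wald(tab)
print(res)
```

An interval that excludes 1 is evidence against the null hypothesis of no association at the corresponding significance level.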
3. The following table presents the effect of two drugs that are supposed to prevent patients from contracting the flu. (20 points)
             Flu   No Flu   Total
Drug   A      28     72      100
       B      42     58      100
TOTAL         70    130      200
(a) Calculate the risk difference (RD) and interpret it (5)
(b) Calculate the number needed to treat (NNT) and interpret it (5)
(c) Calculate the relative risk (RR) and interpret it (5)
(d) Suppose the chi-square statistic for this test is 4.31. Is the drug effective? (5)
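For the mechanics of these measures, a minimal R sketch follows. The counts are hypothetical and are not the table above, and the variable names are made up for this illustration. It assumes the NNT is taken as the reciprocal of the absolute risk difference and that the chi-square test is run without a continuity correction.

```r
# Hypothetical counts, NOT the assignment data.
# Rows are groups, columns are event / no event.
counts <- matrix(c(10, 40,
                   20, 30),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(Group = c("Group 1", "Group 2"),
                                 Outcome = c("Event", "No event")))

# Risk of the event within each group.
risk <- counts[, "Event"] / rowSums(counts)

rd  <- unname(risk[1] - risk[2])  # risk difference
rr  <- unname(risk[1] / risk[2])  # relative risk
nnt <- 1 / abs(rd)                # number needed to treat

# Pearson chi-square test without continuity correction, and the
# 5% critical value for 1 degree of freedom to compare a statistic against.
chi  <- chisq.test(counts, correct = FALSE)
crit <- qchisq(0.95, df = 1)

print(c(RD = rd, RR = rr, NNT = nnt))
print(c(statistic = unname(chi$statistic), critical = crit))
```

A chi-square statistic larger than the critical value leads to rejecting the null hypothesis of no association at the 5% level.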
4. A newspaper provides the following report about an upcoming election. (20 points)
Polls indicate that Candidate Johnson is the pick of 54% of the population, leaving Candidate McQuaid with 46%. The margin of error on this survey is 4.9 percentage points, 19 times out of 20.
Provide the confidence interval (with the confidence level) for Candidate Johnson's popularity. Do the polls provide conclusive evidence that Candidate Johnson will be elected?
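As a small illustration of how a reported margin of error translates into an interval, here is an R sketch with hypothetical poll numbers (not the ones in the report above). It assumes the margin of error is the half-width of a symmetric interval and that "19 times out of 20" is the confidence level.

```r
# Hypothetical poll result, NOT the assignment data.
estimate   <- 0.60     # reported support for a candidate
moe        <- 0.03     # margin of error, as a proportion
conf_level <- 19 / 20  # "19 times out of 20"

# Symmetric interval: estimate plus or minus the margin of error.
ci <- estimate + c(-1, 1) * moe

# Does the whole interval sit above 50%?
majority_supported <- ci[1] > 0.5

print(ci)
print(conf_level)
print(majority_supported)
```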
5. A study is being run at the registrar’s office to determine how much time students need to write their exams, and in particular whether BA and BSc students use different amounts of time. Based on previous data, they believe that students take, on average, about 145 minutes to write their exams (SD of 35). They believe that 10 minutes is a meaningful difference between the groups, so they decide to sample 100 BA and 100 BSc students. (20 points)
(a) Calculate the power of this study. (4)
(b) What would be a sufficient sample size? Assume a power of 80%. (4)
(c) If the sample size is fixed, what would you change about the study to increase the power? Be specific. (4)
(d) The study has the following results:
Group   Average (minutes)   SD (minutes)
BA            147                50
BSc           128                55
Calculate a confidence interval for the difference in exam times between groups (8)
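For the mechanics of power, sample size and an interval for a difference in means, a minimal R sketch follows. All numbers are hypothetical and are not the values in this question. It assumes a two-sided, two-sample t-test at the 5% level for the power calculations (via power.t.test from the stats package) and a normal-approximation interval with unpooled variances; the helper diff_ci is made up for this illustration, so adapt it to the method used in class.

```r
# Hypothetical planning values, NOT the assignment data.

# Power for a given per-group sample size, difference and SD.
pw <- power.t.test(n = 50, delta = 5, sd = 10, sig.level = 0.05)
print(pw$power)

# Per-group sample size needed for 80% power (rounded up).
ss <- power.t.test(delta = 5, sd = 10, power = 0.80, sig.level = 0.05)
n_needed <- ceiling(ss$n)
print(n_needed)

# diff_ci() is a helper written for this sketch, not a library function.
# It builds a normal-approximation interval from summary statistics.
diff_ci <- function(m1, s1, n1, m2, s2, n2, conf_level = 0.95) {
  se <- sqrt(s1^2 / n1 + s2^2 / n2)     # unpooled standard error
  z  <- qnorm(1 - (1 - conf_level) / 2)
  d  <- m1 - m2
  c(estimate = d, lower = d - z * se, upper = d + z * se)
}

ci <- diff_ci(m1 = 100, s1 = 15, n1 = 40,
              m2 = 92,  s2 = 20, n2 = 40)
print(ci)
```

With small samples, replacing qnorm with the appropriate t quantile gives a slightly wider interval.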