MA308: Statistical Calculation and Software

Assignment 3 (Dec 24, 2019 - Jan 02, 2020)

3.1 The “weightgain” dataset from the HSAUR3 package comes from an experiment studying the weight gain of rats fed on four different diets, distinguished by amount of protein (low and high) and by source of protein (beef and cereal). Ten rats were randomized to each of the four treatments, and the weight gain in grams was recorded. The question of interest is how diet affects weight gain.

(a) Summarize the main features of the data by calculating group means and standard deviations. Then use the plotmeans() function in the gplots package to produce an interaction plot of group means and their confidence intervals.

(b) Use the interaction2wt() function in the HH package to produce a plot of both main effects and two-way interactions (the function handles factorial designs of any order). Explain whether there is an interaction between source and type.

(c) Carry out a two-way factorial ANOVA with and without the interaction term, and explain the corresponding results.

(d) What assumptions must the data satisfy for one-way ANOVA? If we use one-way ANOVA to examine the difference in weightgain between the two sources of protein, are these assumptions satisfied?

(e) Carry out the permutation-test version of the two-way factorial ANOVA of weightgain ~ source*type with the lmPerm package, and compare the result with that in 3.1(c).
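
For orientation, the idea behind a permutation test can be sketched in a few lines. The example below is a minimal illustration in Python on made-up numbers; it is not the lmPerm analysis asked for in 3.1(e), and it tests a difference between two group means rather than a two-way factorial model. The function name permutation_p_value and both data vectors are assumptions introduced for this example.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=2000, seed=1):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign the group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_perm + 1)


# Hypothetical values, NOT taken from the weightgain data.
diet_a = [10, 12, 11, 14, 13, 12, 15, 11]
diet_b = [9, 8, 11, 10, 7, 9, 10, 8]
p = permutation_p_value(diet_a, diet_b)
print(f"permutation p-value: {p:.4f}")
```

The p-value is the share of random relabellings whose mean difference is at least as large as the observed one, so no normality assumption is needed; this is the property that makes the comparison with the classical ANOVA in 3.1(c) informative.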

3.2 For the “planets” dataset from the HSAUR3 package:

(a) Apply complete linkage and average linkage hierarchical clustering to the planets data. Compare the results with the K-means (K=3) clustering results in the lecture notes.

(b) Construct a three-dimensional drop-line scatterplot of the planets data in which the points are labelled with a suitable cluster label; the K-means (K=3) method can be used for clustering.

(c) Write an R function to fit a two-component normal mixture model to the eccen variable in the planets data. (Hint: refer to the “Mixture distribution estimation” section in Chapter 6.)

(d) The mclust package offers high-level functionality for estimating mixture models. Apply Mclust to estimate a normal mixture model for the eccen variable in the planets data, and compare the result with that in 3.2(c).

(e) Carry out principal component analysis on the planets data. Find the coefficients of the first two principal components and the principal component scores for each planet.

(f) Apply K-means (K=3) clustering to the first two principal components of the planets data. Compare the clustering result with that based on the original data mentioned in 3.2(a).
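
As background for 3.2(c), a two-component normal mixture is commonly fitted with the EM algorithm. The sketch below shows one possible version in Python on simulated data; it is an assumption that EM is the intended method, since the course notes are not reproduced here. The function names, the starting values, and the simulated sample are all hypothetical, and the sample does not resemble the eccen variable.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_normal_mixture(xs, n_iter=200):
    """EM for p*N(mu1, s1^2) + (1-p)*N(mu2, s2^2); returns (p, mu1, s1, mu2, s2)."""
    xs = sorted(xs)
    n = len(xs)
    half = n // 2
    # Crude starting values (an assumption): split the sorted sample in half.
    mu1 = sum(xs[:half]) / half
    mu2 = sum(xs[half:]) / (n - half)
    s1 = s2 = (xs[-1] - xs[0]) / 4.0 or 1.0
    p = 0.5
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 1.
        resp = []
        for x in xs:
            a = p * normal_pdf(x, mu1, s1)
            b = (1.0 - p) * normal_pdf(x, mu2, s2)
            resp.append(a / (a + b) if a + b > 0 else 0.5)
        # M-step: weighted means, standard deviations and mixing proportion.
        w1 = sum(resp)
        w2 = n - w1
        mu1 = sum(r * x for r, x in zip(resp, xs)) / w1
        mu2 = sum((1 - r) * x for r, x in zip(resp, xs)) / w2
        s1 = math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, xs)) / w1)
        s2 = math.sqrt(sum((1 - r) * (x - mu2) ** 2 for r, x in zip(resp, xs)) / w2)
        s1, s2 = max(s1, 1e-6), max(s2, 1e-6)  # guard against a collapsing component
        p = w1 / n
    return p, mu1, s1, mu2, s2


# Simulated sample: 300 draws from N(0, 1) and 200 draws from N(6, 1).
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(300)] + [rng.gauss(6.0, 1.0) for _ in range(200)]
p_hat, mu1, s1, mu2, s2 = fit_two_normal_mixture(data)
print(f"p={p_hat:.3f}  mu1={mu1:.3f}  s1={s1:.3f}  mu2={mu2:.3f}  s2={s2:.3f}")
```

A fixed number of iterations keeps the sketch short; a fuller version would stop when the log-likelihood no longer improves and would try several starting values.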

3.3 For the “Default” dataset from the ISLR package, we consider how to predict default for any given value of balance and income. In particular, we will compute estimates of the standard errors of the income and balance logistic regression coefficients in two different ways: (1) using the bootstrap, and (2) using the standard formula for computing the standard errors in the glm() function. Do not forget to set a random seed before beginning your analysis.

(a) Using the summary() and glm() functions, determine the estimated standard errors for the coefficients associated with income and balance in a multiple logistic regression model that uses both predictors.

(b) Write a function, boot.fn(), that takes as input the Default data set as well as an index of the observations, and that outputs the coefficient estimates for income and balance in the multiple logistic regression model.

(c) Use the boot() function together with your boot.fn() function to estimate the standard errors of the logistic regression coefficients for income and balance.

(d) Comment on the estimated standard errors obtained using the glm() function and using your bootstrap function.
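
The comparison in 3.3 between a bootstrap standard error and a formula-based one can be illustrated on a much simpler statistic. The sketch below, in Python on simulated data, uses the sample mean instead of logistic regression coefficients, because the mean has a well-known standard error formula (s divided by the square root of n). The function name bootstrap_se, the simulated sample, and the number of resamples are assumptions made for this example.

```python
import math
import random
from statistics import fmean, stdev


def bootstrap_se(data, statistic, n_boot=2000, seed=1):
    """Standard deviation of a statistic over resamples drawn with replacement."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw n indices with replacement, as an index vector would in boot().
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return stdev(replicates)


# Simulated sample, not the Default data.
rng = random.Random(42)
sample = [rng.gauss(50.0, 10.0) for _ in range(200)]

se_boot = bootstrap_se(sample, fmean)
se_formula = stdev(sample) / math.sqrt(len(sample))
print(f"bootstrap SE: {se_boot:.3f}   formula SE: {se_formula:.3f}")
```

When the assumptions behind the formula hold, the two estimates should be close; a marked gap between them is a signal worth discussing in 3.3(d).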

3.4 For the “Default” dataset from the ISLR package, we consider how to predict default for any given value of balance and income.

(a) Split the sample set into a training set (70%) and a validation set (30%). Fit a multiple logistic regression model (default ~ balance + income) using only the training observations. Obtain a prediction of default status for each individual in the validation set by computing the posterior probability of default for that individual, and classifying the individual to the default category if the posterior probability is greater than 0.5. Compute the validation set error, which is the fraction of the observations in the validation set that are misclassified. [10 points]

(b) Fit a classical decision tree and a conditional inference tree to the Default dataset. For the classical tree, use the plotcp() function to plot the cross-validated error against the complexity parameter and choose the most appropriate tree size.

(c) Write down the random forest algorithm, which involves sampling cases and variables to create a large number of decision trees. Implement the random forest algorithm based on traditional decision trees and on conditional inference trees, respectively. Use the resulting random forest models to classify the validation sample, and compare the predictive accuracy of the two models.

(d) Fit a support vector machine classifier to the Default dataset. Use the tune.svm() function to choose a combination of gamma and cost that may lead to a more effective model. Compare the sensitivity, specificity, positive predictive power and negative predictive power of the SVM, random forest and logistic regression classifiers.
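
The quantities named in 3.4(a) and 3.4(d) all come from the same 2x2 table of actual versus predicted classes. The sketch below, in Python on made-up labels and probabilities, shows how they might be computed once a classifier has produced a posterior probability for each validation observation. The function name classification_summary and the toy vectors are assumptions for illustration only.

```python
def classification_summary(actual, predicted_prob, threshold=0.5):
    """Error rate and confusion-matrix ratios for a binary classifier.

    actual: 1 for the positive class (e.g. default), 0 otherwise.
    predicted_prob: posterior probability of the positive class.
    """
    tp = fp = tn = fn = 0
    for y, prob in zip(actual, predicted_prob):
        predicted_positive = prob > threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "error_rate": ratio(fp + fn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "positive_predictive_power": ratio(tp, tp + fp),
        "negative_predictive_power": ratio(tn, tn + fn),
    }


# Hypothetical validation labels and posterior probabilities.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4, 0.05, 0.3, 0.2]
summary = classification_summary(actual, probs)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Because defaults are rare in practice, a low overall error rate can coexist with poor sensitivity, which is why 3.4(d) asks for all four ratios and not accuracy alone.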
