Assignment #2 STA437H1S/2005H1S
due Friday February 17, 2023
Instructions: Solutions to problems 1–3 are to be submitted on Quercus (PDF files only).
1. Andrews curves (conceived by the University of Toronto’s own David Andrews) represent an
interesting approach to multivariate visualization. The idea is to represent each multivariate
observation (x_{i1}, ..., x_{ip}) (which is possibly normalized) by a sinusoidal function on [0, 1]:
    g_i(t) = x_{i1}/\sqrt{2} + x_{i2} \sin(2\pi t) + x_{i3} \cos(2\pi t) + x_{i4} \sin(4\pi t) + x_{i5} \cos(4\pi t) + \cdots
Observations that are similar will have similar Andrews curves while outlying observations
will often have curves that are distinctively different.
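The definition above can be sketched directly in R. This is only an illustration under stated assumptions: andrews_curve is a hypothetical helper name, not the andrews function from andrews.txt, and the data are simulated.

```r
# Hypothetical helper (not the course's andrews()): evaluates the Andrews
# curve of a single observation x at each value of t in [0, 1].
andrews_curve <- function(x, t) {
  g <- rep(x[1] / sqrt(2), length(t))        # constant term x_1 / sqrt(2)
  for (k in seq_along(x)[-1]) {
    freq <- 2 * pi * (k %/% 2)               # k = 2,3 -> 2*pi; k = 4,5 -> 4*pi; ...
    trig <- if (k %% 2 == 0) sin else cos    # even k -> sine, odd k -> cosine
    g <- g + x[k] * trig(freq * t)
  }
  g
}

set.seed(1)
x <- scale(matrix(rnorm(20 * 5), ncol = 5))  # simulated: 20 observations, 5 variables
t <- seq(0, 1, length.out = 201)
curves <- apply(x, 1, andrews_curve, t = t)  # 201 x 20 matrix, one column per observation
```

Calling matplot(t, curves, type = "l", lty = 1) would then draw all 20 curves on one set of axes.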
On Quercus, there is a file andrews.txt, which contains a function andrews that computes
Andrews curves for a data matrix whose columns are variables and rows are observations;
for example,
> source("andrews.txt") # read the function into R
> x <- cbind(rnorm(100),rnorm(100),rnorm(100),rnorm(100),rnorm(100))
> r <- andrews(x,scale=TRUE) # scales columns to have mean 0 and variance 1
The file testdata.txt contains 100 − k observations from a 10-variate normal distribution
and k outliers generated from another distribution (where k ≤ 15).
(a) Look at the data using Andrews curves. How many clear outliers do there seem to be?
(b) Using the information from the Andrews curves as well as pairwise scatterplots, principal
components, etc., give an estimate of how many outliers are in the data.
2. (a) If {g_i(t)} are the Andrews curves defined in question 1, show that
    2 \int_0^1 [g_i(t) - g_j(t)]^2 \, dt = \sum_{k=1}^{p} (x_{ik} - x_{jk})^2.
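Before attempting the proof, the identity in (a) can be checked numerically. The sketch below is a sanity check only, not a proof; andrews_curve is a hypothetical helper written from the definition in question 1, and the two observations are simulated.

```r
# Hypothetical helper: Andrews curve of one observation x at points t.
andrews_curve <- function(x, t) {
  g <- rep(x[1] / sqrt(2), length(t))
  for (k in seq_along(x)[-1]) {
    freq <- 2 * pi * (k %/% 2)
    trig <- if (k %% 2 == 0) sin else cos
    g <- g + x[k] * trig(freq * t)
  }
  g
}

set.seed(437)
p  <- 5
xi <- rnorm(p)                               # simulated observation i
xj <- rnorm(p)                               # simulated observation j

# Left-hand side: twice the integrated squared difference of the two curves.
sq_diff <- function(t) (andrews_curve(xi, t) - andrews_curve(xj, t))^2
lhs <- 2 * integrate(sq_diff, lower = 0, upper = 1)$value

# Right-hand side: squared Euclidean distance between the observations.
rhs <- sum((xi - xj)^2)

all.equal(lhs, rhs, tolerance = 1e-6)
```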
(b) If \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, what is the Andrews curve of \bar{x}?
(c) Suppose that x_k lies on a line between x_i and x_j, that is, x_k = λ x_i + (1 − λ) x_j for some
0 < λ < 1. What can you say about the Andrews curve of x_k relative to those of x_i and x_j?
3. In Assignment #1, you looked at two dimensional scatterplots of data on two species of
rock crabs; here, you will do a principal components analysis of these data.
As before, the data are in a file crabs.txt on Quercus; the columns of the file are species (B
or O), sex (M or F), index (1-50 within each species-sex combination), width of the frontal
lip (FL), the rear width of the shell (RW), length along the midline of the shell (CL), the
maximum width of the shell (CW), and the body depth (BD).
The data can be read into R using the following code:
> x <- scan("crabs.txt",skip=1,what=list("c","c",0,0,0,0,0,0))
> colour1 <- ifelse(x[[1]]=="B","blue","orange") # species colours
> colour2 <- ifelse(x[[2]]=="M","black","red") # sex colours
> sex <- x[[2]]
> FL <- x[[4]]
> RW <- x[[5]]
> CL <- x[[6]]
> CW <- x[[7]]
> BD <- x[[8]]
(a) Using the correlation matrix, do a principal component analysis of the 5 variables.
> r <- princomp(~FL+RW+CL+CW+BD,cor=TRUE)
> summary(r,loadings=TRUE)
Give an interpretation of the first two principal components based on their loadings.
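As a rough illustration of what a correlation-matrix PCA computes, the sketch below compares princomp with a direct eigen-decomposition of the correlation matrix. The data are simulated stand-ins (crabs.txt is not used here) and the column names are borrowed only for readability, so the numbers say nothing about the real crabs.

```r
set.seed(2023)
n    <- 200
size <- rnorm(n)                             # simulated common "size" factor
X <- cbind(FL = size + 0.2 * rnorm(n),       # simulated columns, NOT the crab data
           RW = size + 0.5 * rnorm(n),
           CL = size + 0.3 * rnorm(n),
           CW = size + 0.4 * rnorm(n),
           BD = size + 0.6 * rnorm(n))

e <- eigen(cor(X))                           # eigen-decomposition of the correlation matrix
r <- princomp(X, cor = TRUE)                 # same analysis through princomp

sqrt(e$values)                               # component standard deviations
unname(r$sdev)                               # should agree with the line above
e$values / sum(e$values)                     # proportion of variance per component
```

The columns of e$vectors correspond to the loadings reported by summary(r, loadings = TRUE), possibly up to a sign change in each column.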
(b) Look at pairwise scatterplots of the 5 principal components using colour1 to distinguish
the two species:
> pairs(r$scores,col=colour1)
Which pairs of principal components seem to separate the two species?
(c) Now look at pairwise scatterplots of the 5 principal components using colour2 to
distinguish the two sexes:
> pairs(r$scores,col=colour2)
Which pairs of principal components seem to separate the two sexes?
(d) Suppose you are given the following measurements for the 5 variables: FL = 18.7,
RW = 15.0, CL = 35.0, CW = 40.3, BD = 16.6. What is your prediction of the species and
sex of this crab?
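One possible way to place a new observation on the principal component axes is the predict method for princomp objects. The sketch below assumes the PCA was fitted with a data frame (dat is a simulated stand-in, not crabs.txt), so the resulting scores are illustrative only.

```r
set.seed(1)
n    <- 100
size <- rnorm(n, mean = 15, sd = 3)          # simulated overall size
dat  <- data.frame(FL = size + rnorm(n),     # simulated stand-in for the crab measurements
                   RW = 0.8 * size + rnorm(n),
                   CL = 2.0 * size + rnorm(n),
                   CW = 2.3 * size + rnorm(n),
                   BD = 0.9 * size + rnorm(n))

r <- princomp(~ FL + RW + CL + CW + BD, data = dat, cor = TRUE)

# The new measurements from part (d), with column names matching the fit.
new_crab <- data.frame(FL = 18.7, RW = 15.0, CL = 35.0, CW = 40.3, BD = 16.6)

# Centre and scale with the fitted means and sds, then project onto the loadings.
new_scores <- predict(r, newdata = new_crab)
new_scores
```

With the real data, the new scores could be compared against r$scores coloured by colour1 and colour2 to see which group the new point falls closest to.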
