Stat 462/862 Assignment 4

(Due in my mailbox at Jeffery Hall 406 on Dec 5th, 2019)

1. This problem involves the OJ data set which is part of the ISLR package.

(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

(b) Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the tree, and describe the results obtained. What is the training error rate? How many terminal nodes does the tree have?

(c) Create a plot of the tree, and interpret the results.

(d) Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the test error rate?

(e) Apply the cv.tree() function to the training set in order to determine the optimal tree size.

(f) Produce a plot with tree size on the x-axis and cross-validated classification error rate on the y-axis.

(g) Which tree size corresponds to the lowest cross-validated classification error rate?

(h) Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes.

(i) Compare the training error rates between the pruned and unpruned trees. Which is higher?

(j) Compare the test error rates between the pruned and unpruned trees. Which is higher?
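The problem itself is phrased in terms of R functions (summary(), cv.tree()). The bookkeeping in parts (a) and (d) is language-independent, though, and can be sketched in plain Python as below. This is only an illustration under stated assumptions: the helper names are hypothetical, the figure of 1070 observations and the labels "CH"/"MM" are assumed properties of the OJ data, and the five label pairs are made-up toy values, not model output.

```python
import random
from collections import Counter


def train_test_indices(n_obs, n_train, seed=0):
    """Randomly split row indices 0..n_obs-1 into train and test sets."""
    rng = random.Random(seed)
    idx = list(range(n_obs))
    rng.shuffle(idx)
    return sorted(idx[:n_train]), sorted(idx[n_train:])


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(actual, predicted))


def error_rate(actual, predicted):
    """Fraction of observations whose predicted label is wrong."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


# Assumed size of the OJ data set: 1070 rows, 800 of them used for training.
train_idx, test_idx = train_test_indices(n_obs=1070, n_train=800, seed=1)

# Toy labels for illustration only (not real predictions).
actual = ["CH", "CH", "MM", "MM", "CH"]
predicted = ["CH", "MM", "MM", "MM", "CH"]
cm = confusion_matrix(actual, predicted)
rate = error_rate(actual, predicted)
print(len(train_idx), len(test_idx), dict(cm), rate)
```

The test error rate in part (d) is the sum of the off-diagonal entries of the confusion matrix divided by the number of test observations, which is what error_rate computes directly.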

2. Consider the problem of generating a sample from a Beta distribution Be(α, β).

(a) One result is that if $X_1 \sim \mathrm{Ga}(\alpha, 1)$ and $X_2 \sim \mathrm{Ga}(\beta, 1)$ are independent Gamma random variables, then $X = \frac{X_1}{X_1 + X_2} \sim \mathrm{Be}(\alpha, \beta)$.

Use this result to construct an algorithm to generate a Beta random sample. Provide a density histogram to evaluate the performance.

(b) Compare the algorithm in (a) with the rejection method based on (i) the uniform distribution; (ii) the truncated normal distribution.
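The Gamma-ratio result in part (a) translates into a two-draw sampler. The following is a minimal sketch using only the Python standard library, not a reference solution: the function names, the parameter values α = 2 and β = 5, the seed, and the bin count are all illustrative assumptions. It computes density-histogram bar heights numerically rather than drawing a plot.

```python
import random
import statistics


def beta_from_gammas(alpha, beta, rng):
    """Draw X = X1 / (X1 + X2) with X1 ~ Ga(alpha, 1), X2 ~ Ga(beta, 1)."""
    x1 = rng.gammavariate(alpha, 1.0)  # shape alpha, scale 1
    x2 = rng.gammavariate(beta, 1.0)   # shape beta, scale 1
    return x1 / (x1 + x2)


def density_histogram(samples, n_bins=10):
    """Bar heights on [0, 1], scaled so the bars have total area 1."""
    counts = [0] * n_bins
    for s in samples:
        k = min(int(s * n_bins), n_bins - 1)
        counts[k] += 1
    width = 1.0 / n_bins
    return [c / (len(samples) * width) for c in counts]


rng = random.Random(462)
alpha, beta = 2.0, 5.0  # illustrative parameter choice
samples = [beta_from_gammas(alpha, beta, rng) for _ in range(20000)]

sample_mean = statistics.fmean(samples)
theory_mean = alpha / (alpha + beta)
heights = density_histogram(samples)
print(sample_mean, theory_mean)
print(heights)
```

Comparing the bar heights with the Be(α, β) density, or the sample mean with α/(α + β), gives a quick check of the sampler. A rejection sampler for part (b) could be evaluated with the same density_histogram helper.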

3. Consider estimating the integral

$$\theta = \int_0^\infty \exp\bigl(-(\sqrt{x} + 0.5x)\bigr) \sin^2(x) \, dx$$

where the pdf of x is f(x) = 0.5 exp(−0.5x).

(a) Conduct the Monte Carlo (MC) integration for estimating θ.

(b) Conduct MC integration using importance sampling with the following proposal functions:

$g_1(x) = \frac{1}{2} \exp(-|x|)$ (Laplace distribution),

$g_2(x) = \frac{1}{2\pi} \cdot \frac{1}{1 + x^2/4}$,

$g_3(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2)$.

For sample sizes M = 100, 500, 1000, 2000, compare the means and standard deviations of the estimates.

(c) (For graduate students in 862 only) Implement MC integration using self-normalized importance sampling with g(x) from a normal mixture density. Explain the procedure and interpret your results clearly.
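Parts (a) and (b) can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions, not a reference solution. The function names, the seed, and the 50 replications per sample size are choices made here. Only the Laplace proposal g1 is shown, and the integrand is treated as zero for x < 0 because the integral runs over [0, ∞).

```python
import math
import random
import statistics


def integrand(x):
    """exp(-(sqrt(x) + 0.5 x)) * sin(x)^2 on [0, inf), zero elsewhere."""
    if x < 0.0:
        return 0.0
    return math.exp(-(math.sqrt(x) + 0.5 * x)) * math.sin(x) ** 2


def plain_mc(m, rng):
    """Plain MC: average integrand(X) / f(X), X ~ f(x) = 0.5 exp(-0.5 x)."""
    total = 0.0
    for _ in range(m):
        x = rng.expovariate(0.5)  # exponential with rate 0.5
        # integrand(x) / f(x) simplifies to 2 exp(-sqrt(x)) sin(x)^2
        total += 2.0 * math.exp(-math.sqrt(x)) * math.sin(x) ** 2
    return total / m


def laplace_is(m, rng):
    """Importance sampling with proposal g1(x) = 0.5 exp(-|x|)."""
    total = 0.0
    for _ in range(m):
        x = rng.expovariate(1.0)  # |X| ~ Exp(1)
        if rng.random() < 0.5:    # random sign gives a Laplace draw
            x = -x
        g1 = 0.5 * math.exp(-abs(x))
        total += integrand(x) / g1  # weight is 0 for negative draws
    return total / m


def summarize(estimator, m, n_reps, rng):
    """Mean and standard deviation of n_reps independent estimates."""
    estimates = [estimator(m, rng) for _ in range(n_reps)]
    return statistics.fmean(estimates), statistics.stdev(estimates)


rng = random.Random(2019)
results = {}
for m in (100, 500, 1000, 2000):
    results[m] = {
        "plain": summarize(plain_mc, m, 50, rng),
        "laplace": summarize(laplace_is, m, 50, rng),
    }
    print(m, results[m])
```

Roughly half of the Laplace draws are negative and contribute zero weight, which is one reason a proposal supported on the whole real line may be less efficient here than sampling from f directly. The proposals g2 and g3 could be handled the same way by swapping in the corresponding sampler and density.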

4. (a) Provide a Metropolis-Hastings algorithm to generate samples from a binomial distribution Bino(n, p) with

$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0, \ldots, n.$

Use the uniform distribution on {0, . . . , n} as the proposal distribution and use independent chains. Compare the estimated means and variances with the known theoretical means and variances of the binomial distribution.


(b) Provide a Metropolis-Hastings algorithm to generate samples from a standard normal distribution. The proposal distribution is the normal distribution with the mean being the current value in the chain and the variance being 0.25, 0.01, and 100, respectively. Compare the estimated means and variances with the known theoretical mean and variance of a standard normal distribution.
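Both samplers in this problem can be sketched with the Python standard library, as below. This is a minimal illustration under assumptions, not a reference solution. Part (a) is read here as an independence sampler, where each proposal is drawn uniformly from {0, ..., n} regardless of the current state. The function names, the choice n = 10 and p = 0.3, the chain length, and the seed are all illustrative, and no burn-in is discarded.

```python
import math
import random
import statistics


def mh_binomial(n, p, n_steps, rng):
    """Independence sampler for Bino(n, p) with a uniform proposal."""
    def target(k):
        return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

    current = rng.randint(0, n)
    chain = []
    for _ in range(n_steps):
        proposal = rng.randint(0, n)  # uniform on {0, ..., n}
        # The proposal mass is constant, so the MH ratio is target ratio.
        if rng.random() < target(proposal) / target(current):
            current = proposal
        chain.append(current)
    return chain


def mh_normal(proposal_var, n_steps, rng, start=0.0):
    """Random-walk MH for N(0, 1) with a N(current, proposal_var) proposal."""
    sd = math.sqrt(proposal_var)
    current = start
    chain = []
    for _ in range(n_steps):
        proposal = rng.gauss(current, sd)
        # Symmetric proposal: log ratio of standard normal densities.
        log_ratio = 0.5 * (current ** 2 - proposal ** 2)
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        chain.append(current)
    return chain


rng = random.Random(862)
n, p, n_steps = 10, 0.3, 50000  # illustrative settings

binom_chain = mh_binomial(n, p, n_steps, rng)
binom_mean = statistics.fmean(binom_chain)
binom_var = statistics.pvariance(binom_chain)
print("binomial:", binom_mean, binom_var, "theory:", n * p, n * p * (1 - p))

normal_results = {}
for var in (0.25, 0.01, 100):
    chain = mh_normal(var, n_steps, rng)
    normal_results[var] = (statistics.fmean(chain), statistics.pvariance(chain))
    print("normal, proposal variance", var, normal_results[var])
```

With proposal variance 0.01 the chain moves in very small steps, and with 100 most proposals are rejected, so both settings are expected to mix more slowly than 0.25. Comparing the three estimated variances against 1 is one way to see this.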
