
Assignment 2

Machine Learning and Big Data for Economics and Finance

Exercise 1. In this exercise, all the cross-validation simulations should involve a random split of the original sample into a training subsample corresponding to 90% of the observations and a testing subsample corresponding to the remaining 10% of the observations.

1. Generate a sample of size $n = 100$ from the following model: $X$ is a uniform random variable over the interval $(-1, 1)$; $\varepsilon$ is a normal random variable with mean 0 and standard deviation $1/2$, generated independently of $X$; and $Y$ is a random variable linked to $X$ through the following equation:

$$Y = 12X^3 - 5X^2 - 10X + \varepsilon.$$

From now on, delete $\varepsilon$ and keep the generated $X$ and $Y$ samples in an R data frame object that you will call `dp`. We will now consider a supervised learning setup where $Y$ is the output variable and $X$ is the input variable. From now on, denote the observations of the input variable by $x_i$ and those of the output variable by $y_i$ (for $i = 1, \dots, n$).

2. Fit linear, quadratic and cubic regressions to the data in `dp`. Create three plots (one for each model) where each plot has $x_i$ on the X-axis and both the true $y_i$ and the predicted $\hat y_i$ on the Y-axis. Compute the training and testing mean squared errors for each model.

3. We are interested in constructing a step-function learner as follows: first draw a random number $U$ uniformly on the interval spanned by the minimum and maximum of the input values $(x_1, \dots, x_n)$, and then use it to construct the following function, whose purpose is to give the prediction of $Y$ given $X = x$:

$$f(x) = \beta_1 \mathbb{I}(U \le x) + \beta_2 \mathbb{I}(U > x),$$

where $\beta_1$ and $\beta_2$ are unknown constants to be learned and $\mathbb{I}(\text{statement})$ is the indicator function that equals 1 when the statement is true and 0 otherwise. Construct an R function called `wst` that implements an estimate

$$\hat f(x) = \hat\beta_1 \mathbb{I}(U \le x) + \hat\beta_2 \mathbb{I}(U > x)$$

of $f$. This R function must take the following inputs:

- A vector `x` of values at which you would like to compute $\hat f$.
- A data frame `data` containing the input and output variables.
- An optional numerical input argument `u` that overrides the behavior of the learner by forcing the cutpoint of $f$ to be at `u` instead of the randomly generated cutpoint $U$.

The output of the function `wst` should be a list that contains the following:

- A vector `fitted` that contains the predictions at `x`.
- A vector `coefficients` that contains $\hat\beta_1$ and $\hat\beta_2$.
- A number `cutpoint` that is either equal to the input `u` provided by the function user or equal to the randomly generated $U$ if the input argument `u` is not provided.

4. Assess the variability of both the testing and the training mean squared errors of $\hat f$ when evaluated at the data `dp` by drawing $B = 1000$ bootstrap samples.
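A minimal base-R sketch of parts 1 and 2 follows. It is one possible reading of the assignment, not a reference solution. The seed, the helper names `split_idx()` and `mse()`, the column names `x` and `y` in `dp`, and the choice to fit each polynomial on the 90% training subsample (and draw its predictions over all of `dp`) are assumptions.

```r
# Sketch for Exercise 1, parts 1-2 (assumptions: base R only; the seed and the
# helper names split_idx() and mse() are illustrative, not from the assignment).
set.seed(2024)
n   <- 100
X   <- runif(n, min = -1, max = 1)            # X ~ Uniform(-1, 1)
eps <- rnorm(n, mean = 0, sd = 1/2)           # epsilon ~ N(0, (1/2)^2), independent of X
Y   <- 12 * X^3 - 5 * X^2 - 10 * X + eps
rm(eps)                                       # delete epsilon
dp  <- data.frame(x = X, y = Y)

# Random 90% / 10% split of the row indices.
split_idx <- function(n, train_frac = 0.9) {
  train <- sample.int(n, size = round(train_frac * n))
  list(train = train, test = setdiff(seq_len(n), train))
}
mse <- function(y, yhat) mean((y - yhat)^2)

idx   <- split_idx(nrow(dp))
train <- dp[idx$train, ]
test  <- dp[idx$test, ]

op  <- par(mfrow = c(1, 3))
ord <- order(dp$x)
results <- sapply(1:3, function(d) {
  # Polynomial of degree d fitted on the training subsample only.
  fit <- lm(y ~ poly(x, degree = d, raw = TRUE), data = train)
  # True y_i as points, predicted y-hat_i as a line, both against x_i.
  plot(dp$x, dp$y, pch = 16, col = "grey40", xlab = "x", ylab = "y",
       main = paste("Degree", d))
  lines(dp$x[ord], predict(fit, newdata = dp)[ord], col = "red", lwd = 2)
  c(train_mse = mse(train$y, fitted(fit)),
    test_mse  = mse(test$y, predict(fit, newdata = test)))
})
par(op)
colnames(results) <- c("linear", "quadratic", "cubic")
print(results)
```

Because the three models are nested and fitted on the same training rows, the training MSE cannot increase from the linear to the cubic fit; only the testing MSE can reveal overfitting or underfitting.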

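For parts 3 and 4, note that for a fixed cutpoint the least-squares estimates of $\beta_1$ and $\beta_2$ are the sample means of the $y_i$ with $x_i \ge U$ and with $x_i < U$, respectively, because the squared-error criterion separates into the two groups. The sketch below relies on that fact. It assumes `data` has columns named `x` and `y`, returns `NA` for a coefficient whose group is empty (possible only when a user-supplied `u` does not split the data), and reads part 4 as: resample `dp` with replacement, split each bootstrap sample 90/10, fit `wst` with a fresh random cutpoint on the training part, and record both MSEs. The seed and the helper `mse()` are illustrative, and other bootstrap designs are equally defensible.

```r
# Sketch for Exercise 1, parts 3-4 (assumptions: base R only, columns named
# x and y, hypothetical seed; dp is regenerated so the block runs on its own).
set.seed(123)
n  <- 100
X  <- runif(n, min = -1, max = 1)
dp <- data.frame(x = X, y = 12 * X^3 - 5 * X^2 - 10 * X + rnorm(n, sd = 1/2))

# Step-function learner: f-hat(x) = b1 * I(u <= x) + b2 * I(u > x).
wst <- function(x, data, u = NULL) {
  if (is.null(u)) {
    # Random cutpoint U ~ Uniform(min x_i, max x_i).
    u <- runif(1, min = min(data$x), max = max(data$x))
  }
  right <- data$x >= u                      # observations with u <= x_i
  # Least-squares estimates for a fixed cutpoint are the two group means;
  # NA is returned for a group that happens to be empty.
  b1 <- if (any(right))  mean(data$y[right])  else NA_real_
  b2 <- if (any(!right)) mean(data$y[!right]) else NA_real_
  list(fitted       = ifelse(x >= u, b1, b2),
       coefficients = c(b1, b2),
       cutpoint     = u)
}

mse <- function(y, yhat) mean((y - yhat)^2)

# Bootstrap: resample dp, split 90/10, fit with a fresh random U, record MSEs.
B <- 1000
boot_mse <- t(replicate(B, {
  bs    <- dp[sample.int(nrow(dp), replace = TRUE), ]
  tr_id <- sample.int(nrow(bs), size = round(0.9 * nrow(bs)))
  train <- bs[tr_id, ]
  test  <- bs[-tr_id, ]
  fit   <- wst(train$x, train)                          # random cutpoint
  pred  <- wst(test$x, train, u = fit$cutpoint)$fitted  # same cutpoint
  c(train = mse(train$y, fit$fitted), test = mse(test$y, pred))
}))

apply(boot_mse, 2, sd)                                # bootstrap standard deviations
apply(boot_mse, 2, quantile, probs = c(0.025, 0.975)) # 2.5% and 97.5% quantiles
```

Because a bootstrap sample contains repeated observations, the same point can fall in both the training and the testing part, which tends to make the bootstrap testing MSE optimistic; splitting `dp` first and resampling within each part is an alternative design.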
Exercise 2. Describe mathematically what the following code does. Add comments to each line describing what each line accomplishes. Design a scenario where this code would be useful and write that scenario in R (the scenario should not involve more than four lines of R code).

```r
OpSp = function(y, x) {
  n = length(y)
  s = n - 1
  xs = sort(x)
  ci = rep(0, s)
  ssri = rep(0, s)
  for (i in 1:s) {
    ci[i] = (xs[i] + xs[i + 1]) / 2
    y1 = y[x <= ci[i]] - mean(y[x <= ci[i]])
    y2 = y[x > ci[i]] - mean(y[x > ci[i]])
    ssri[i] = sum(y1^2) + sum(y2^2)
  }
  return(ci[which.min(ssri)])
}
```
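One way to state what `OpSp` computes, taking the left group to be the observations with $x_j \le c$ (the complement of the group selected by `x > ci[i]`): let $x_{(1)} \le \dots \le x_{(n)}$ be the sorted inputs and, for $i = 1, \dots, n-1$,

$$c_i = \frac{x_{(i)} + x_{(i+1)}}{2}, \qquad \mathrm{SSR}(c) = \sum_{j:\, x_j \le c} \bigl(y_j - \bar y_{\le c}\bigr)^2 + \sum_{j:\, x_j > c} \bigl(y_j - \bar y_{> c}\bigr)^2,$$

where $\bar y_{\le c}$ and $\bar y_{> c}$ are the means of the $y_j$ on each side of $c$. The function returns

$$\hat c = c_{i^\ast}, \qquad i^\ast = \min\Bigl\{\, i : \mathrm{SSR}(c_i) = \min_{k} \mathrm{SSR}(c_k) \Bigr\},$$

the first minimizer, since `which.min` returns the first index of the minimum. In words, $\hat c$ is the cutpoint of the best least-squares step function with a single split (the split a one-split regression tree would choose, restricted to midpoints), which is a data-driven alternative to the randomly drawn $U$ in Exercise 1, part 3.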

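One possible scenario, which is hypothetical and not taken from the assignment: an outcome jumps by a fixed amount once a covariate crosses an unknown threshold (for instance an eligibility cutoff), and we want to estimate that threshold. The block repeats `OpSp` (with the left group written as `x <= ci[i]`, the complement of `x > ci[i]`) so that it runs on its own; the scenario itself is the last four lines, and the seed, threshold 0.3, jump size 2 and noise level are illustrative choices.

```r
# Copy of OpSp so the block is self-contained; the left group is taken to be
# x <= ci[i], the complement of the x > ci[i] group.
OpSp = function(y, x) {
  n = length(y)
  s = n - 1
  xs = sort(x)
  ci = rep(0, s)
  ssri = rep(0, s)
  for (i in 1:s) {
    ci[i] = (xs[i] + xs[i + 1]) / 2
    y1 = y[x <= ci[i]] - mean(y[x <= ci[i]])
    y2 = y[x > ci[i]] - mean(y[x > ci[i]])
    ssri[i] = sum(y1^2) + sum(y2^2)
  }
  return(ci[which.min(ssri)])
}

# Scenario (hypothetical): y jumps by 2 when x crosses an unknown threshold of 0.3.
set.seed(42)
x = runif(200)
y = 2 * (x > 0.3) + rnorm(200, sd = 0.1)
OpSp(y, x)  # estimated threshold; should lie close to 0.3
```

With a jump this large relative to the noise, moving any observation to the wrong side of the threshold inflates the SSR far more than the noise can offset, so the estimate lands in the gap between the sample points that straddle 0.3.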