MA 575 – Fall 2022. Midterm Exam

Some useful formulas

• The Gaussian distribution: if $X \sim N(\mu, \sigma^2)$, then $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$.

• If $X = (X_1, \dots, X_p)' \sim N_p(\mu, \Sigma)$ and $A \in \mathbb{R}^{k \times p}$ for some $k \ge 1$, then $AX \sim N_k(A\mu, A\Sigma A')$.

• In a simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, $1 \le i \le n$, where the errors are independent with mean zero and variance $\sigma^2$, the estimates of $\beta_0$, $\beta_1$ and $\sigma^2$ are given respectively by

where $\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i$, and, with the true values of the parameters denoted as usual, we have

Furthermore, the $R^2$ of the model is
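For reference, the standard textbook expressions for the quantities named in the last bullet can be written as follows. This is a sketch in assumed notation and may differ in layout from the original formula sheet: $\bar{x}$, $\bar{y}$ and $S_{xx}$ are introduced here, and the starred symbols stand for the true parameter values, following the convention of Problem 3.

```latex
% Assumed notation: \bar{x}, \bar{y} are sample means, S_{xx} = \sum_i (x_i - \bar{x})^2.
\begin{align*}
\hat\beta_1 &= \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{S_{xx}},
&
\hat\beta_0 &= \bar{y} - \hat\beta_1 \bar{x},
&
\hat\sigma^2 &= \frac{1}{n-2} \sum_{i=1}^n (y_i - \hat{y}_i)^2,
\\[4pt]
E(\hat\beta_1) &= \beta_{1\star},
&
\mathrm{Var}(\hat\beta_1) &= \frac{\sigma_\star^2}{S_{xx}},
&
E(\hat\sigma^2) &= \sigma_\star^2,
\\[4pt]
E(\hat\beta_0) &= \beta_{0\star},
&
\mathrm{Var}(\hat\beta_0) &= \sigma_\star^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{S_{xx}} \right),
&
R^2 &= 1 - \frac{\sum_{i=1}^n (y_i - \hat{y}_i)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}.
\end{align*}
```

The mean and variance identities only need independent errors with mean zero and common variance, which is what the bullet assumes. Exact Gaussian or chi-square sampling distributions would additionally require normal errors.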

Problem 1: Consider the multiple linear regression model $y = X\beta + \epsilon$, where $\epsilon$ is modeled as having the distribution $N(0, \sigma^2 I_n)$. Suppose that we fit the model to a dataset and obtain the following summary in R.

Call:
lm(formula = y ~ X1 + X2 + X3 + X4, data = dataset)

Residuals:
    Min      1Q  Median      3Q     Max
-90.531 -20.855  -1.746  15.979  66.571

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1045.9715    52.8698  19.784  < 2e-16 ***
X1             4.4626    10.5465   0.423    0.674
X2             1.6379     2.3872   0.686    0.496
X3            -3.6242     3.2154  -1.127    0.266
X4            -2.9045     0.2313 -12.559 2.61e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 32.7 on 52 degrees of freedom
Multiple R-squared:  0.8246,    Adjusted R-squared:  0.809
F-statistic: 52.88 on 4 and 52 DF,  p-value: < 2.2e-16

(a) (2pts) What does the column 'Estimate' represent? How is it computed?

(b) (2pts) What does the column 'Std. Error' represent? How is it computed?

(d) (2pts) What does the column 'Pr(>|t|)' represent? How is it computed?

(e) (2pts) What does the F-statistic represent? How is it computed?
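As a minimal sketch of how some of the printed quantities relate to each other, the Python snippet below (Python rather than R, standard library only) recomputes each t value as Estimate divided by Std. Error and recovers the sample size from the residual degrees of freedom. The numbers are copied from the summary above. The helper name `t_value` is hypothetical, and small discrepancies in the last digit are expected because the printed inputs are rounded.

```python
# Rows copied from the Coefficients table in the summary:
# (name, Estimate, Std. Error, reported t value)
rows = [
    ("(Intercept)", 1045.9715, 52.8698, 19.784),
    ("X1", 4.4626, 10.5465, 0.423),
    ("X2", 1.6379, 2.3872, 0.686),
    ("X3", -3.6242, 3.2154, -1.127),
    ("X4", -2.9045, 0.2313, -12.559),
]


def t_value(estimate, std_error):
    """t statistic for H0: coefficient = 0, i.e. estimate / standard error."""
    return estimate / std_error


t_values = {name: t_value(est, se) for name, est, se, _ in rows}

# Residual degrees of freedom equal n minus the number of fitted coefficients,
# so the sample size can be recovered from the summary.
residual_df = 52
n_coefficients = len(rows)  # intercept plus four slopes
n_observations = residual_df + n_coefficients

for name, est, se, reported in rows:
    print(f"{name:12s} t = {t_values[name]:8.3f} (reported {reported})")
print("n =", n_observations)
```

The p-values in the last column would then come from comparing each t value with a t distribution on 52 degrees of freedom, which the standard library does not provide and is therefore left out of this sketch.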

Problem 2: Consider a simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \epsilon_i$, for $i = 1, \dots, n$, where the error terms $\epsilon_i$ are assumed independent and identically distributed with distribution $N(0, \sigma^2)$. Let $\hat\beta_0$, $\hat\beta_1$ denote the least squares estimators of $\beta_0$ and $\beta_1$ respectively, and let $\hat\sigma^2$ be the usual linear model estimator of $\sigma^2$. See formulas on page 1. Answer TRUE or FALSE, and explain your choice if asked.

(a) (2pts) The fitted regression line $x \mapsto \hat\beta_0 + \hat\beta_1 x$ always goes through the point $(\bar{x}, \bar{y})$. Explain.

(b) (2pts) The estimators $(\hat\beta_0, \hat\beta_1)$ and $\hat\sigma^2$ are always independent.

(d) (2pts) The T distribution with $\nu$ degrees of freedom is obtained by dividing a standard normal random variable by the square root of a random variable that follows a chi-square distribution with $\nu$ degrees of freedom.

(e) (2pts) An $R^2$ equal to one means that the linear model is the best possible regression model for the data.
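For statement (a), a quick numerical check can make the algebra concrete. The sketch below fits a simple least squares line in plain Python and evaluates it at $\bar{x}$. The data are made-up toy values and the function name `fit_simple_ols` is hypothetical, so this illustrates the identity $\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}$ rather than anything specific to the exam.

```python
def fit_simple_ols(x, y):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Made-up toy data, for illustration only.
x = [1.0, 2.0, 4.0, 7.0, 9.0]
y = [2.1, 2.9, 5.2, 8.1, 9.8]

b0, b1 = fit_simple_ols(x, y)
x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Because b0 = y_bar - b1 * x_bar, the fitted value at x_bar equals y_bar
# up to floating-point rounding.
fitted_at_mean = b0 + b1 * x_bar
print(abs(fitted_at_mean - y_bar) < 1e-12)
```

The same identity holds for any dataset in which the $x_i$ are not all equal, since it follows directly from the formula for the intercept.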

Problem 3: Consider the simple linear regression model $y_i = \beta x_i + \epsilon_i$, $1 \le i \le n$, where the error terms $\epsilon_i$ are assumed iid with distribution $N(0, \sigma^2)$. The true values of $\beta$ and $\sigma^2$ are denoted $\beta_\star$ and $\sigma_\star^2$ respectively.

(a) (3pts) Suppose that we combine all $n$ observations to write the model in matrix form as $y = X\beta + \epsilon$, where $y = (y_1, \dots, y_n)'$. What are $X$, $\beta$ and $\epsilon$? Deduce the expression of the least squares estimate $\hat\beta$ of $\beta$.
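One way to sanity-check an answer to (a): with no intercept, $X$ is the $n \times 1$ column $(x_1, \dots, x_n)'$, so $X'X = \sum_i x_i^2$ and $X'y = \sum_i x_i y_i$ are scalars, and the general formula $(X'X)^{-1} X'y$ reduces to $\hat\beta = \sum_i x_i y_i / \sum_i x_i^2$. The Python sketch below evaluates this on made-up toy data and checks the normal equation. The function names `fit_through_origin` and `rss` are hypothetical.

```python
def fit_through_origin(x, y):
    """Least squares slope for the no-intercept model y = beta * x."""
    xtx = sum(xi * xi for xi in x)              # X'X, a 1x1 matrix here
    xty = sum(xi * yi for xi, yi in zip(x, y))  # X'y, also a scalar
    return xty / xtx


def rss(beta, x, y):
    """Residual sum of squares of the line y = beta * x."""
    return sum((yi - beta * xi) ** 2 for xi, yi in zip(x, y))


# Made-up toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.1, 5.9, 8.2]

beta_hat = fit_through_origin(x, y)

# Normal equation: X'(y - X * beta_hat) should vanish at the minimiser.
score = sum(xi * (yi - beta_hat * xi) for xi, yi in zip(x, y))
print(beta_hat, abs(score) < 1e-9)
```

Moving `beta_hat` slightly in either direction increases `rss`, which is a cheap way to confirm that the closed form is a minimiser and not just a stationary point.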