STAT 612 (Fall 2024)
Homework Assignment 6
Due: Dec. 1, by 11:59 p.m.
Instructions:
1. This problem set consists of 5 problems worth a total of 100 = 10 + 20 + 25 + 20 + 25 points.
2. All submissions must be made online via Canvas as a single pdf file with your name showing on the first page. Your solutions may include typed pages, scanned handwritten pages, and R code/output (if applicable), but everything must be combined into a single pdf file.
3. It is your responsibility to ensure that your uploaded solutions are complete and fully legible, especially if there are scans involved. If not, the TA may be forced to ignore those parts!
4. The deadline is strict. No unwarranted exceptions or extension requests will be entertained.
5. Your proofs/arguments must be rigorous and complete. Please make sure to show all details of your work, including intermediate steps and reasoning; otherwise points will be deducted.
6. The only results you may use without proof are those from the lecture notes. Make sure to cite them properly (e.g., slide/result number) when using them in your proofs.
Problems:
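1. Let $x_1 \in \mathbb{R}^n$ be such that $x_1' J_n = 0$, where $J_n = (1, \ldots, 1)' \in \mathbb{R}^n$, and let $w \in \mathbb{R}^n$ be such that $\|w\|^2 = 1$, $w' J_n = 0$ and $w' x_1 = 0$. Set $x_2 = x_1 + aw$ for a scalar $a \in \mathbb{R}$. Suppose $Y_{n \times 1} = \beta_0 J_n + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I_n)$, and let $(\hat\beta_0, \hat\beta_1, \hat\beta_2)'$ be the OLS estimator.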
(a) Derive formulae for $\mathrm{Var}(\hat\beta_j)$ ($j = 0, 1, 2$) and $\mathrm{Var}(\hat\beta_1 + \hat\beta_2)$, in terms of $n$, $\|x_1\|^2$ and $a$.
(b) Derive formulae for the t-statistic(s) for testing $H_0 : \beta_1 = 0$ and $H_0 : \beta_2 = 0$. In both cases, investigate and justify the behavior of these t-statistic(s) as $a \to 0$.
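[Note: A derived formula can be sanity-checked numerically. The following is a minimal sketch, not a solution: it assumes one particular choice of n = 4, x1 and w satisfying the constraints of Problem 1, and the helper names `inverse` and `unscaled_variances` are hypothetical. It computes the diagonal of (X'X)^{-1} exactly, i.e. the variances of the OLS coefficients divided by σ², for a few values of a.]

```python
from fractions import Fraction as F

def inverse(M):
    """Gauss-Jordan inverse of a nonsingular square matrix, in exact arithmetic."""
    n = len(M)
    # Augment M with the identity matrix.
    aug = [[F(v) for v in row] + [F(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Swap a row with a nonzero pivot into place, then normalise it.
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def unscaled_variances(a):
    """Diagonal of (X'X)^{-1} for X = (J_n, x1, x2) with x2 = x1 + a*w."""
    # Assumed example vectors (n = 4); any vectors meeting the constraints would do.
    J = [F(1)] * 4
    x1 = [F(1), F(-1), F(1), F(-1)]              # x1'J = 0
    w = [F(1, 2), F(1, 2), F(-1, 2), F(-1, 2)]   # ||w|| = 1, w'J = 0, w'x1 = 0
    x2 = [u + a * v for u, v in zip(x1, w)]
    cols = [J, x1, x2]
    XtX = [[sum(p * q for p, q in zip(c1, c2)) for c2 in cols] for c1 in cols]
    inv = inverse(XtX)
    return [inv[j][j] for j in range(3)]

# Print Var(beta_hat_j) / sigma^2 for a shrinking towards 0.
for a in (F(1), F(1, 10), F(1, 100)):
    print(float(a), [float(v) for v in unscaled_variances(a)])
```

Exact rational arithmetic is used here so that the near-singularity of X'X for small a does not get confused with floating-point error.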
2. Suppose $E(Y_{n \times 1}) = X_{n \times k}\, \beta_{k \times 1}$, with $\mathrm{rank}(X) = r \le k$. Let $\hat\beta$ be any least squares estimator of $\beta$ and consider a vector of LPFs $(A\beta)_{q \times 1}$, where $A_{q \times k} = C_{q \times n} X_{n \times k}$ for an arbitrary $C_{q \times n}$.
(a) Show that $A\beta$ is estimable, that $A\hat\beta$ is a LUE of $A\beta$ for any $\hat\beta$ as above, and that $A\hat\beta$ is invariant to the choice of the least squares estimator $\hat\beta$ (recall there are many such).
(b) Suppose $\mathrm{Var}(Y) = \sigma^2 I_n$. Let $(\Sigma_A)_{q \times q} := \mathrm{Var}(A\hat\beta)$, and let $G$ be any g-inverse of $X'X$. Show that $\Sigma_A = \sigma^2 A G A' = \sigma^2 A (X'X)^{+} A'$, and that $\Sigma_A$ depends on $X$ only through $P_X$.
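(c) Let $Y \sim N(X\beta, \sigma^2 I_n)$. Then, with $(r, A, \hat\beta, \Sigma_A)$ as above and $S^2$ as usual, show that:
\[ (A\hat\beta - A\beta)'\, \Sigma_A^{+}\, (A\hat\beta - A\beta) / (r_A S^2) \sim F_{r_A,\, n-r}(0), \quad \text{where } r_A := \mathrm{rank}(\Sigma_A) \le q. \]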
[Note: This result is useful for inference on Aβ via hypothesis tests or confidence sets.]
3. In a study of how age ($X$) affects blood pressure ($Y$), a researcher realizes he needs to adjust for gender as it may have a differential impact. He therefore collects two datasets on $(Y, X)$: $S_1 = (Y_1, x_1)_{n_1 \times 2}$ from females and $S_2 = (Y_2, x_2)_{n_2 \times 2}$ from males, with $S_1 \perp\!\!\!\perp S_2$. Suppose:
\[ Y_j \sim N(\alpha_j J_{n_j} + \beta_j x_j,\ \sigma^2 I_{n_j}) \quad \text{for some } \alpha_j, \beta_j \in \mathbb{R} \text{ and } \sigma^2 > 0 \quad (j = 1, 2). \tag{1} \]
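(a) Consider now the pooled sample $S_p := S_1 \cup S_2$ of size $n := n_1 + n_2$. Show that the pooled response $Y_{n \times 1} := (Y_1', Y_2')'$ is jointly Normal and satisfies a linear model as follows:
\[ Y \sim N(X\beta, \sigma^2 I_n), \quad \text{with } X_{n \times 4} = \begin{pmatrix} J_{n_1} & 0 & x_1 & 0 \\ 0 & J_{n_2} & 0 & x_2 \end{pmatrix}, \quad \beta = (\alpha_1, \alpha_2, \beta_1, \beta_2)'. \tag{2} \]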
(b) Let $\{R_j^2, T_j, R_{j,\mathrm{adj}}^2\}$ ($j = 1, 2$) and $\{R_p^2, T_p, R_{p,\mathrm{adj}}^2\}$ be the $\{R^2, \mathrm{TSS}, R_{\mathrm{adj}}^2\}$ values for the $j$-th model in (1) and the pooled model (2), respectively. Show that although $J_n$ is not a column of $X$ in model (2), $R_p^2$ is still well-defined, and then show that the triplets above satisfy:
\[ (1 - R_p^2)\, T_p = \sum_{j=1}^{2} (1 - R_j^2)\, T_j \quad \text{and} \quad (1 - R_{p,\mathrm{adj}}^2)\, T_p = \frac{n-1}{n-4} \sum_{j=1}^{2} (1 - R_{j,\mathrm{adj}}^2)\, T_j. \]
(c) The researcher wants to test the following two hypotheses (stated in words on purpose):
(i) H0 : Gender does not impact the effect of age on the mean blood pressure.
(ii) H0 : Age has no effect on the mean blood pressure regardless of gender group.
Formulate each hypothesis in terms of the model parameters in (2) and write it in the form $H_0 : E(Y) \in V_0$ for an appropriate $V_0 \subseteq C(X)$. Then construct F-tests for these hypotheses, clearly showing your test statistic, its null distribution and the P-value.
4. Let $x_1 = (1, 1, 1, 1, 1, 1)'$, $x_2 = (3, -1, 4, 6, 3, 3)'$, $x_3 = (7, 3, 2, 0, 3, 3)'$, $x_4 = (8, 4, 9, -5, 4, 4)'$; let $X_{6 \times 4} = (x_1, \ldots, x_4)$ and $Y_{6 \times 1} = (4, 36, 44, 12, 16, 8)'$; and let $\beta = (\beta_1, \ldots, \beta_4)'$. Consider testing $H_0 : \beta_4 = 0$ and $\beta_2 = \beta_3$, in the model: $Y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I_6)$ ($\sigma^2 > 0$).
(a) Show that all LPFs are estimable (and thus H0 above is trivially a testable hypothesis). Specify a full-row rank matrix A such that H0 as above is equivalent to testing Aβ = 0.
(b) For the full model, find the least squares estimates $\hat\beta$, $\hat Y := p\{Y \mid C(X)\}$ and $Z := A\hat\beta$.
(c) Let $V_0 := \{Xb \mid Ab = 0\}$. Then, find $\hat Y_0 := p(Y \mid V_0)$, $\hat Y_1 := p\{Y \mid C(X) \cap V_0^{\perp}\}$ and $\hat e := p\{Y \mid C(X)^{\perp}\}$. Verify numerically that these three vectors are orthogonal.
(d) Calculate the error sums of squares for the full model ($\mathrm{ESS}_1$) and under $H_0$ ($\mathrm{ESS}_0$). Verify numerically that: $\mathrm{ESS}_0 - \mathrm{ESS}_1 = \|\hat Y - \hat Y_0\|^2 = \|\hat Y_1\|^2 = Z' \{A (X'X)^{-1} A'\}^{-1} Z$.
(e) Compute the F-statistic for testing $H_0$ and perform the test (including the P-value).
(f) Give the least squares estimate of β under H0 (i.e., subject to the constraints Aβ = 0).
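[Note: The numerical parts of Problem 4 can be set up in any language. The following is a minimal sketch in Python using only the standard library, with exact rational arithmetic; it is not a full solution. The helpers `dot` and `solve` are hypothetical names, and the matrix `A` shown is just one possible full-row-rank encoding of H0. The final lines check the normal equations by confirming that the residual vector is orthogonal to every column of X.]

```python
from fractions import Fraction as F

# Data from Problem 4: the columns of X, and the response Y.
x1 = [1, 1, 1, 1, 1, 1]
x2 = [3, -1, 4, 6, 3, 3]
x3 = [7, 3, 2, 0, 3, 3]
x4 = [8, 4, 9, -5, 4, 4]
Y = [F(v) for v in (4, 36, 44, 12, 16, 8)]
cols = [[F(v) for v in c] for c in (x1, x2, x3, x4)]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))

def solve(M, b):
    """Solve M z = b by Gauss-Jordan elimination (M square and nonsingular)."""
    n = len(M)
    aug = [[F(v) for v in row] + [F(bi)] for row, bi in zip(M, b)]
    for col in range(n):
        # Raises StopIteration if no nonzero pivot exists (singular M).
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * u for v, u in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

# Normal equations X'X beta = X'Y; solvable uniquely only when rank(X) = 4.
XtX = [[dot(c1, c2) for c2 in cols] for c1 in cols]
XtY = [dot(c, Y) for c in cols]
beta_hat = solve(XtX, XtY)

# Fitted values and residuals for the full model.
fitted = [sum(b * c[i] for b, c in zip(beta_hat, cols)) for i in range(6)]
resid = [y - f for y, f in zip(Y, fitted)]

# One possible A (an assumption): rows encode beta4 = 0 and beta2 - beta3 = 0.
A = [[0, 0, 0, 1], [0, 1, -1, 0]]
Z = [dot(row, beta_hat) for row in A]

# Sanity check: residuals are orthogonal to each column of X.
print(all(dot(c, resid) == 0 for c in cols))
```

Because all arithmetic is done with `Fraction`, orthogonality can be checked with exact equality rather than a floating-point tolerance.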
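5. Consider the linear model: $E(Y_{n \times 1}) = X_{n \times k}\, \beta_{k \times 1}$ and $\mathrm{Var}(Y) = \sigma^2 I_n$, with $\mathrm{rank}(X) = k$, for some unknown $\beta \in \mathbb{R}^k$ and $\sigma^2 > 0$. Let $\hat\beta$ denote the usual OLS estimator of $\beta$.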
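Let $A_{q \times k}$ be any matrix with $\mathrm{rank}(A) = q$. Consider the constrained least squares estimator:
\[ \hat\beta_0 := \arg\min_{b \in \mathbb{R}^k :\, Ab = 0} \|Y - Xb\|^2, \quad \text{and let } V_0 := \{X_{n \times k}\, b_{k \times 1} \mid A_{q \times k}\, b_{k \times 1} = 0_q\} \subseteq \mathbb{R}^n. \]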
(a) Show that $\hat\beta_0 = (X'X)^{-1} X' P_{V_0} Y$, and that it is the unique such minimizer as above.
(b) Show that $\hat\beta_0$ is unbiased for $\beta$ if and only if $A\beta = 0$.
(c) Show that regardless of whether $A\beta = 0$ or not, $\mathrm{Var}(c'\hat\beta_0) \le \mathrm{Var}(c'\hat\beta)$ $\forall\, c \in \mathbb{R}^k$, and further if $A\beta = 0$, show that: $E(\|\hat\beta_0 - \beta\|^2) < E(\|\hat\beta - \beta\|^2)$ [with strict inequality].