Tsinghua-Berkeley Shenzhen Institute
Learning from Data
Fall 2018
Problem Set 2
Issued: Monday 22nd October, 2018 Due: Monday 29th October, 2018
2.1. A data set consists of $m$ data pairs $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$, where $x \in \mathbb{R}^n$ is the independent variable and $y \in \mathbb{R}$ is the dependent variable. Denote the design matrix by $X \triangleq [x^{(1)}, \dots, x^{(m)}]^T$, and let $y \triangleq [y^{(1)}, \dots, y^{(m)}]^T$. The least-squares method then minimizes the square loss $J(\theta)$ defined as
$$J(\theta) = \frac{1}{2}\,\|y - X\theta\|_2^2,$$
where $\theta \in \mathbb{R}^n$ is the parameter to be estimated. To find the optimal $\theta$, set $\nabla J(\theta) = 0$, which gives the normal equation
$$X^T X \theta = X^T y. \qquad (1)$$
When $X^T X$ is invertible, we have $\theta = (X^T X)^{-1} X^T y$.
Now suppose $X^T X$ is singular. Does a solution of (1) still exist? Prove your result, and explain its meaning in plain words.
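A quick numerical sketch of the singular case (the data below are made up for illustration): when one column of $X$ duplicates another, $X^T X$ is singular, yet the normal equation remains consistent because $X^T y$ lies in the range of $X^T$, which equals the range of $X^T X$. The pseudoinverse picks out the minimum-norm solution.

```python
import numpy as np

# Hypothetical data: m = 4 samples, n = 3 features, but the third
# column duplicates the first, so X^T X has rank 2 and is singular.
X = np.array([[1.0, 2.0, 1.0],
              [2.0, 0.0, 2.0],
              [3.0, 1.0, 3.0],
              [4.0, 3.0, 4.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])

A = X.T @ X
assert np.linalg.matrix_rank(A) < A.shape[0]  # singular

# Minimum-norm solution of the normal equation via the pseudoinverse.
theta = np.linalg.pinv(A) @ (X.T @ y)

# theta still satisfies X^T X theta = X^T y, so a solution exists;
# it is just not unique (any null-space vector of X can be added).
print(np.allclose(A @ theta, X.T @ y))  # True
```

Any vector of the form `theta + v` with `X @ v == 0` solves (1) as well, which is the non-uniqueness the problem asks you to interpret.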
2.2. A data set consists of $m$ data pairs $(x^{(1)}, y^{(1)}), \dots, (x^{(m)}, y^{(m)})$, where $x \in \mathbb{R}^n$ is the independent variable and $y \in \{1, \dots, k\}$ is the dependent variable. The conditional probability $P_{y|x}(y \mid x)$ estimated by softmax regression is
$$P_{y|x}(y = l \mid x) = \frac{\exp(w_l^T x + b_l)}{\sum_{j=1}^{k} \exp(w_j^T x + b_j)}, \qquad l = 1, \dots, k,$$
with weights $(w_1, \dots, w_k)$ and biases $(b_1, \dots, b_k)$ as parameters, and the log-likelihood of the data set is
$$\ell \triangleq \sum_{i=1}^{m} \log P_{y|x}(y^{(i)} \mid x^{(i)}).$$
(a) Evaluate $\nabla_{b_l} \ell$.
The data set can be described by its empirical distribution $\hat P_{x,y}(x, y)$ defined as
$$\hat P_{x,y}(x, y) \triangleq \frac{1}{m} \sum_{i=1}^{m} \mathbb{1}\{x^{(i)} = x,\; y^{(i)} = y\}.$$
(b) Suppose we have set the biases $(b_1, \dots, b_k)$ to their optimal values. Prove that
$$\sum_{x \in \mathcal{X}} P_{y|x}(l \mid x)\, \hat P_x(x) = \hat P_y(l), \qquad l = 1, \dots, k,$$
where $\mathcal{X} = \{x^{(i)} : i = 1, \dots, m\}$ is the set of all samples of $x$, and $\hat P_x$, $\hat P_y$ denote the marginals of $\hat P_{x,y}$.
Hint: The optimality implies $\nabla_{b_1} \ell = \nabla_{b_2} \ell = \cdots = \nabla_{b_k} \ell = 0$.
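The gradient that part (a) asks for can be sanity-checked numerically. Assuming $\ell$ is the log-likelihood of the data set, the standard softmax result is $\nabla_{b_l} \ell = \sum_{i=1}^{m} \big(\mathbb{1}\{y^{(i)} = l\} - P_{y|x}(l \mid x^{(i)})\big)$; the sketch below compares that formula against central finite differences on made-up data (all sizes, weights, and labels here are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 50, 3, 4                       # hypothetical sizes
X = rng.normal(size=(m, n))
y = rng.integers(0, k, size=m)           # labels 0..k-1 stand in for 1..k
W = rng.normal(size=(k, n))
b0 = rng.normal(size=k)

def softmax_probs(b):
    # P[i, l] = P_{y|x}(l | x^(i)) under the softmax model.
    logits = X @ W.T + b                 # shape (m, k)
    logits = logits - logits.max(axis=1, keepdims=True)  # stability
    P = np.exp(logits)
    return P / P.sum(axis=1, keepdims=True)

def log_likelihood(b):
    P = softmax_probs(b)
    return np.log(P[np.arange(m), y]).sum()

# Analytic gradient: dℓ/db_l = Σ_i (1{y^(i)=l} - P(l | x^(i))).
P = softmax_probs(b0)
grad = (np.eye(k)[y] - P).sum(axis=0)

# Central finite differences as a check of the formula.
eps = 1e-6
num = np.array([(log_likelihood(b0 + eps * np.eye(k)[l])
                 - log_likelihood(b0 - eps * np.eye(k)[l])) / (2 * eps)
                for l in range(k)])
print(np.allclose(grad, num, atol=1e-4))  # True
```

Setting this gradient to zero for every $l$, as the hint suggests, is exactly what drives the identity in part (b).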
2.3. The multivariate normal distribution can be written as
$$P(y; \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(y - \mu)^T \Sigma^{-1} (y - \mu)\right),$$
where $\mu$ and $\Sigma$ are the parameters. Show that the family of multivariate normal distributions is an exponential family, and find the corresponding $\eta$, $b(y)$, $T(y)$, and $a(\eta)$.
Hints: The parameter $\eta$ and the statistic $T(y)$ are not limited to vectors; they can also be matrices. In that case, the Frobenius inner product can be used as the inner product between two matrices, $\langle A, B \rangle_F = \operatorname{tr}(A^T B)$. The properties of the matrix trace might be useful.
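A starting sketch, not the full answer: expanding the quadratic form in the exponent and applying the trace identity $y^T \Sigma^{-1} y = \operatorname{tr}(\Sigma^{-1} y y^T)$ from the hint gives

```latex
-\tfrac{1}{2}(y-\mu)^T \Sigma^{-1} (y-\mu)
  = \mu^T \Sigma^{-1} y
    - \tfrac{1}{2} \operatorname{tr}\!\left(\Sigma^{-1}\, y y^T\right)
    - \tfrac{1}{2} \mu^T \Sigma^{-1} \mu ,
```

so the exponent splits into an inner product between parameter-dependent quantities and the statistics $(y,\; y y^T)$, plus terms depending only on $(\mu, \Sigma)$. Grouping the latter with the normalizing constant is what produces $a(\eta)$, while matching the inner-product terms identifies $\eta$ and $T(y)$.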