首页 > > 详细

Homework # 7 Sauer

 Homework # 7

1. Sauer discusses condition numbers in Section 2.3.1 and 4.1.3.
2. Let f(x) be a function from R
n
to R. Suppose we would like to
maximize f(x). Show that if Hf(x) is negative definite then
the Newton’s method direction at x, ,[Hf(x)]䛈1∇f(x), is an
ascent direction. What does this imply for Newton’s method
with backtracking? (We did this in class, except that we con￾sidered the minimization case and Hf(x) as positive definite;
here I want you to go through the argument yourself for this
slightly altered case.)
3. Consider the log-likelihood for the single covariate (i.e. each
xi ∈ R) logistic regression:
log L(α) = X
n rewrite g(x) in the more compact
form: g(α) = log(1 + exp((α · x˜i)). (This compact form
makes it easier to take derivatives.
(b) Show the following
(You computed the gradient and Hessian of g(α) in previ￾ous hws, but here I want you to see the form above so the
next subproblem is easier.)
T x)). Use this fact to show that the function g(x)
is convex.
(d) Show the following facts. You can prove them from the
definition or just explain the intuition through a graph.
i. If two functions f(x) and h(x) are convex then so is
their sum f(x) + g(x).
ii. If a function f(x) is convex, then nf(x) is concave.
Then, show that log L(α) is a concave function.
(e) Generate a plot of log L(α) over some line in R
that con￾tains the maximum of log L(α) (you computed this point
in a previous hw.). Explain why the graph you produce is
concave.
4. The MNIST dataset is a popular dataset for practicing machine
learning algorithms. Read about the dataset here
https://en.wikipedia.org/wiki/MNIST_database
Attached you will find two files. mnist_train.csv, mnist_test.csv.
Each file contains a matrix. Each row of the matrix corresponds
to an image of a hand written digit. The first entry in the row
is the digit in the image (i.e. 7 if the digit image is a seven), the
rest of the values, of which there are 784 (from a 28 × 28 pixel
image) are the pixel values. See the script mnist_intro.R for
an example.
In this problem, you will build a classifier that identifies when
a hand written digit equals 3. To build the classifier you will
fit a logistic regression to the data. The response variable,
y ∈ {0, 1} will be 1 if the number is 3 and 0 otherwise. The
covariates, x ∈ R
784 are the pixel values. Set
α = (α0, α1, α2, . . . , α784) (5)
The logistic-regression model assumes
P(y = 1 | x, α) = 1
1 + exp[[α · x˜]
(6)
2
where ˜x is defined as in problem 2 (i.e. we just add a 1 to the
beginning of the x vector.)
(a) Show that the log-likelihood is given by
log L(α) = X
N
i=1
(11yi)((α· x˜i))log(1 + exp((α· x˜i)) (7)
(b) Set g(α) = log(1 + exp((α · x˜i)) and show that ∇g(α) and
Hg(α) have the same form given in problem 2
(c) Write a damped Newton’s method algorithm to compute
the optimal α by maximizing the log likelihood on the
training dataset. How do you know Newton’s method will
converge to a maximum? (NOTE: You may run into dif-
ficulties with non-invertible Hessians; we will address that
issue in coming classes. If that happens, try other starting
points.)
(d) A classifier is a function
C(x) : R
784 → {0, 1}. (8)
Once you compute an α in (c), you can build a classifier
as follows
C(x) =  1 if P(x | α) = 1
1+exp[[ α·x˜] ≥ p
0 otherwise.
(9)
where p is some cutoff probability. Above p, you call the
images as 3’s, below p you call images as not 3’s. Select
a p and test the accuracy of your classifier using the test
dataset.
 
联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!