首页 >
> 详细

Homework 3

CS534 Machine Learning, Fall 2021

This homework will explore validation and model selection procedures

using regularized logistic regression models and nested cross validation.

Problem 1 - Cross validation (60 points)

This elastic-net regularized logistic regression model is derived by minimiz-

ing the negative log likelihood function for samples (xi, gi), gi ∈ {1, 2}

max

β,β0

`(β, β0) = max

β,β0

1

N

N∑

i=1

{I(gi?1)log p(xi)+I(gi?2)log (1?p(xi))}?λPα(β),

where the regularization penalty Pα(β) = (1?α)12‖β‖22+α‖β‖1 is a function

of the free parameters λ, α, and where the indicator function I(0) = 1 and

zero elsewhere. The weight λ determines the magnitude of regularization,

and the mixing parameter α determines the proportion of the penalty allo-

cated to the 1 and 2 norms. The mixing parameter α is often not chosen

through cross validation but just set using intuition or experience, where

the weight λ is typically determined through cross validation.

In this problem you will use a built-in function for elastic-net logistic

regression. The mixing parameter will be fixed α = 0.95 and you will search

for the optimal regularization weight lambda in the range λ ∈ [0, 100]. These

are often spaced on a logarithmic scale at 100 or more intervals.

Perform a nested cross validation using five folds to determine the op-

timal regularization weight and report test error. In each step, 4/5 of the

data will be used for training and validation, and 1/5 will be used to report

test error.

1.a. Validation diagram (20 points) Draw a diagram of the cross valida-

tion where you indicate which samples are in training, testing, and validation

in each of the 5 nested folds. Depict both the inner and outer loops of the

nested CV. Indicate clearly how the data is segmented, and the fraction of

the data used in each segment.

1

1.b. Model selection (20 points) For each outer CV fold, generate a plot

of the classification error on the validation set as a function of the sequence

of λ values (five plots total). Use the inner CV folds to calculate standard

deviations of the error σE(λ) at each λ, as well as the mean error μE(λ).

Choose the optimal lambda λ? as the largest λ within 1-standard devi-

ation of the minimum average error

λmin = argmin

λ

μE(λ),

λ? = max{λ} subject to λ ≤ μE(λmin) + σE(λmin)

In each plot, indicate μE(λ) with a solid black line, and the intervals of

variance μE(λ) ± σE(λ) in red, μE(λmin) as a blue point and μE(λ?) as a

green point.

1.c. Test error (20 points) Generate a box plot of the five test errors

generated by cross-validation. Display the validation errors μE(λ?) in two

additional boxplots in the same graph (there are 20 training errors and 20

validation errors). Compare and discuss the errors.

2

Problem 2 - Elastic net logistic regression (40 points)

Implement the elastic net regression algorithm using the soft-thresholding

and iterative reweighted least-squares approach described in Friedman 2010.

This problem can be solved by optimizing the penalized negative log likeli-

hood

min`(β, β0) =

1

N

N∑

i=1

{I(gi? 1)log p(xi) + I(gi? 2)log (1? p(xi))}?λPα(β),

where the indicator function I(0) = 1 and zero elsewhere, and p(xi) repre-

sents the class probability

Pr(G = 1|X = xi) = p(xi) = 1

1 + e?(β0+xTi β)

.

A quadratic approximation of this likelihood `Q can be used to transform

this problem into a weighted least-squares problem with weights wi and

response zi (see Equation 10 in Friedman paper)

`Q(β, β0) = ? 1

2N

N∑

i=1

wi(zi ? β0 ? xTi β)2.

where

zi = β?0 + x

T

i β? +

yi ? p?(xi)

wi

,

wi = p?(xi)(1? p?(xi)).

2.a. Solution path

Implement this algorithm using the mixing parameter α = 0.95 and plot the

solution path β(λ) for a sequence of 1000 logarithmically spaced λ in the

range λ ∈ [1, 100]. Label each feature j using text on the plot at the point

where βj enters the model (when this model coefficient becomes non-zero).

3

Notes

Understanding the approach at a high-level is important. You are

going to solve the quadratic approximation repeatedly using equation

10 from the paper, and at each iteration you will recalculate zi, wi to

update the quadratic approximation `Q.

You will generate a solution for each value of λ. A fixed number of

iterations is the easiest approach for deciding when each problem is

solved (50 worked for me).

To accelerate convergence you can use a warm-starting technique.

Start with the largest λ which will have the fewest non-zero coeffi-

cients. Use the solution of this regularization level as the start for the

next smallest lambda, etc.

This problem has some numerical issues that need to be addressed.

The weights, wi can shrink to zero and cause NaN values where used

in division. The parameters β will continue to grow in an attempt to

push the probabilities pi to zero or one. For this reason, you can clamp

these to 0, 1 once they are within a reasonable distance (say 1e ? 5).

The solutions may also become unstable when the regularization gets

too small (this will be apparent in the solution path plot).

联系我们

- QQ：99515681
- 邮箱：99515681@qq.com
- 工作时间：8:00-21:00
- 微信：codinghelp

- 代写 Lab 2: Threads 2022-05-10
- 辅导assessment 1. Present Your Client ... 2022-05-10
- 5Cce2sas辅导、Python，Java程序辅导 2022-05-10
- 代写brae Webb编程 2022-05-09
- 辅导csci 3110 Assignment 1 2022-05-09
- Mth2222 Assignment 2代写 2022-05-09
- Cse3bdc Assignment 2022辅导 2022-05-08
- 辅导cis 468、辅导java，Python编程 2022-05-08
- Comp Sci 4094/4194/7094 Assignment 3 D... 2022-05-07
- Cs 178: Machine Learning & Data Mining... 2022-05-07
- Data7703 Assignment 4 2022-05-07
- 讲解assignment 2: Databases 2022-04-25
- 辅导ait681 Static Analysis 2022-04-25
- Cse121 & Cse121l 编程辅导、辅导c++程序语言 2022-04-25
- 辅导iti1120 Bject-Oriented Programming 2022-04-25
- Cmt304语言辅导、辅导c++，Python编程 2022-04-25
- 辅导comp/Engn4528 Computer Vision 2022-04-24
- 辅导fin 2200 Bloomberg Investment Proj... 2022-04-24
- 辅导bism 7255 Uml Assignment 2022-04-23
- 讲解comp202 Programming Assignment 2022-04-23