讲解 MATH3514/MATH6119: MIDSEMESTER EXAM SEPTEMBER 2024辅导 Python语言

MATH3514/MATH6119: MIDSEMESTER EXAM

SEPTEMBER 2024

This is the take-home midsemester exam paper for MATH3514/6119. You may use assignment so- lutions, worksheet solutions, lecture notes, and any textbooks you can borrow from the library. You should complete it independently, no discussion with others is allowed. The use of internet search and ChatGPT is strictly prohibited. Any violation can result in a zero grade!

Problem 1. (10 points) Consider the set

Ω := {(x,y, z) ∈ R3 : x2 + y2 + z2 = 1, y ≥ 0, z ≥ 0}.

Determine Tx* Ω and LFD(x* ) for x* = (1, 0, 0) with detailed arguments.

Problem 2. (10 points) Consider the problem

minimize (x + 1)2 + (y + 1)2

subject to {

(i) (4 points) Prove that the KKT conditions must be satisfied at each local minimizer.

(ii) (6 points) Find all the points satisfying the KKT conditions and determine the global minimizer.

Problem 3. (16 point) Consider the quadratic programming

minimize xT Qx + cTx

subject to ai(T)x = bi , i ∈ E ,

ai(T)x ≥ bi , i ∈ I

where Q is an n × n symmetric matrix, E and I are finite sets of indices, E = {1, ··· , me } and I = {me + 1, ··· , m}, bi , i ∈ E ∪ I, are given numbers, and c ∈ Rn. Let x* be a KKT point and λ* a corresponding Lagrange multiplier vector. Define

(i) (9 points) Let {xk } be a sequence of feasible points such that xk → x* as k → ∞ and f(xk ) ≤ f(x* ) for all k. Set dk := (xk −x* )/∥xk −x* ∥ for each k and let dbe any cluster point of {dk }. Show that d ∈ G(x* ,λ* ). (Hint: use Lemma 56 and the KKT conditions.)

(ii) (7 points) If dT Qd > 0 for all directions d ∈ G(x* ,λ* ) \ {0}, show that x* is a strict local minimizer of (1). (Hint: use a contradiction argument.)

Problem 4. (64 points) Consider the quadratic minimization problem

where A is a d × d symmetric positive definite matrix and b ∈ @d. It is known that (2) has a unique minimizer given by x* := A −1b. Let

0 < m = λd ≤ λd−1 ≤ · · · ≤ λ2 ≤ λ 1 = L

be the eigenvalues of A counting by the multiplicities. The ratio κ := λ 1 /λd = L/m is called the condition number of A. In this question you are required to perform. the analysis for the gradient method and its various variants when they are applied to solve (2).

(i) (12 points) Consider the gradient method

xn+1 = xn — α▽f(xn )

(3) with an initial guess x0 and carry out its convergence analysis, when applied to solve (2), by completing the following items:

(a) (5 points) Show that

xn+1 — x* = (I — αA)(xn — x* )

and conclude that

Ⅱxn+1 — x* Ⅱ ≤ max{|1 — αm| , |1 — αL|}Ⅱxn — x* Ⅱ

(b) (5 points) Show that the function α → max{|1—αm| , |1—αL|} on (—∞ , ∞ ) achieves its minimum at

Ⅱxn+1 — x Ⅱ ≤ κ + 1 Ⅱxn — x Ⅱ

and hence

for all n = 0, 1, ··· .

(ii) (22 points) Next consider the following variant of the gradient method

xn+1 = xn — α▽f(xn ) + β(xn — xn−1) (4)

with initial guess x0 = x −1 . Complete the following items to derive the convergence speed of the method (4) when it is applied to solve (2) under the choices

(a) (3 points) Show that there holds

(xxn(n+)1——x* ) = T (xn(x)1——xx(*)* ) (6)

for all n = 0, 1, ··· , where T is a 2d × 2d matrix given by

T := ((1 + β — αA —0(βI)) ,

where I denotes the d × d identity matrix.

(b) (3 points) Use (6) to conclude

, (xn(x)1——xx(*)* ), ≤ ⅡTn Ⅱ , (x0(x0) —— x(x)*(*)) ,,,, (7)

for all n = 0, 1, ··· . Use the Gelfand’s formula

ρ(T) = lim ⅡTn Ⅱ1/n , n→∞

which can be taken for granted, to further obtain

" (xn(x)1——xx(*)* ) " ≤ (ρ(T) + εn )n " (x0(x0) —— x(x)*(*)) " (8)

for all n = 0, 1, ··· , where ρ(T) denotes the spectral radius of T defined by ρ(T) := max{|λ| : λ is an eigenvalue of T}

and {εn } is a sequence of positive numbers satisfying limn→∞ εn = 0.

(c) (7 points) Therefore, to determine the convergence speed of the method (4), it suffices to determine ρ(T). Show that T is similar to the block diagonal matrix

\ . Td,

where each Ti is a 2 × 2 matrix given by

Ti := (1 + β1— αλi β ) .

(d) (2 points) Find the eigenvalues of each Ti as a function of λi , α and β .

(e) (4 points) If β ≥ max{(1 — √αm)2 , (1 — √αL)2 }, show that each eigenvalue λ of Ti is a complex number and satisfies |λ| = √β ; consequently ρ(T) = √β .

(f) (3 points) For the choices of α and β given in (5), show that the condition in (e) holds and conclude that

ρ(T) ≤ √β = .

(iii) (14 points) Consider one more variant of the gradient method

zn = xn + β(xn — xn−1), xn+1 = zn — α▽f(zn ) (9)

with initial guess x0 = x −1 . Carryout the convergence analysis of the method (9) when applied to solve (2) with

α := , β := (10)

by completing the following items:

(a) (3 points) From the formulation of the method (9) to derive that

xn+1 — x* = xn — x* — αA(xn — x* + β(xn — xn−1)) + β(xn — xn−1).

Based on this equation, derive further that

(xxn(n+)1——x* ) = S (xn(x)1——xx(*)* ) (11)

for all n = 0, 1, ··· , where S is a 2d × 2d matrix given by

S := ((1 + β) — αA) —β(I0— αA)) ,

where I denotes the d × d identity matrix.

(b) (2 points) Use (11) and the Gelfand’s formula to show that

" (xn(x)1——xx(*)* ) " ≤ (ρ(S) + εn )n " (x0(x0) —— x(x)*(*)) " (12)

for all n = 0, 1, ···, where ρ(S) denotes the spectral radius of S and {εn } is a sequence of positive numbers satisfying limn→∞ εn = 0.

I I

I . . I ,

\ . Sd,

where each Si is a 2 × 2 matrix given by

(d) (4 points) With the choices of α and β given in (10), show that every eigenvalue λ of Si satisfies

Consequently

(iv) the problems using the following Matlab code fragment (or its equivalent in another language) to generate a Hessian A with eigenvalues in the range [m, L]:

mu=0.01; L=1; kappa=L/mu; n=100;

A = randn (n , n ) ; [Q, R]=qr (A) ;

D=rand (n , 1 ) ; D=10.ˆD; Dmin=min(D) ; Dmax=max(D) ; D=(D−Dmin)/(Dmax−Dmin) ;

D = mu + D*(L−mu) ; A = Q’* diag (D)*Q; epsilon =1. e−6;

kmax=1000;

x0 = randn (n , 1 ) ; % use a different x0 for each of the 10 trials

Write a code (Matlab or Python) to implement the following methods:

• The gradient method (3) with α = 2/(m + L).

• The gradient method (3) with α = 1/L.

• The method (4) with α and β given by (5)

• The method (9) with α and β given by (10).

Run the code in each case until f(xn ) ≤ ϵ for tolerance ϵ = 10 −6 . Do the following: (a) Tabulate the average number of iterations required, over 10 random starts.

(b) Draw a plot of the convergence behavior. on atypical run, plotting iteration number against log10 (f(xn ) − f(x* )). (Use the same figure, with different colors for each algorithm.)

(c) Discuss your results, noting in particular whether the worst-case convergence analysis is reflected in the practical results.

联系我们