CS 170, Spring 2020 HW 10 A. Chiesa  J. Nelson 
CS 170 HW 10 
Due 2020-04-13, at 10:00 pm
1 Study Group 
List the names and SIDs of the members in your study group. If you have no collaborators, 
you must explicitly write none. 
2 Opting for releasing your solutions 
We are considering releasing a subset of homework submissions written by students for students to see what a full-score submission looks like. If your homework solutions are well written, we may consider releasing your solution. If you wish that your solutions not be released, please respond to this question with "No, do not release any submission to any problems". Otherwise, say "Yes, you may release any of my submissions to any problems".
3 Zero-Sum Games 
Alice and Bob are playing a zero-sum game whose payoff matrix is shown below. The ijth 
entry of the matrix shows the payoff that Alice receives if she plays strategy i and Bob plays 
strategy j. Alice is the row player and is trying to maximize her payoff, and Bob is the 
column player trying to minimize Alice’s payoff. 
Alice \ Bob     1     2
      1         4     1
      2         2     5
Now we will write a linear program to find a strategy that maximizes Alice’s payoff. Let the 
variables of the linear program be x1, x2 and p, where xi is the probability that Alice plays 
strategy i and p denotes Alice’s payoff. 
(a) Write the linear program for maximizing Alice's payoff. (Hint: You should think of setting up the constraints of the program so that it finds the best worst-case strategy. This depends on the strategy Bob plays, assuming he knows Alice's (probabilistic) strategy.)
(b) Eliminate x2 from the linear program and write it in terms of p and x1 alone. 
(c) Draw the feasible region of the above linear program in p and x1. You are encouraged to use plotting software for this (a minimal plotting sketch appears after part (e)).
(d) Write a linear program from Bob's perspective, trying to minimize Alice's payoff. Let the variables of the linear program be y1, y2 and p, where yi is the probability that Bob plays strategy i and p denotes Alice's payoff.
(e) What is the optimal solution and what is the value of the game? 
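For part (c), a minimal plotting sketch in Python (matplotlib) is given below. The two constraint lines it draws are placeholders, not the constraints you should derive in part (b); they only illustrate how to shade a feasible region in the (x1, p) plane.

# Sketch: shading a two-variable feasible region with matplotlib.
# The constraints below are PLACEHOLDERS (p <= 2*x1 + 1 and p <= 3 - x1),
# not the ones from part (b).
import numpy as np
import matplotlib.pyplot as plt

x1 = np.linspace(0, 1, 400)          # Alice's probability of playing strategy 1
upper1 = 2 * x1 + 1                  # placeholder constraint: p <= 2*x1 + 1
upper2 = 3 - x1                      # placeholder constraint: p <= 3 - x1
upper = np.minimum(upper1, upper2)   # p must lie below every upper bound

plt.plot(x1, upper1, label="p = 2*x1 + 1 (placeholder)")
plt.plot(x1, upper2, label="p = 3 - x1 (placeholder)")
plt.fill_between(x1, 0, upper, alpha=0.3, label="feasible region")
plt.xlabel("x1")
plt.ylabel("p")
plt.legend()
plt.show()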
4 Zero-Sum Battle 
Two Pokemon trainers are about to engage in battle! Each trainer has 3 Pokemon, each of 
a single, unique type. They each must choose which Pokemon to send out first. Of course, each trainer's advantage in battle depends not only on their own Pokemon, but also on which Pokemon their opponent sends out.
The table below indicates the competitive advantage (payoff) Trainer A would gain (and 
Trainer B would lose). For example, if Trainer B chooses the fire Pokemon and Trainer A 
chooses the rock Pokemon, Trainer A would have payoff 2. 
                      Trainer B:
                   ice   water   fire
            dragon -10     3      3
Trainer A:  steel    4    -1     -3
            rock     6    -9      2
Feel free to use an online LP solver to solve your LPs in this problem. Here is an example of a Python LP Solver and its Tutorial (a minimal scipy-based sketch also appears after part 2 below).
1. Write an LP to find the optimal strategy for Trainer A. What is the optimal strategy 
and expected payoff? 
2. Now do the same for Trainer B. What is the optimal strategy and expected payoff? 
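For reference, here is one possible way (a sketch, assuming scipy and numpy are installed) to encode Trainer A's max-min LP with scipy.optimize.linprog. The payoff matrix is the table above; the variables are Trainer A's mixed strategy together with the guaranteed payoff v, and since linprog minimizes, we minimize -v.

# Sketch: the row player's (Trainer A's) max-min LP via scipy.
# Variables: z = [x_dragon, x_steel, x_rock, v], where v is A's guaranteed payoff.
# Maximize v  <=>  minimize -v, subject to
#   v - sum_i A[i][j] * x_i <= 0 for every column j (Trainer B's choice),
#   sum_i x_i = 1,  x_i >= 0.
import numpy as np
from scipy.optimize import linprog

A = np.array([[-10,  3,  3],   # dragon vs (ice, water, fire)
              [  4, -1, -3],   # steel
              [  6, -9,  2]])  # rock
n_rows, n_cols = A.shape

c = np.concatenate([np.zeros(n_rows), [-1.0]])                  # minimize -v
A_ub = np.hstack([-A.T, np.ones((n_cols, 1))])                  # v - (A^T x)_j <= 0
b_ub = np.zeros(n_cols)
A_eq = np.concatenate([np.ones(n_rows), [0.0]]).reshape(1, -1)  # probabilities sum to 1
b_eq = np.array([1.0])
bounds = [(0, 1)] * n_rows + [(None, None)]                     # x_i in [0, 1], v free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("Trainer A's mixed strategy:", res.x[:n_rows])
print("value of the game (expected payoff):", res.x[-1])

Trainer B's LP in part 2 can be set up symmetrically, with Trainer B minimizing the value subject to one constraint per row of the table.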
5 How to Gamble With Little Regret 
Suppose that you are gambling at a casino. Every day you play at a slot machine, and your 
goal is to minimize your losses. We model this as the experts problem. Every day you must 
take the advice of one of n experts (i.e. a slot machine). At the end of each day t, if you take 
advice from expert i, the advice costs you some c_i^t ∈ [0, 1]. You want to minimize the regret R, defined as:

R = \frac{1}{T}\left( \sum_{t=1}^{T} c_{i(t)}^{t} \;-\; \min_{1 \le i \le n} \sum_{t=1}^{T} c_{i}^{t} \right)

where i(t) is the expert you choose on day t. Your strategy will be probabilities, where p_i^t denotes the probability with which you choose expert i on day t. Assume an all-powerful adversary (i.e. the casino) can look at your strategy ahead of time and decide the costs associated with each expert on each day. Let C denote the set of costs for all experts and all days. Compute max_C E[R], or the maximum possible (expected) regret that the adversary can guarantee, for each of the following strategies.
(a) Choose expert 1 at every step, i.e. p_1^t = 1 for all t.

(b) Any deterministic strategy, i.e. for each t, there exists some i such that p_i^t = 1.

(c) Always choose an expert according to some fixed probability distribution at every time step. That is, fix some p_1, . . . , p_n, and for all t, set p_i^t = p_i.

What distribution minimizes the regret of this strategy? In other words, what is argmin_{p_1, ..., p_n} max_C E[R]?
This analysis should conclude that a good strategy for the problem must necessarily be 
randomized and adaptive. 
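As a concrete illustration of the regret definition above, the short sketch below computes R for a toy cost table and a fixed sequence of choices; the numbers are made up purely to exercise the formula.

# Sketch: computing R = (1/T) * ( sum_t c^t_{i(t)} - min_i sum_t c^t_i )
# for a made-up cost table.
import numpy as np

costs = np.array([[0.9, 0.1],    # day 1: cost of expert 1, expert 2
                  [0.8, 0.2],    # day 2
                  [0.7, 0.3]])   # day 3
choices = [0, 0, 1]              # i(t): expert chosen on each day (0-indexed)

T = costs.shape[0]
incurred = sum(costs[t, choices[t]] for t in range(T))  # your total cost
best_fixed = costs.sum(axis=0).min()                    # best single expert in hindsight
R = (incurred - best_fixed) / T
print("regret:", R)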
6 Solving Linear Programs using Multiplicative Weights 
In this problem, we will develop an algorithm to approximately solve linear programs using 
multiplicative weights. For simplicity, we will restrict our attention to a specific kind of linear program, given as follows:
Given vectors a_j in R^n for j = 1, . . . , m, the linear program on variables x = (x_1, . . . , x_n) is given by:

\max \quad 0
\text{subject to} \quad \langle a_j, x \rangle \le 0 \quad \text{for all } j = 1, \dots, m
\sum_{i=1}^{n} x_i = 1
x_i \ge 0 \quad \text{for all } i = 1, \dots, n

(The objective is the constant 0, so this LP is purely a feasibility problem.)
Here ⟨x, y⟩ denotes the dot product between vectors; specifically, ⟨x, y⟩ = Σ_{i=1}^{n} x_i · y_i, where x_i, y_i refer to the i-th components of the vectors x, y.
We will now use the multiplicative weights update to solve our linear program. The idea is to execute multiplicative weights update against an appropriate adversary (a sequence of losses) and let the weights of the algorithm determine the solution x. Formally, the rough outline of our algorithm to solve the above LP is as follows:
For t = 1 to a sufficiently large number T:
  1. The multiplicative weights update returns a current probability distribution x^(t).
  2. If x^(t) approximately satisfies the LP, return x^(t).
  3. The adversary A presents a loss vector ℓ^(t) ∈ R^n.
  4. The multiplicative weights algorithm incurs a loss of ⟨ℓ^(t), x^(t)⟩ and updates its weights.
Return x^(T)
We will design an algorithm that will act as the adversary A, so that x^(T) is an approximate solution to the LP.
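To make the outline concrete, here is a minimal sketch of a multiplicative weights (exponential/Hedge-style) update, under the assumption of losses in [-1, 1]. The adversary used here is a hypothetical placeholder returning random losses; designing the real, LP-aware adversary is exactly what part (c) below asks for.

# Sketch of the multiplicative weights update loop from the outline above.
# The adversary is a HYPOTHETICAL placeholder; part (c) asks you to design the real one.
import numpy as np

def placeholder_adversary(x, rng):
    # Hypothetical adversary: returns an arbitrary loss vector in [-1, 1]^n.
    return rng.uniform(-1.0, 1.0, size=x.shape)

def multiplicative_weights(n, T, eps, adversary, seed=0):
    rng = np.random.default_rng(seed)
    w = np.ones(n)                      # one weight per coordinate x_i
    for t in range(T):
        x = w / w.sum()                 # step 1: current distribution x^(t)
        loss = adversary(x, rng)        # step 3: adversary presents loss vector l^(t)
        w = w * np.exp(-eps * loss)     # step 4: incur <l^(t), x^(t)> and update weights
    return w / w.sum()                  # x^(T)

x_final = multiplicative_weights(n=5, T=1000, eps=0.1, adversary=placeholder_adversary)
print(x_final)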
Recall that the multiplicative weights algorithm obeys the following regret bound: 
Theorem. The multiplicative weights algorithm starts with an iterate x^(1) ∈ R^n, and then suffers a sequence of loss vectors ℓ^(1), . . . , ℓ^(T) ∈ R^n such that ℓ_i^(t) ∈ [−1, 1]. It then produces a sequence of iterates x^(2), . . . , x^(T) such that for some 0 < ε ≤ 1/2,

\sum_{t=1}^{T} \langle \ell^{(t)}, x^{(t)} \rangle \;\le\; \min_{1 \le i \le n} \left( \sum_{t=1}^{T} \ell_i^{(t)} \right) + 2\left( \varepsilon T + \frac{\log(n)}{\varepsilon} \right)
(a) Let p = (p_1, . . . , p_n) be a probability distribution over {1, . . . , n}, i.e., p_i ≥ 0 and Σ_{i=1}^{n} p_i = 1. For any vector v ∈ R^n, prove that

\min_{1 \le i \le n} v_i \;\le\; \sum_{i=1}^{n} p_i v_i

(In words, the minimum is smaller than the average under every distribution.)
(b) We will now observe an important property of multiplicative weights. The algorithm not 
only has small regret against any fixed best expert, but also has small regret against any 
fixed probability distribution over experts. 
Let p* = (p*_1, . . . , p*_n) be a probability distribution on {1, . . . , n}, i.e. p*_i ≥ 0 for each 1 ≤ i ≤ n and Σ_{i=1}^{n} p*_i = 1. Using the regret bound of the multiplicative weights algorithm stated above, prove that

\sum_{t=1}^{T} \left( \langle \ell^{(t)}, x^{(t)} \rangle - \langle \ell^{(t)}, p^* \rangle \right) \;\le\; 2\left( \frac{\log(n)}{\varepsilon} + \varepsilon T \right)
(c) At the t-th iteration, the probability distribution x^(t) output by the multiplicative weights update is a candidate solution to our LP. But it is likely to violate some of the constraints of the LP, namely ⟨a_j, x^(t)⟩ ≤ 0.
Describe how to implement the adversary A to nudge the multiplicative weights algorithm towards a feasible solution to the LP.
Hint: What loss vector should the adversary A pick in order to penalize the multiplicative weights algorithm the most for violations?
(d) Let us call a solution x an η-approximate solution to the LP if it satisfies all the constraints within an error of η, where η ≤ 2, i.e.,

⟨a_j, x⟩ ≤ η for all j = 1, . . . , m.
Assume that the LP is feasible, in that there exists some probability distribution x* that satisfies all the constraints. For simplicity, let us assume that all the coefficients in our LP satisfy |(a_j)_i| ≤ 1. Show that if multiplicative weights uses ε = η/4, then after T = 16 log(n)/η² iterations, the distribution x^(T) is an η-approximate solution.
Hint: In every iteration t, if the current distribution x^(t) violates a constraint by more than η, then it accumulates regret for not being the feasible solution x*.
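As a quick sanity check on these parameters (not the full argument part (d) asks for), note that with ε = η/4 and T = 16 log(n)/η², the additive term from the regret bound, averaged over T iterations, is exactly η:

\frac{2}{T}\left( \varepsilon T + \frac{\log(n)}{\varepsilon} \right)
  = 2\varepsilon + \frac{2\log(n)}{\varepsilon T}
  = \frac{\eta}{2} + \frac{2\log(n)}{(\eta/4)\cdot 16\log(n)/\eta^{2}}
  = \frac{\eta}{2} + \frac{\eta}{2}
  = \eta.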
7 Restoring the Balance! 
We are given a network G = (V, E) whose edges have integer capacities c_e, and a maximum flow f from node s to node t. Explicitly, f is given to us as an integer flow value f_e along every edge e.
However, we find out that one of the capacity values of G was wrong: for edge (u, v), we used c_uv whereas it really should have been c_uv − 1. This is unfortunate because the flow f uses that particular edge at full capacity: f_uv = c_uv. We could run Ford-Fulkerson from scratch, but there's a faster way.
Describe an algorithm to fix the max-flow for this network in O(|V |+ |E|) time. Also give a 
proof of correctness and runtime justification.