CS4260/CS5260 Homework 3
1. (2pts) In this problem we discuss the famous prisoner dilemma. The payoff matrix is as
following:
Player A / Player B Silent Betray
Silent -1, -1 -3, 0
Betray 0, -3 -2, -2
Here both players are prisoners who committed a crime as partners, and they now have
two choices: To keep their mouths shut or to betray their partner. The utility function
represents the number of years that the prisoners will serve – that is, if both prisoners
decide to stay silent, they will both serve 1 year. However, if prisoner A decides to
betray while prisoner B stays silent, prisoner A gets to walk free and prisoner B will
serve 3 years. If both prisoners decide to betray, they will both serve two years.
a. Is this a zero-sum game? Why?
b. Is there a Nash equilibrium in this game? Why?
2. (2pts) Suppose we want to find the minimum of function () = 2 ? 2 + 3 with
iterative optimization algorithms, and current estimation is 0 = 10
1) If using gradient descent with learning rate = 0.1, compute the next three
estimations 1, 2, 3.
2) If using Newton’s method, compute the next three estimations 1, 2, 3.
3) Which method get closer to the actual minimum after 3 steps?
3. (2pts) In Figure 1, there are two players, player 1 is at state A, which action should
player 1 pick using Minimax algorithm? Why?
4. (2pts) In Figure 1, (1) which branches can be pruned if using Alpha–Beta Pruning? Please
show the calculation process.
(2) If changing the search order to C, B, D, which branches will be pruned?
(3) Are the branches pruned the same in (1) and (2)? Why?
(4) Is the action selected by (1) and (2) the same? Why?
Figure 1. Game Tree. The values at the bottom represent Player 1’s utilities.
5. (2pts) Give two examples of applications or algorithms using Monte Carlo Method and
explain the reason and benefit of using it.