ARIN5102 2025 Fall Semester Assignment #2
You’re encouraged to use your favorite AI to work on the problems.
Date assigned: Monday, Oct 20
Due time: 23:59 on Sunday, Nov 2
How to submit it: Submit your written answers as a pdf file on canvas.ust.hk.
Penalties on late papers: 20% of each day (anytime after the due time is considered late by one day)
Problem 1 (10%). Given a graph and a node in it as the starting point, the traveling salesman problem (TSP) is about finding a least cost path that starts and ends at the starting node, and goes through each other node in the graph once and exactly once. The edges of the graph are labeled by a number representing the cost of traveling between the connecting two nodes. Formulate this problem as a search problem by specifying the states, possible initial states, goal test, operators, and operator costs.
Problem 2. (10%) (Adapted from Russell and Norvig) Consider the problem of con – structing crossword puzzles: fitting words into a grid of intersecting rows and columns of squares. Assume that a list of words (i.e. dictionary) is provided, and that the task is to fill in the rows and columns with words from this list so that if a row intersects with a column, their intersecting square has the same letter. Formulate this problem as an assign- ment (constraint satisfaction) problem by specifying the variables, their domains of possible values, and the constraints.
Problem 3. (10%) Consider the following two investment decisions: buy a certified deposit (CD) which will pay you a 10% annual interest rate, and buy some stocks which will either give you a 30% annual return (with 0.7 probability) or incur a 10% loss for you (with 0.3 probability). In terms of expected utility, you should of course buy stocks. However, suppose you are conservative and cannot tolerate any loss of your principal. In this case, you have no choice but to deposit your money into a CD. Now consider this problem for a 3 years time span. You start with 1 unit of money. Each year you can choose only one way to invest your entire fund (meaning you are not allowed to diversify, like 50% CD, 50% stocks). You definitely don’t want to lose any money at the end of the 3 years, but you are okay with a “temporary” loss during the time (for example, it is okay for you to lose some money in the first year as long as you can make it up later). Formalize this problem as a Markov decision process and compute its optimal policy.
Hints A state needs to contain information about how many money you have in this state. In other words, a state needs to encode the history: which actions you have done so far and what their outcomes are if the actions are non-deterministic.
Problem 4. (15%) Recall our boundary following robot. The correct behavior. is that the robot should follow a boundary in a consistent way, clockwise if it’s inside the wall of the room, and counter-clockwise if it is outside an obstacle in the room.
4.1 Formulate this task as a MDP.
4.2 Formulate this task as a reinforcement learning problem.
Problem 5. (15%) Consider the following search problem:
• state space: {S, A, B, G};
• operators: O1 maps S to A with cost 4, O2 maps S to B with cost 2, O3 maps B to A with cost 1, and O4 maps from A to G with cost 5;
• initial state S;
• goal state G;
• heuristic function h: h(S) = 7, h(A) = 1, h(B) = 6, and h(G) = 0.
Find a solution using A* search by tree. Number the nodes according to their order of expansion, and for each node give its f(n) = g(n) + h(n) value.
Problem 6. (10%) Perform. minimax with perfect decision on the tree given in the following figure (the leaves are terminal nodes, and the numbers next to them are their utility values).
Figure 1: A minimax search tree
Problem 7. (10%) Perform. a left-to-right alpha-beta prune on the tree in the above exercise. Perform. a right-to-left prune on the same tree. Left-to-right or right-to-left means the order by which the leaf nodes are generated. So for left-to-right, the first leaf node considered is D, followed by E, followed by M and so on.