COSC 320 – 001
Analysis of Algorithms
2020 Winter Term 2
Project Topic Number: 5
Analysis of the algorithm: An Informed Search Strategy
Before we present the pseudo-code, we formulate the problem in almost the same way as we
did in the first milestone, specifically:
States: Each state represents a city on the map.
Initial state: Whichever city the user chooses as the starting point.
Actions: The set of flights originating from the current city. For each flight, we have
the flight fee, flight time, and waiting time.
Transition Model: (Referring to the definition) Taking flight P from city A results in
arriving at city B.
Goal Test: Whether we are at the final location (the destination).
Path Cost: The result is a set of actions, namely p = {P1, P2, …}, that transforms the
initial state into the goal state. Hence, the Path Cost combines the total financial
cost, the waiting time between flights, and the flight time. Unlike last time, the Path
Cost g(n) now works together with a consistent heuristic function, namely h(n), to
decide which node to expand.
We call the function f(n) = g(n) + h(n) the estimated cost of the cheapest path from the
initial state to the goal state passing through node n.
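As a minimal sketch of how this formulation might be represented in code (the Flight record, its field names, and the goal_test helper below are hypothetical choices of ours, not part of the dataset specification):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Flight:
        """One action: a flight leaving the current city (state)."""
        origin: str           # the state we take the action from
        destination: str      # the state the transition model moves us to
        fee: float            # financial cost in CAD
        flight_time: float    # hours in the air
        waiting_time: float   # hours between connections

    def goal_test(city: str, destination: str) -> bool:
        """True when the current state is the final location."""
        return city == destination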
A Consistent Heuristic:
We use the same artificial dataset of random statistics as in the previous milestone, but
we add another matrix, namely D, to provide the extra information needed to derive h(n).
In particular, the matrix defines the real distance between any two cities:
∀A, B ∈ S: d_{A,B} defines the real distance between city A and city B, where S is the set of all states.
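A brief sketch of how the matrix D could back h(n), assuming D is stored as a nested dictionary keyed by city codes (the codes and distances below are only illustrative):

    # Hypothetical distance matrix D: D[a][b] is the real distance (km) between cities a and b.
    D = {
        "YVR": {"YVR": 0.0, "YYZ": 3360.0, "YUL": 3680.0},
        "YYZ": {"YVR": 3360.0, "YYZ": 0.0, "YUL": 500.0},
        "YUL": {"YVR": 3680.0, "YYZ": 500.0, "YUL": 0.0},
    }

    def h(city: str, goal: str) -> float:
        """Consistent heuristic: the real distance from the given city to the goal city."""
        return D[city][goal]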
Pseudo-code
This time, we approach the problem using A* Search. As in Uniform Cost Search, we use a
priority queue to help us decide which node to expand next, but now we order the queue by
the estimate f(n) = g(n) + h(n) instead of the path cost g(n) alone.
define node with the state as the problem's initial state and path cost zero //initialization
define PriorityQueue frontier to store the nodes, ordered by estimated cost f(n)
define Set explored //to be filled
while(true)
    if frontier is empty then return failure //no solution
    pop frontier and set node to it //node now has the lowest estimated cost in frontier
    if node is at the goal state then return solution //this is an optimal solution
    add the state of node to the explored set
    for each available action at node
        define child as the successor produced by this action
        if child's state is neither in frontier nor in explored then add child to the frontier
        else if child's state is in frontier with a higher estimated cost //notice that we
            use the "estimated cost" f(n) instead of the "path cost" g(n) here
            then replace that entry in the frontier with child //to keep the solution optimal
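As a minimal, runnable sketch of this pseudo-code in Python (the names graph and heuristic and the tuple layout in the heap are our own choices, not part of the milestone; the graph is assumed to be an adjacency dictionary of step costs):

    import heapq

    def a_star(graph, heuristic, start, goal):
        """A* graph search.
        graph: dict mapping a state to a list of (successor, step_cost) pairs.
        heuristic: function h(state) estimating the remaining cost to goal.
        Returns (path, cost) if a solution exists, otherwise None.
        """
        # Frontier entries are (f, g, state, path); heapq pops the smallest f first.
        frontier = [(heuristic(start), 0.0, start, [start])]
        best_g = {start: 0.0}        # cheapest g(n) discovered so far for each state
        explored = set()
        while frontier:
            f, g, state, path = heapq.heappop(frontier)
            if state == goal:        # goal test on expansion: optimal with a consistent h
                return path, g
            if state in explored:
                continue             # stale entry superseded by a cheaper push
            explored.add(state)
            for successor, step_cost in graph.get(state, []):
                child_g = g + step_cost
                if successor not in explored and child_g < best_g.get(successor, float("inf")):
                    best_g[successor] = child_g
                    heapq.heappush(frontier, (child_g + heuristic(successor),
                                              child_g, successor, path + [successor]))
        return None                  # frontier exhausted: no solution

Because Python's heapq has no decrease-key operation, the "update the frontier" step is done lazily here: a cheaper duplicate is pushed and the stale entry is skipped when it is eventually popped.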
Proof of Correctness:
A* Search provides an optimal solution whenever a solution exists.
But before we prove the correctness, let us prove that the heuristic function h(n) is
consistent. In other words, for any node n and any successor n' of n, the step cost from
n to n' plus the successor's heuristic value is always greater than or equal to n's
heuristic value:
h(n) ≤ c(n, P_{n,n'}, n') + h(n')
Proof:
Recall that we define the cost of each action (a single flight) as a weighted sum of three terms:
c(n, P_{n,n'}, n') = k1·g1 + k2·g2 + k3·g3, where k1 ≥ 1, k2, k3 ≥ 0 and
g1, g2, g3 represent the distance, flight time, and waiting time.
Since the real distance g1(n, n') between the two cities is at most the flight's distance
g1, k1 ≥ 1, and the remaining terms are nonnegative, we can derive that:
g1(n, n') ≤ k1·g1 + k2·g2 + k3·g3 = c(n, P_{n,n'}, n')   (S0)
Now, because h(n) is the real distance from n to the goal, the triangle inequality gives:
h(n) ≤ g1(n, n') + h(n')
Substituting Statement (S0) yields:
h(n) ≤ c(n, P_{n,n'}, n') + h(n')
Lastly, adding g(n) to both sides and using g(n') = g(n) + c(n, P_{n,n'}, n') completes the proof:
f(n') = g(n') + h(n') = g(n) + c(n, P_{n,n'}, n') + h(n') ≥ g(n) + h(n) = f(n)
Now that we have proved the heuristic function is consistent, we have also shown that the
estimated costs are non-decreasing along any path:
∀n ∈ S such that n has a successor n': f(n') ≥ f(n) = g(n) + h(n)
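Since the whole argument rests on consistency, a quick numerical check of the dataset can catch wiring mistakes in g1, g2, g3, and D. A hedged sketch, assuming flights is a list of (origin, destination, step_cost) triples and D is the distance matrix introduced above (both names are our own):

    def check_consistency(flights, D, goal):
        """Return the flights that violate h(n) <= c(n, n') + h(n'), if any."""
        violations = []
        for origin, destination, step_cost in flights:
            # h(origin) must not exceed the step cost plus h(destination).
            if D[origin][goal] > step_cost + D[destination][goal] + 1e-9:
                violations.append((origin, destination))
        return violations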
Now let us prove by induction that whenever A* search expands a node, it has found an
optimal path to that node. (Notice that this is not exactly the same as proving that A*
search yields the optimal solution; however, the statement we prove implies it.)
Base Case: The initial node is expanded with a path cost of 0, which is trivially optimal.
Inductive Step: We argue by contradiction.
Assume the inductive hypothesis holds: every node expanded up to n0 was expanded along an
optimal path. Suppose, for contradiction, that the next node n0' is not expanded along an
optimal path. Then some other node n' sitting on the truly optimal path to n0' would
still be in the frontier with a lower estimated cost (since f is non-decreasing along
paths), and the priority queue would have selected it first, which is a contradiction.
Hence, (all nodes from the initial state up to n0 expanded along an optimal path) ⟹ (n0' expanded along an optimal path).
Conclusion: By the principle of strong induction, every node that A* search expands is
expanded along an optimal path.
Hence, A* search provides an optimal solution to the graph search problem.
Running Time: A* search's time complexity is determined by the heuristic. In the worst
case of an unbounded search space, the number of nodes expanded is exponential in the
depth d of the solution (the shortest path): O(b^d), where b is the branching factor (the
average number of successors per state). This assumes that a goal state exists at all and
is reachable from the start state; if it is not, and the state space is infinite, the
algorithm will not terminate.
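To give a sense of scale for this worst-case bound (illustrative numbers only):
b = 3, d = 10  ⟹  O(b^d) = 3^10 = 59,049 node expansions.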
The time complexity is polynomial when the search space is a tree, there is a single
goal state, and the heuristic function h meets the following condition:
|h(x) − h*(x)| = O(log h*(x))
where h* is the optimal heuristic, the exact cost to get from x to the goal. In other
words, the error of h will not grow faster than the logarithm of the "perfect
heuristic" h* that returns the true distance from x to the goal. [2]
Data Structure.
A priority queue is commonly used in A* implementations to perform the repeated selection
of the minimum (estimated) cost node to expand. This open set, also known as the fringe,
is the priority queue. At each step of the algorithm, the node with the lowest f(x) value
is removed from the queue, the f and g values of its neighbors are updated accordingly,
and those neighbors are added back into the queue. The algorithm continues until a goal
node has the lowest f value of all fringe nodes and is removed for expansion. Because h
is zero at the goal for a consistent heuristic, the f value of that goal is also the cost
of the shortest path. [3]
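A hedged sketch of these fringe operations with Python's heapq (the tuple layout is our own convention; the city codes and f values are made up for illustration):

    import heapq

    frontier = []                             # the open set / fringe, ordered by f = g + h
    heapq.heappush(frontier, (7.5, "YYZ"))    # insert a node keyed by its f value
    heapq.heappush(frontier, (4.2, "YUL"))
    f_value, city = heapq.heappop(frontier)   # returns (4.2, "YUL"), the lowest f in the fringe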
Unexpected Cases/Difficulties.
One of the difficulties is the dataset: how exactly do we define the path cost and the
estimated cost matrix? As we mentioned in the last milestone, when we define a path cost
P, we define it as x1·f1 + x2·f2 + x3·f3, where each f_i is measured in CAD or hours and
each x_i is an empirically determined weight.
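For instance, a hedged sketch of that weighted sum, with placeholder weights that would have to be tuned empirically (the numbers below are illustrative only):

    def path_cost(fee_cad, flight_hours, waiting_hours, x1=1.0, x2=50.0, x3=30.0):
        """Weighted path cost; x2 and x3 convert hours into CAD-equivalent units."""
        return x1 * fee_cad + x2 * flight_hours + x3 * waiting_hours

    # Example: a $400 fare, 4.5 h in the air, and a 2 h layover.
    print(path_cost(400, 4.5, 2))   # 400 + 225 + 60 = 685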
However, when we look at the heuristic function, we measure the distance directly. This
guarantees that the function is consistent, but it does not incorporate the time
elements. One reason is that estimates of flight time and waiting time are nontrivial to
derive for two cities that have no available flight between them. In fact, we cannot
provide any coefficient k, based on the existing factors, such that:
k > 1, k ∈ ℝ, and f(n) = g(n) + k·h(n), where k·h(n) is admissible.
We will look into this problem again when we implement the function. If it turns out that
such a k can be derived, we will modify the pseudo-code and provide the proof of
correctness again.
Task Separation and Responsibilities.
Jack Wang did the first draft of the Second Milestone including Data Structure and
Running Time.
Larry Gu revised the first draft, specifically, he:
rewrote the Analysis of the algorithm
provided explanations for the heuristic function
provided the Pseudo-code
rewrote the Proof of Correctness
rewrote the Unexpected Cases/Difficulties.