The University of Melbourne
School of Computing and Information Systems
COMP90038 Algorithms and Complexity
Assignment 2, Semester 2 2021
Released: Monday September 27 2021. Deadline: Wednesday October 13 2021 23:59.
This assignment is marked out of 30 and is worth 15% of your grade for COMP90038.
Objectives
To improve your understanding of the time complexity of algorithms and data structures. To design computational
solutions to problems and develop problem-solving skills. To improve written communication
skills; in particular the ability to present algorithms clearly, precisely and unambiguously.
Scenario
In this assignment we consider a data center that receives requests for webpages and returns back the
requested pages. The data center receives millions of requests and needs to answer them with minimal
delay. As a result, it stores millions of webpages across servers to ensure it can answer requests in parallel.
The data center has two o↵erings on hosting sites with di↵erent guarantees on availability and response
time. The first one is aimed at critical web applications (e.g., hosted by hospitals and banks) and the
second one for less critical applications that can sustain a small delay. In this assignment you will design
data structures and algorithms for answering webpage requests eciently
over time.
Note that you can work on each problem separately as each problem is independent of others even if
the notation and the scenario is shared. However, it is advisable to read all problems before commencing.
Problem 1 [8 points]
Each request r arrives with the following information represented as a tuple: r.arrival time, r.webaddress,
r.is critical and r.source, where
• r.arrival time is the arrival time in milliseconds (represented as integers);
• r.webaddress is the address of the requested page;
• r.is critical is the o↵ering that was chosen by the owner of webaddress and can either be True to
denote a critical address and False otherwise;
• r.source denotes where the request was made from and where the webpage should be sent back to.
More than one request can arrive at the same time. Hence, every new request that comes has arrival time
that is either the same or higher than arrival time of previous requests. More than one request can arrive
for the same page. However, if more than one request arrives for the same page, their r.source will be all
distinct. Note that is critical is the same for all requests with the same webaddress as it depends on the
webaddress only.
A dispatcher program processes the requests, stores them, decides which request to process next based
on their priorities. In order to determine which request to answer next, the dispatcher chooses the request
depending on its arrival time and its is critical value. Any request r1 with is critical of True is answered
before a request r2 with is critical of False, if r1 arrives at most 5 milliseconds after r2. Ties are broken
arbitrarily.
1
A. [3 points] Suppose you have access to is higher priority(r1, r2) function that takes two requests r1
and r2 and returns True if r1 should be answered before r2 given the priority rule described above,
and False otherwise. Given is higher priority function one can use a heap to insert a new request and
eject the next request. We refer to this as Approach 1.
Consider Approach 2 that does not use is higher priority. Instead it uses two heaps: one heap to
maintain requests to critical pages (i.e., those with is critical set to True) and one heap for requests
to non-critical pages.
Let m0 and m1 be the number of requests to critical and non-critical webpages, respectively. In
the worst case, what is the number of comparisons that have to be performed by each of the
approaches for inserting and ejecting a new request. Give your answers using asymptotical notation.
Is Approach 1 or Approach 2 more superior asymptotically? Argue your answer.
B. [4 points] As users, we tend to start refreshing our browsers if a webpage is not loading. This results
in a duplicate request coming to a data center with a new arrival time. We call a request rnew a
duplicate of rold if their webaddress and source fields are the same. To avoid creating a duplicate
request, we wish to update an existing request instead. However, the update to a new request time
also results in the update of the priority of this request.
Write a pseudo-code for a function HeapUpdate(H, rold, rnew) that updates a data structure H of
Approach 1 so that rold is replaced with rnew and the data structure is adopted to reflect the change
in the priority. You can assume that rold is in H. When operating on heap, assume that H is stored
as an array, starting at index 1 as defined in Lecture 13 (slides “Heaps as Array”).
(No mark: Think whether this is this a good idea to update requests based on their new time.)
C. [1 point] The data center has k servers to improve the response time and hence can answer up to
k requests in parallel. Consider the following incorrect algorithm for finding top k requests using
Approach 1: the algorithm returns elements H[1], H[2],...,H[k].
Give two examples of heaps with number of requests m
5 and k
3. In the first example,
the above algorithm would return an incorrect top k requests. In the second example, the above
algorithm would return the correct top k requests.
In your examples, assume that is critical is True in all requests. Your heaps should be given using
array notation, e.g., H = [ , a, b, c], defines a heap with root a, left node b and right node c. You
only need to indicate the value of arrival time in your heap and ignore all other fields.
cen
Problem 2 [12 points]
Once the dispatcher finds the next request to answer, it needs to decide which server to send the request
to. Webpages are replicated across the servers to improve the throughput of the system as some webpages
are more popular than others and receive more requests.
Each server has an ID, however server IDs are not sequential. Every webaddress is associated with
lo and hi which signify the range of server IDs that have a replica of the corresponding webpage. Each
webaddress can be answered by any one of the servers whose ID falls in the range [lo, hi]. For example, if
there are in total 5 servers with IDs 1, 3, 7, 8, 10, and www.myaddress.com has lo and hi set to 3 and 8,
respectively, then either server 3, 7 or 8 can answer the request for www.myaddress.com.
Servers are stored in a balanced binary search tree rooted at T. You can assume that each node of the
tree has fields T.key, T.left, T.right for accessing the server ID, and nodes on the left and right of node T,
correspondingly. T is null if the node is empty.
2
A. [4 points] Given T, lo and hi, devise an algorithm FindServerBounds(T, lo, hi) that returns two server
IDs k1 and k2 in T representing the smallest and highest server ID that fall in the range [lo, hi].
That is, if S are all the server IDs in the tree, then lo k1 k2 hi, k1, k2 2 S and there is no k0
1
s.t. k0
1 2 S and lo < k0
1 < k1 and no k0
2 2 S such that k2 < k0
2 < hi.
Your task here is to implement the FindServerBounds(T, lo, hi) algorithm. Only algorithms with
optimal time complexity may receive full marks.
B. [1 point] If the number of server IDs stored in the BST is k, what is the asymptotic performance
of FindServerBounds(T, lo, hi)? Show your derivations.
C. [7 points] We now will consider a setting where the system administrator could secure k servers
with sequential IDs. The servers are stored in an array S. Hence, any server stored at S[lo],
S[lo + 1], S[lo + 2], ..., S[hi
1], S[hi] could answer a webpage request. However, some servers may
be more busy than others. We wish to send the current request to the least busy server among
S[lo], S[lo + 1],..., S[hi]. We define the number of current requests assigned to a server, as the load
of that server. Let L be an array that stores the load associated with each server in S (the size
of S and L are the same), such that L[j] stores the number of current requests assigned to the
server S[j]. A naive way to find the server with the smallest number of requests is by traversing
L[lo], L[lo + 1],..., L[hi] and finding the minimum. However, this would incur linear cost.
In this question you are asked to write pseudo-code of a function FindLeastLoadedServer(S, L, lo, hi)
that returns the ID of the least busy server in the range between lo and hi. Your algorithm has
to use the knowledge that no address is assigned to more than (log k)2 servers. Your algorithm
should run in time O(log k) where k is the total number of servers.
Hint: consider precomputing an auxiliary data structure of minimums for some predefined ranges
that you can later use to answer other range queries. That is your solution can consist of two
functions: one function that does the precomputation to generate an auxiliary data structure and
FindLeastLoadedServer that uses this data structure later on when answering a query. Your precomputation
function can take time O(k) or O(k log k).
Problem 3 [10 points]
In this problem, we assume that each webaddress is assigned to only one server, there are k servers in
total and each server is assigned a unique ID between 1 and k.
Once in a while a system administrator notices that some servers become overloaded with requests and
have a low response time as a result. At this point the administrator is required to reassess the assignment
of webaddresses to servers so that popular webaddresses are distributed among multiple servers as opposed
to being loaded to one server.
Information about each server is stored in a tuple s that contains the following fields:
• s.p, the number of addresses it is assigned to.
• s.A, an array of addresses assigned to it. The array size is s.p.
• s.R, an array of requests that an address in s.A has received such that s.R[i] stores the number of
requests of address s.A[i], for each 1 i s.p. The array size is s.p.
All the above server information is stored in an array W of size k. That is W[j], for 1 j k, returns a
tuple s as defined above for a server with ID j.
Consider the following request balancing algorithm, ReBalanceServers. It creates a new array W0 of k
servers with no addresses assigned to them at the beginning. It then goes sequentially over old servers
3
in W, one server at a time starting at index 1, and reassigns each webaddress from this old server, one
webaddress at a time starting at index 1, to one of the new servers. It selects the least request-loaded new
server when doing so (breaking ties arbitrarily). At the beginning of the process, the load of each new
server is 0. It is then increased by the number of requests of the webaddresses assigned to it (as given
in s.R[i]). At the end of the algorithm, all addresses stored in W are assigned to W0
. At this point, the
content of W can be overwritten with W0
.
A. [2 points] Consider the following 9 addresses and their request numbers that exist in the system
represented as a tuple (address, number of requests):
(a1.com, 5), (b1.com, 20), (c1.com, 30),
(d.com, 50), (a2.com, 5), (b2.com, 20),
(c2.com, 30), (b3.com, 20), (c3.com, 30).
For example, address b3.com received 20 requests.
An optimal solution to balancing the above requests assigns each server 70 requests in total. Below
is one optimal assignment that achieves this:
• server 1 is allocated (b2.com, 20), (d.com, 50)
• server 2 is allocated (a1.com, 5), (c1.com, 30), (c2.com, 30), (a2.com, 5)
• server 3 is allocated (b1.com, 20), (c3.com, 30), (b3.com, 20)
For 1 point, give an example of how these addresses had to be assigned to original servers in W so
that the balancing algorithm ReBalanceServers would output this optimal assignment.
For 1 point, give another example of how these addresses had to be assigned to to original servers
such that ReBalanceServers algorithm would end up assigning d.com, b1.com and c1.com all to one
server, and hence leading to servers being unbalanced.
You answer for each example should be of the form:
• server 1 is allocated ....com, ....com, ...
• server 2 is allocated ...
• server 3 is allocated ....
If a server is not assigned to any address, put null.
B. [6 points] Write the pseudo-code of ReBalanceServers(W) algorithm, defining any data structures
that you are using. Your algorithm should run in time O(w log k) where w is the number of all
addresses in the system. Only algorithms with this time complexity, justifying why/how their
pseudo-code achieves it, will receive full marks.
Note: You can use any sorting algorithm or data structure that we have studied in the lectures,
without implementing them. If you choose to use a data structure, you are asked to state what it
is in the comments. For example, if T is a binary search tree please add
// T is a BST
If a data structure or algorithm you choose to use relies on comparing two elements, you need to also
provide a function cmp(x1, x2) that describes the pseudo-code for comparing x1 and x2. cmp(x1, x2)
returns 0, if x1 = x2, 1 if x1 < x2 and -1, otherwise. You are asked to then capture it as
// T is a BST using comparator function cmp defined below/above
4
C. [2 points] Describe how ReBalanceServers algorithm can be modified to avoid giving a non-optimal
assignment as described in Problem 3.A. Your solution should be within four sentences in English
and no pseudo-code is needed. If your solution exceeds this limit, only the first four sentences will
be assessed.
Submission and Evaluation
• You must submit a PDF document via the LMS.
Note: handwritten and scanned images, are not acceptable.
Do not submit Microsoft Word documents — if you use Word, create a PDF version for submission.
• Marks are primarily allocated for correctness, but elegance of algorithms and how clearly you communicate
your thinking will also be taken into account. Where indicated, the complexity of algorithms
also matters.
• We expect your work to be neat – parts of your submission that are dicult
to understand or
decipher will be deemed incorrect. Make sure that you have enough time towards the end of the
assignment to present your solutions carefully. Time you put in early will usually turn out to be
more productive than a last-minute e↵ort.
• You are reminded that your submission for this assignment is to be your own individual work.
For many students, discussions with friends will form a natural part of the undertaking of the
assignment work. However, it is still an individual task. You should not share your answers (even
draft solutions) with other students. Do not post solutions (or even partial solutions) on social
media or the discussion board. It is University policy that cheating by students in any form is not
permitted, and that work submitted for assessment purposes must be the independent work of the
student concerned.
Please see https://academicintegrity.unimelb.edu.au
If you have any questions, you are welcome to post them on Piazza so long as you do not reveal
details about your own solutions. You can also email Olya (email can be found on LMS). In your message,
make sure you include COMP90038 in the subject line. In the body of your message, include a precise
description of the problem.
Late Submission and Extension
Late submission will be possible, but a late submission penalty will apply of 3 marks (10% of the assignment)
per day.
Extensions will only be awarded in extreme/emergency cases, assuming appropriate documentation is
provided – simply submitting a medical certificate on the due date will not result in an extension.