Data Structures and Algorithms: Assignment 2
Due: 7 June 2022, 11:59pm – 100 marks possible (20% of final grade)
When submitting, zip up your entire project into a folder with the following structure:
lastname_firstname_studentID.zip (using your own name and ID) and submit to the Assignment 2
folder on Canvas BEFORE the due date and time. Late submissions will incur a penalty. Anti-plagiarism
software will be used on all assignment submissions, so make sure you submit YOUR OWN work and
DO NOT SHARE your answer with others. Any breaches will be reported and will result in an automatic
failure and possible removal from the course.
Q1. File Sorting) Recommended readings 1.3, 3.2, 3.5) 25 marks)
There is often a big difference between an algorithm’s performance in internal memory and its
performance in external storage. For example, quick sort is considered very efficient for elements held
in memory but is less efficient with elements stored externally in a file, whereas merge sort can be
used more efficiently with sorting elements held externally in files.
Suppose the lines of a large text file are to be sorted, but the file is too large to all fit in memory at
one time. Prepare a class called FileSorter whose constructor accepts an integer limit (a maximum for
the number of Strings that will be held in memory at a time), and input and output text File objects.
The algorithm firstly performs the split stage, repeatedly reading each string from the input file line
by line, breaking the text file apart with at most “limit” strings (where a String is on its own single line
in the file). Once the limit has been reached, the algorithm should then quicksort the elements in
memory then output them back to a smaller temporary file (you may use an existing Java class for the
quicksort routine). The split stage continues until all lines of the input file are read in. After the split
stage, the algorithm then performs the merge stage. The merge stage uses your own merge sort
algorithm to read files together two at a time (deleting the temporary files), until a single sorted file
remains, the output file.
The FileSorter should be able to run as a separate thread, so any GUI program remains responsive to
the user. Add any necessary methods which can be used to notify the user of the progress of the
sorting algorithm.
Create a class called FileSorterGUI which can be used to run FileSorting tasks, it should have a button
to queue up a new task and another button to execute the task at the front of the queue. It should
also display progress bars to update the user on the progress of the currently running task. Feel free
to use the countries.txt and cities.txt files from Canvas to test your sorting application. An example
GUI is shown below:
_01000000 _01000000
_01000000 _01000000
Q2. Trie Data Structures) Recommended readings 4.1, 4.3, 5.2) 25 marks)
A Trie (or prefix-tree) is a tree data structure which usually holds string elements representing words.
Each node in the tree has links to children represented with individual characters that comprise the
string element. Each node can also have a string word (or null if no such word) which is made up of
the path of key characters from the root. To find an element in the tree simply look at the first
character of the target and traverse the child node with the same key character starting from the root
– repeating moving down the tree for each character in the target word. For efficiency child nodes
should be referenced from the parent using a map – using a character as the key, and the object node
as its value.
For example, using the tree below, if we were to see if the trie contains the word “then”, it involves a
traversal down the subtree, starting from the root and navigating the nodes which correspond to the
key characters ‘t’, ‘h’, ‘e’, ‘n’ - and checking whether the final node represented by the letter ‘n’ holds
the string element “then”.
_01000000 20089790
To add a word to the tree simply traverse the node that represents each character in the word. If a
node does not exist for that character, simply attach it as a new child node to the parent with a new
associated key character, the last node down the tree (associated with the last letter in the word)
should hold the newly added element. The trie should not deal with duplicate words.
_01000000 20089790
For example, when “hi” is added to the tree – because the ‘h’ and ‘I’ nodes already exist they are
traversed, and the node element associated with the key ‘i’ is set to “hi”. However, when “there” is
added to the tree, the tree is traversed down existing path ‘t’, ‘h’, ‘e’, but then new nodes with keys
‘r’ and ‘e’ need to be added respectively and attached to their parent and the last node associated
with key ‘e’ having its element set to the word “there”. Finally, if “seth” is added, straight away from
the root node there is no character key for ‘s’ – so the trie needs to build the path with ‘s’, ‘e’, ‘t’, ‘h’
before accommodating the word. The trie now looks like:
Trie data structures are commonly utilized in auto-complete algorithms. For example, if the user
enters the prefix “th” – this input would be used to obtain a list of suggestions back using tree
traversals and obtaining all word elements down the “th” subtree – “the”, “then”, “there”. Similarly,
if the user enters the prefix “hi” a list of suggestions would be “hi”, “him”, “hit”.
Removing elements can be a little trickier as the trie should clean up redundant nodes which are no
longer used for word paths to save memory and keep efficiency for word traversals. Again, the trie
should follow the path down the tree for the target element following the nodes associated with
character keys. The last node on the path should have its element set to null effectively removing the
target word from the tree. For example, as shown below, the removal of the words “he” and “the”,
the target elements are just removed from the trie after following their respective character keys, but
since the nodes are still used for further words down the tree path, they stay in place but now no
longer have a word associated with them. However, when “a” is removed from the tree it does not
have any child nodes so the key-value (character key ‘a’, with value node holding the word “a”) pair is
removed from its parent (the root). Similarly, when “then” is removed from the tree, the node with
key ‘n’ has no children (meaning it is no longer on the path to further nodes down the tree) it is
removed from its parent. This rule could recursively propagate back up the tree removing nodes that
are no longer used for words in the trie. The updated tree is shown below.
Finally, to illustrate the consecutive removal of nodes back up the tree, if we were to remove “there”
from the trie, because several characters on the path ‘h’, ‘e’, ‘r’, ‘e’ are no longer used for further
words down the tree, the nodes and character keys are repeatedly removed from their parent up until
the node associated with ‘t’ as shown below.
Using the UML, create a class called Trie which implements a trie data structure specifically for Strings.
It should maintain an innerclass called TrieNode which possibly holds a string word and a reference to
children using a map data structure where each child node is accessed using a Character key. Add any
other suitable private helper routines if required. Create a suitable toString and a simple main method
to test your implementation.
Trie
- root: TrieNode
+ Trie()
+ add(element : String) : boolean
+ remove(element : String) : boolean
+ contains(element : String) : boolean
+ removeAll(prefix : String) : boolean
+ startsWith(prefix : String) : boolean
+ suggestions(prefix : String) : Set
+ toString() : String
_01000000 20089790
Q3. HashSetWithChaining) Recommended readings 2.1, 5.1) 20 marks)
A set is a collection of elements with no repetitions and does not necessarily care about ordering of
its internal elements. A set can be described with the Set interface available in the Java Collections
API and the following UML. Create a class called HashSetWithChaining which implements an efficient
set using an underlying hash table, resolving any possible collisions by chaining. Feel free to look at
the Set Java documentation to get a full understanding of how these methods should operate. Make
sure you implement this class from scratch, creating your own hash table and Node class. Do not
encapsulate an existing Java data structure.
20089790
_01000000
_01000000
Your set should maintain a default load factor of 75%, otherwise it will have to expand capacity. A
custom load factor and initial capacity of the hash table can also be given as optional parameters
during construction. Create a test class with a suitable main method which “effectively” tests most of
the operations of your HashSetWithChaining class. For testing modify the toString method so that it
prints out the entire contents of the hash table showing chained elements, where necessary. See
example screenshot below of a simple test.
Q4. Journey Planner) Recommended readings 5.2, 6.1, 6.2) 30 marks)
Obtain the completed class called BusTrip, available from Canvas, which represents a single trip for a
bus travelling between two locations with associated departure and arrival times. A BusTrip has a cost
(given by static stage constants) and a unique bus route identifier. A bus leaves from a departLocation
on the departTime and arriving at the arrivalLocation on the arrivalTime. You can assume a BusTrip is
express, and only end to end with no location stops in between. All bus attribute information is passed
into its constructor. The class has getter methods for each attribute and a toString representation for
the object. A BusTrip is considered equal to another if the departure time, departure location, and
route identifier are the same.
_01000000 20089790
Part A: Using the following UML diagram below create a class called BusJourney that represents a
journey comprised of one or more BusTrip objects between multiple locations. The class encapsulates
a List data structure of BusTrip objects, keeping the dated order of the necessary trips which comprise
a complete journey. There are two constructors, firstly a default constructor creating a journey with
no initial trips, secondly a constructor that adds an existing list of BusTrip objects to the underlying
data structure.
BusJourney
- busList : List
+ BusJourney()
+ BusJourney(list : List)
+ addBus(bus : BusTrip) : boolean
+ removeLastTrip() : boolean
+ containsLocation(location : String) : boolean
+ getOrigin() : String
+ getDestination() : String
+ getOriginTime() : LocalTime
+ getDestinationTime() : LocalTime
+ cloneJourney() : BusJourney
+ getNumberOfBusTrips() : int
+ getTotalCost() : double
+ toString() : String
The addBus method should add a BusTrip to the journey, returning true and only adding the trip to
the current journey if it meets the following criteria:
• The journey’s end location is equal to the newly added BusTrip departure location (so the new
bus trip departs from the same location as journey’s current destination).
• The journey’s current destination time is earlier in the day (or equal) to the newly added
BusTrip parameter’s departure time (so the new bus trip cannot be added if it departs earlier
in the day to the journey’s current destination time).
• The journey so far does not already contain the parameters arrival location somewhere in its
busList (meaning the same location is never visited twice – no closed paths).
The containsLocation method should return true if the given location is in the journey so far. The
getter methods return the origin and destination location and dates (or null) of the Journey (so origin
location and time are the first BusTrip’s departure values, whereas the destination location and time
are the last BusTrip’s arrival values). The removeLastTrip method removes the lastly added BusTrip
from the current journey (if any). The getTotalCost returns the total cost of all the bus trips in this
journey. The getNumberOfTrips returns the number of bus trips which comprise this journey. The
toString method should print out a string representation of all trips in this journey and the total cost
in a nicely formatted way. Finally, the cloneJourney method returns a new BusJourney object,
passing its busList to the new instance by calling the second constructor.
_01000000 20089790
Part B: Using the following UML, create a class called JourneyPlanner. The class holds an efficient
Map data structure of unique locations as keys and a set of BusTrip instances which depart that
location as values. The add method is used to add a unique BusTrip to this map (care taken when the
departure location already exists in the locationMap).
The getPossibleJourneys method is used to return list of all the uniquely possible routes from the
start location and time, arriving at the end location before the end time. It does this by calling the
recursive method findPaths which uses a depth first search technique to build up all the possible
journeys that can be undertaken between the start and end location between the specified time
ranges (some journeys might not exist). It does this by building up a currentJourney parameter and
when the target location is found it will “clone” the current journey and add it to the List of
BusJourney values found for this search. Hint: Use BusJourney like a stack while recursively finding
paths.
JourneyPlanner
- locationMap : Map