首页 > > 详细

辅导 CS 2210B - Data Structures and Algorithms

CS 2210B - Data Structures and Algorithms
Assignment 4 (Programming)
Total Marks: 100
Due Date: March 31, 2022
1 Overview
In this assignment, you will do an investigation in the area of Natural Language Processing (NLP). At
the the same time, you will practice the AVL tree data structure. You will need to use AVL trees for
two steps of the assignment. In NLP, one often needs to count how many times each particular word
occurs in a text (or collection of texts). Let r be the number of times a particular word occurs in the
text. For example, consider the following text:
“I would like to eat an apple, a banana, and a peach. I already ate a banana.”
The word “a” occurs 3 times, the word “an” occurs 1 time, etc. For language modeling, one often
needs to know how many distinct words occur exactly r times in the text. Let N (r) be the number
of distinct words that occur exactly r times. For example, consider the text above again. The word
“I” occurs 2 times, the word “banana” occurs 2 times, the word “a” occurs 3 times. The remaining
10 words occur exactly once. Therefore in this case,
N (3) = 1, since one word (“a”) occurs 3 times in the text;
N (2) = 2, since two words (“I”,“banana”) occur 2 times in the text;
N (1) = 10, since ten words (“would”, “like”, “to”, “eat”, “an”, “apple”, “and”, “peach”,
“already”, “ate”) occur once in the text.
In NLP, N (r)’s need to be computed to build language models. It is also quite interesting to look at
N (r)’s. What we find is that a few words, such as “a”, “the”, “and”, are used very often in English
(that is for large r, N (r) is small). However, about half of all words in text occur only once or twice
(that is for small r, N (r) is very large). To explain this phenomenon, a “Principle of Least Effort”
has been developed. This principle roughly states that people act so as to minimize their probable
average rate of work. Both speaker and listener are trying to minimize their effort. The speaker’s effort
is conserved by having a small vocabulary of common words, and the listener’s effort is lessened by
having a large vocabulary of rarer words (the more there are words, the less ambiguous are messages,
and it’s easier to understand them). The maximally economical compromise between these competing
needs is to have a few very common words and a large amount of rare words. In this assignment, you
are to compute efficiently N (r)’s from a given text. Your main program will be called ’calculateNr’,
and should read the text file specified by a command line argument: ’CalculateNr text.txt’. It should
output N (r)s separately on each line. For our example text above, the program’s output should list
all N (r)’s sorted by the increasing value of r:
N(1) = 10
N(2) = 2
page 2 of 6
N(3) = 1
You have to write all the code yourself. You cannot use code from the textbook, Internet, and any
other sources. You may use java build-in class for linked list java.util.LinkedList and the Iterator
interface java.util.Iterator.
2 Implementation
You are to implement program very efficiently using the AVL tree data structure. First read all the
words from the input file, using the code provided in FileWordRead.java. As you read the words,
insert them in your AVL tree, let’s call this tree T1. The word itself will be the key. The value will be
an integer, which specifies how often the word occurs in the input file. The first time you see a word,
say word ’cat’ you should create an entry (’cat’,1) and insert it into your AVL tree. The second time
you see word ’cat’, you should update the count of the entry (’cat’,1’) to be (’cat’,2’). In our example
text: ‘I would like to eat an apple, a banana, and a peach. I already ate a banana.”, after you finish
reading all the words, the AVL tree T1 should contain the following entries:
(I,2), (would,1), (like,1), (to,1), (eat,1), (an,3),
(apple,1), (banana,2), (and,1), (peach,1), (already,1),
(ate,1).
After this first step, you know how often each word occurs in the input file. Now you need to count
how many words occur exactly 1 time, 2 times, etc. The efficient way of doing this is with another
AVL tree. You construct a new AVL tree, let’s call it T2. The entries in T2 are different from entries
in T1, since the purpose of T2 is different. In T2, the key is the word rate r. Notice that r is in the
value field of the first tree T1. The value field of an entry in T2 is the count of how many words have
the rate r.
Now we are ready to construct T2 from T1. Delete entries from T1 until T1 is empty. Recall that each
entry in T1 is of form (“word”, r), where r is how often the “word” occurs in the text file. For each
deleted entry (“word”, r), if the r is not present in T2 yet, insert a new (r, 1) entry in T2. If key r is
already present in T2, say we have an entry (r, count) in T2, update this entry to (r, count+ 1), since
you found one more word that occurs exactly r times in the input file. After you are done inserting
entries into T2, for any entry (r, c) in T2 c is the count of how many words occur in text exactly r
times. For our example text above:
”I would like to eat an apple, a banana, and a peach. I already ate a banana.”
after you are done reinserting entries from T1 into T2, T2 should contain the following entries:
(1,10), (2,2), (3,1).
When you are done inserting into T2, simply go over the tree in inorder traversal and print out the key
and value fields to the standard output. Notice that the keys in T1 are of type String, but the keys in
T2 are of type Integer. We will use Java Generics for both the keys and the value fields of the Entry
Class.
page 3 of 6
3 Classes Provided
3.1 FileWordRead
This is a program for reading words from file. I wanted to make sure you read exactly the same words
from file as my test program, so you have to use this program. You need the following methods from
this class:
FileWordRead(String name): This is a constructor which takes the name of the file to read
tokens from.
public Iterator getIterator(): Returns iterator over words read from the input
file. Here is how to use the iterator:
FileWordRead words = new FileWordRead(fileName);
Iterator it = words.getIterator(); // grab the iterator into variable it
while (it.hasNext()) { // Check if anything is left in the iterator
String next = it.next(); // get the next item in the iterator
....
3.2 AVLTree
This class implements various AVL operations and is a main supporting file to start with. It has
provided implementation details that would be helpful in presenting the solution. The description of
the methods are found in section 4. The provided methods are mentioned as follows,
/* Function to check if tree is empty */
public boolean isEmpty()
/* Make the tree logically empty */
public void makeEmpty()
/* Function to insert data */
public void insert(int data)
/* Function to get height of node */
private int height(AVLNode t )
/* Function to max of left/right node */
private int max(int lhs, int rhs)
/* Function to insert data recursively */
private AVLNode insert(int x, AVLNode t)
/* Rotate binary tree node with left child */
private AVLNode rotateWithLeftChild(AVLNode k2)
/* Rotate binary tree node with right child */
private AVLNode rotateWithRightChild(AVLNode k1)
page 4 of 6
/* Double rotate binary tree node: first left child * with its right child; then node k3 with new
left child */
private AVLNode doubleWithLeftChild(AVLNode k3)
/* Double rotate binary tree node: first right child * with its left child; then node k1 with new
right child */
private AVLNode doubleWithRightChild(AVLNode k1)
/* Functions to count number of nodes */
public int countNodes()
/* Functions to search for an element */
public boolean search(int val)
/* Function for inorder traversal */
public void inorder()
/* Function for preorder traversal */
public void preorder()
/* Function for postorder traversal */
public void postorder()
3.3 Test
This is a test program for your tree implementation. Compile and run Test.java with your tree
implementation. It has 11 tests, and will tell you whether your implementation passes/fails these
tests. First 10 tests are worth 4 points each, test 11 is worth 10 points, it’s most expensive since it
checks if your tree is a valid AVL tree.
3.4 AVLNode
The AVL Node class implements the structure of node. You will also need it in the AVLTree class
whose partial implementation is provided to you. The detail implementation of AVL Node is also
provided in the file mentioned in section 3.2.
4 Classes to Implement
With reference to provided code, assembly of code is required to create a logical output as elicited in
Test.java.
4.1 AVLTree
This class will extend the implementation of provided AVLTree class. For this class, you must imple-
ment all and only the following public methods in addition to the methods already implemented as
provided in the AVLTree.java:
page 5 of 6
public isBST(t1): This method will return true if the provided tree is BST.
public isBalanced(t1): This method will return true if given BST tree is also AVL.
You can implement any other methods that you want to in this class, but they must be declared as
private methods.
4.2 AVLTreeException
This is the class for AVL tree exception.
4.3 calculateNr
This class contains the main method, declared with the usual method header: public static void
main(String[] args) You can implement any other methods that you want in this class, but they
must be declared as private methods.
In your class AVLTree, you must have a private object of type
5 Coding Style
Your mark will be based partly on your coding style.
Variable and method names should be chosen to reflect their purpose in the program.
Comments, indenting, and whitespace should be used to improve readability.
No variable declarations should appear outside methods (“instance variables”) unless they con-
tain data which is to be maintained in the object from call to call. In other words, variables
which are needed only inside methods, whose value does not have to be remembered until the
next method call, should be declared inside those methods.
6 AVL Tree Implementation vs. Binary Search Tree
If you cannot get the AVL tree to re-balance correctly, you can implement a binary search tree(in which
case you should still have all the classes/method names as specified). The only difference between the
BST and the AVL tree is that BST does not have to be balanced, you do not have to implement
triNodeRestructuring for BST. You will loose 10 marks on Test 11 if you implement BST instead of
AVL tree.
6.1 Grading
Your grade will be computed as follows.
Program compiles, produces a meaningful output: 15 marks
page 6 of 6
Test pass: 50 marks. First 10 tests are worth 4 marks, Test 11 is worth 10 marks.
Coding Style: 20 marks
calculateNr program implementation: 15 marks.
7 Important Note
To be eligible for full marks, your programming assignment must run on the departmental com-
puting equipment. You may develop assignments on your home computer, but you must allow
for the amount of time it will take to get the final programs working on Computer Science’s
machines. All programming assignments must be submitted through OWL.
? The late penalty for programming assignment is [2.5i] (2.5 to the i-th power, rounded to the
nearest integer), where i > 0 is the number of days you are late. So if you hand in your
assignment 1 day late, you will be penalized 3%, a delay of 2 days will decrease your grade by
6%, 3 days is penalized 16% and 4 days takes 39% off your grade. You cannot be more than 4
days late.

联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!