HOMEWORK ASSIGNMENT #4
Assigned Date: Monday, April 11, 2005 (sec 2 +1 day)
Due Date: Monday, April 25, 2005 (sec 2 +1 day)
This assignment focuses on linked lists.
Implement the TextAnalyzer ADT in Java. This ADT encapsulates a list of items stored in a sorted doubly-linked list with dummy head and tail nodes. The list contains unique words, sorted lexicographically (case-insensitive), and their corresponding counts.
Adding a duplicate word simply increments the corresponding count. Deleting a word decrements the corresponding count. Deleting the last instance of a word completely removes that word (and its count) from the list.
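One way to pin down these add/remove semantics before writing any linked-list code is a small reference model to test your list against later. The sketch below is hypothetical and deliberately does NOT use a linked list (the assignment requires one); it uses java.util.TreeMap ordered by String.CASE_INSENSITIVE_ORDER. It also assumes two things the assignment does not state explicitly: words that differ only in case count as the same word, and removing a word that is not in the list does nothing. The class name WordCountModel is made up for this example.

```java
import java.util.Set;
import java.util.TreeMap;

/**
 * Hypothetical reference model for testing, NOT the required linked-list
 * implementation: a TreeMap ordered by String.CASE_INSENSITIVE_ORDER.
 */
public class WordCountModel {
    // Keys compare case-insensitively, so "The" and "the" share one entry
    // (assumption: case-insensitivity also applies to uniqueness).
    private final TreeMap<String, Integer> counts =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    /** Adds word; a duplicate only increments its count. */
    public void add(String word) {
        counts.merge(word, 1, Integer::sum);
    }

    /** Decrements word's count, dropping the word when its last instance goes. */
    public void remove(String word) {
        Integer c = counts.get(word);
        if (c == null) {
            return; // assumed: removing an absent word does nothing
        }
        if (c == 1) {
            counts.remove(word);
        } else {
            counts.put(word, c - 1);
        }
    }

    /** Returns the count of word, or 0 if it is absent. */
    public int getFrequency(String word) {
        return counts.getOrDefault(word, 0);
    }

    /** Returns the unique words in case-insensitive lexicographic order. */
    public Set<String> words() {
        return counts.keySet();
    }

    public static void main(String[] args) {
        WordCountModel m = new WordCountModel();
        m.add("the");
        m.add("Cat");
        m.add("THE");
        m.remove("cat");
        System.out.println(m.getFrequency("the")); // 2
        System.out.println(m.getFrequency("cat")); // 0
        System.out.println(m.words());             // [the]
    }
}
```

Feeding the same sequence of add/remove calls to this model and to your TextAnalyzer, then comparing getFrequency results and word order, is a cheap way to catch linking bugs.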
In terms of implementation, TextAnalyzer objects should have three Node references: head, tail, and currentPosition. There should NOT be a numItems (or similar) variable; i.e., the length of the list is calculated dynamically whenever it is needed.
Node should be implemented as an internal class. It should encapsulate a word and a count, as well as prev and next links.
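A minimal sketch of the structure described above, assuming a non-static inner Node class whose fields the outer class accesses directly. The constructor links the two dummy nodes to each other, and both counts are computed by walking the list from head.next to tail instead of being stored in a field. The Node constructor and the comments are illustrative choices, not requirements from the assignment.

```java
/**
 * Hypothetical skeleton of TextAnalyzer's structure: an inner Node class,
 * dummy head/tail nodes, and counts computed by traversal (no numItems field).
 */
public class TextAnalyzer {

    /** One list node: a word, its count, and links to its neighbours. */
    private class Node {
        String word;
        int count;
        Node prev;
        Node next;

        Node(String word, int count) {
            this.word = word;
            this.count = count;
        }
    }

    private Node head;            // dummy node before the first word
    private Node tail;            // dummy node after the last word
    private Node currentPosition; // cursor for reset()/getNextWord()

    /** Creates an empty list: the two dummy nodes linked to each other. */
    public TextAnalyzer() {
        head = new Node(null, 0);
        tail = new Node(null, 0);
        head.next = tail;
        tail.prev = head;
        currentPosition = head.next;
    }

    /** Counts the real (non-dummy) nodes by walking from head to tail. */
    public int getUniqueWordCount() {
        int n = 0;
        for (Node p = head.next; p != tail; p = p.next) {
            n++;
        }
        return n;
    }

    /** Sums the counts of all real nodes, i.e., every word added. */
    public int getAllWordCount() {
        int total = 0;
        for (Node p = head.next; p != tail; p = p.next) {
            total += p.count;
        }
        return total;
    }
}
```

Because the dummy nodes never hold words, every traversal can start at head.next and stop at tail without any null checks.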
ADT TextAnalyzer has the following API:
- public TextAnalyzer(), which creates an empty list, i.e., a list with two dummy head and tail Node objects linked to each other.
- public void add(String word), which adds word to the list, as described above.
- public void remove(String word), which removes word from the list, as described above.
- public int getFrequency(String word), which returns the count of word, or zero if word is not in the list. (Note: This should not be confused with the meaning of frequency in physics.)
- public double getProbability(String word), which returns the probability of word appearing in the text, or zero if word is not in the list. The probability of a word is defined as its count divided by the total number of words, that is, the total number of words added to the list, NOT the number of unique words. (Another term for this is relative frequency.)
- public void reset(), which resets the current position to the beginning of the list.
- public String getNextWord(), which returns the current word and advances the current position. It returns null if the current position is past the end of the list.
- public int getUniqueWordCount(), which returns the number of unique words in the list.
- public int getAllWordCount(), which returns the number of items added to the list (including duplicates).
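The following hypothetical sketch shows one way the list operations above could work on the sorted, dummy-bounded list. The helper findFirstNotBefore is an illustrative name, not part of the required API. Words are compared with String.compareToIgnoreCase; the sketch assumes that words differing only in case are the same word, that removing an absent word is a no-op, and that callers never pass null. getUniqueWordCount(), getAllWordCount(), getZipfSlope(), and getZipfRSquared() are omitted to keep it short.

```java
/**
 * Hypothetical sketch of TextAnalyzer's list operations on a sorted
 * doubly-linked list with dummy head and tail nodes. Assumptions: words
 * differing only in case are the same word, removing an absent word does
 * nothing, and word arguments are non-null.
 */
public class TextAnalyzer {

    /** Inner node class: a word, its count, and prev/next links. */
    private class Node {
        String word;
        int count;
        Node prev, next;
        Node(String w, int c) { word = w; count = c; }
    }

    private final Node head = new Node(null, 0); // dummy, never holds a word
    private final Node tail = new Node(null, 0); // dummy, never holds a word
    private Node currentPosition;

    /** Creates an empty list: the two dummies linked to each other. */
    public TextAnalyzer() {
        head.next = tail;
        tail.prev = head;
        currentPosition = head.next; // this is tail while the list is empty
    }

    /** Returns the first node whose word is not before word (ignoring case), or tail. */
    private Node findFirstNotBefore(String word) {
        Node p = head.next;
        while (p != tail && p.word.compareToIgnoreCase(word) < 0) {
            p = p.next;
        }
        return p;
    }

    /** Increments word's count, or splices in a new node so the list stays sorted. */
    public void add(String word) {
        Node p = findFirstNotBefore(word);
        if (p != tail && p.word.compareToIgnoreCase(word) == 0) {
            p.count++;
            return;
        }
        Node n = new Node(word, 1); // new word goes just before p
        n.prev = p.prev;
        n.next = p;
        p.prev.next = n;
        p.prev = n;
    }

    /** Decrements word's count; unlinks its node when the count reaches zero. */
    public void remove(String word) {
        Node p = findFirstNotBefore(word);
        if (p == tail || p.word.compareToIgnoreCase(word) != 0 || --p.count > 0) {
            return; // absent word (no-op), or other instances remain
        }
        if (currentPosition == p) {
            currentPosition = p.next; // keep the cursor on a node still in the list
        }
        p.prev.next = p.next;
        p.next.prev = p.prev;
    }

    /** Returns word's count, or 0 if it is not in the list. */
    public int getFrequency(String word) {
        Node p = findFirstNotBefore(word);
        return (p != tail && p.word.compareToIgnoreCase(word) == 0) ? p.count : 0;
    }

    /** Returns word's count divided by the total number of words added. */
    public double getProbability(String word) {
        int total = 0;
        for (Node p = head.next; p != tail; p = p.next) {
            total += p.count;
        }
        return total == 0 ? 0.0 : (double) getFrequency(word) / total;
    }

    /** Moves the cursor back to the first word. */
    public void reset() {
        currentPosition = head.next;
    }

    /** Returns the current word and advances, or null once past the last word. */
    public String getNextWord() {
        if (currentPosition == tail) {
            return null;
        }
        String w = currentPosition.word;
        currentPosition = currentPosition.next;
        return w;
    }
}
```

For example, after adding "the", "cat", "saw", "The", and "dog" and then removing "dog", four words remain (cat, saw, and two instances of the), so getProbability("the") is 2 / 4 = 0.5, and reset() followed by repeated getNextWord() calls yields cat, saw, the, then null. In this sketch the cursor starts on the tail dummy, so getNextWord() returns null until reset() is called.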
In addition to the above, ADT TextAnalyzer provides the following operations:
- double getZipfSlope(), which returns the slope of the Zipf distribution of the word frequencies.
- double getZipfRSquared(), which returns the R² value of the Zipf distribution of the word frequencies. (Note: The R² value indicates how close the data points are, overall, to the trendline. A value of 1 indicates that the data points coincide with the trendline, whereas a value of 0 indicates that the trendline explains none of their variation, e.g., the data points are scattered randomly.)
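ZipfStatistics.class computes these two values for you, and the exact method it uses is not documented here. Purely to illustrate what the slope and R² measure, here is a hypothetical sketch of one common approach: sort the counts from largest to smallest, give the most frequent word rank 1, and fit a least-squares line to the points (log rank, log frequency). For text that follows Zipf's law, frequency is roughly proportional to 1/rank, so the slope comes out near -1. For a simple linear fit, R² equals the square of the correlation coefficient, and neither value depends on the logarithm base, since changing the base rescales both axes by the same factor. The class ZipfSketch and its fit method are made up for this example.

```java
import java.util.Arrays;

/**
 * Hypothetical illustration only (NOT the ZipfStatistics implementation):
 * rank the frequencies in decreasing order and fit a least-squares line
 * to the points (log rank, log frequency).
 */
public class ZipfSketch {

    /**
     * Returns {slope, rSquared} of the log-log fit.
     * Precondition: at least two frequencies, all greater than zero.
     * rSquared is NaN if all frequencies are equal (no variation to explain).
     */
    public static double[] fit(double[] frequencies) {
        double[] f = frequencies.clone();
        Arrays.sort(f); // ascending, so the most frequent value is last
        int n = f.length;
        double[] x = new double[n];
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            x[i] = Math.log(i + 1);        // rank 1 = most frequent word
            y[i] = Math.log(f[n - 1 - i]); // frequency of that rank
        }
        double mx = mean(x);
        double my = mean(y);
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
            syy += (y[i] - my) * (y[i] - my);
        }
        double slope = sxy / sxx;                    // least-squares slope
        double rSquared = (sxy * sxy) / (sxx * syy); // square of Pearson's r
        return new double[] {slope, rSquared};
    }

    private static double mean(double[] a) {
        double sum = 0;
        for (double v : a) {
            sum += v;
        }
        return sum / a.length;
    }

    public static void main(String[] args) {
        // Counts proportional to 1/rank, i.e., ideal Zipf data.
        double[] r = fit(new double[] {60, 30, 20, 15, 12});
        System.out.println("slope = " + r[0] + ", R^2 = " + r[1]);
        // prints values equal to -1 and 1 up to floating-point rounding
    }
}
```

Real text will not give exactly -1 and 1; the point of the sketch is only to show which quantities the two getters report.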
To calculate the Zipf distribution, use ZipfStatistics.class. Download it and save it in the same directory as your source file. It contains the following methods:
- double slope(double[] wordFrequencies), which returns the slope of the Zipf distribution of the provided word frequencies.
- double rSquared(double[] wordFrequencies), which returns the R² of the Zipf distribution of the provided word frequencies.
ZipfStatistics.class is made available under a Creative Commons License. It was developed by Chris Wagner, Charles McCormick, and
See first assignment.
You should also submit Javadoc API documentation, in HTML, for your ADT. Your code should be fully documented.
Two options (use either one; the effect is the same):
- Option 1: Open your BlueJ project. Under the Project menu, click Create Jar File…; in the dialog box that opens, select Include Source and press Continue. Email the generated .jar file to email@example.com by the due date.
- Option 2 (available in BlueJ 2.0 and above): Save submission.defs into your BlueJ project directory. Open your BlueJ project. Under the Tools menu, click Submit…; in the dialog box that opens, select scheme CSCI 221/hmwk4 and press Submit. (You may have to specify your email
You should modularize and document your code thoroughly. Your methods should be fully documented, i.e., their purpose and pre/postconditions. Each Java file should have a certificate of authenticity, as per the first homework.
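As a hypothetical illustration of the requested documentation style (purpose plus pre/postconditions), a Javadoc comment might look like the one below. The exact layout of the Precondition and Postcondition lines is an assumption, so follow the first assignment's format if it specifies one; the class DocExample and its method are made up for this example. Running the JDK's javadoc tool, e.g. javadoc -d doc TextAnalyzer.java, writes the HTML API pages into a directory named doc.

```java
/**
 * Hypothetical example of the documentation style asked for above:
 * a Javadoc comment stating purpose, preconditions, and postconditions.
 */
public class DocExample {

    /**
     * Computes the probability of a word, given its count and the total
     * number of words added.
     * <p>
     * Precondition: {@code count >= 0} and {@code total >= count}.<br>
     * Postcondition: returns {@code count / total}, or 0.0 when
     * {@code total} is 0; no state is modified.
     *
     * @param count the number of times the word was added
     * @param total the total number of words added, duplicates included
     * @return the word's probability, in the range [0, 1]
     */
    public static double probability(int count, int total) {
        return total == 0 ? 0.0 : (double) count / total;
    }
}
```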