Last modified on February 19, 2007, at 09:30 AM

For bonus points, have ADT `TextAnalyzer` calculate the Zipf distribution of the words. This is done through the following operations:

`double getZipfSlope()`, which returns the slope of the Zipf distribution of the word frequencies.

`double getZipfRSquared()`, which returns the R² value of the Zipf distribution of the word frequencies. (Note: The R² value indicates how closely the data points follow the trendline overall. A value of 1 means the data points lie exactly on the trendline. A value of 0 means the trendline explains none of the variation in the data; it does no better than a horizontal line at the mean.)
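The assignment does not say exactly how the slope and R² are computed. A common convention is assumed here. Rank the n distinct words from most to least frequent, then fit a least-squares line to log frequency against log rank. In its classic form, Zipf's law predicts an exponent s near 1 and therefore a slope near -1:

```latex
% f_r = frequency of the word of rank r (r = 1 is the most frequent word)
f_r \approx C\, r^{-s}
\;\Longrightarrow\;
\log f_r \approx \log C - s \log r

% Least-squares line y = a + b x with x_r = \log r and y_r = \log f_r;
% the fitted slope b estimates -s (about -1 when s is near 1).

R^2 = 1 - \frac{\sum_{r=1}^{n} (y_r - \hat{y}_r)^2}{\sum_{r=1}^{n} (y_r - \bar{y})^2},
\qquad \hat{y}_r = a + b\,x_r,
\qquad \bar{y} = \frac{1}{n}\sum_{r=1}^{n} y_r
```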

These methods should use `ZipfStatistics.class`. Download it and save it in the same directory as your source file. It contains the following methods:

`double slope(double wordFrequencies[])`, which returns the slope of the Zipf distribution of the provided word frequencies.

`double rSquared(double wordFrequencies[])`, which returns the R² value of the Zipf distribution of the provided word frequencies.
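The assignment gives only the signatures of `slope` and `rSquared`. It does not say how `ZipfStatistics` computes them.

To make the idea concrete, or to sanity-check results, here is a minimal stand-in. `ZipfFitSketch` is a hypothetical class; it is not `ZipfStatistics`, and the real class may use a different convention. The sketch makes two assumptions:

* The array is sorted from most to least frequent word, so index `i` holds the frequency of rank `i + 1`.
* The values come from an ordinary least-squares line through (log rank, log frequency).

```java
/**
 * Hypothetical stand-in for ZipfStatistics, for illustration only; the real
 * class may compute its values differently. Assumes wordFrequencies is
 * sorted from most to least frequent (index i = rank i + 1) and fits an
 * ordinary least-squares line to (log rank, log frequency).
 */
final class ZipfFitSketch {

    private ZipfFitSketch() {}

    /** Slope of the fitted line; near -1 for data following classic Zipf's law. */
    static double slope(double[] wordFrequencies) {
        return fit(wordFrequencies)[0];
    }

    /** R^2 of the fitted line: 1 = perfect fit, 0 = no better than the mean. */
    static double rSquared(double[] wordFrequencies) {
        return fit(wordFrequencies)[1];
    }

    /** Returns {slope, rSquared}. */
    private static double[] fit(double[] freqs) {
        int n = freqs.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two frequencies");
        }
        double[] x = new double[n];
        double[] y = new double[n];
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            if (freqs[i] <= 0.0) {
                throw new IllegalArgumentException("frequencies must be positive");
            }
            x[i] = Math.log(i + 1);     // log of rank; ranks start at 1
            y[i] = Math.log(freqs[i]);  // log of frequency
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Centered sums of squares and cross-products.
        double sxx = 0.0, sxy = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxx += dx * dx;
            sxy += dx * dy;
            syy += dy * dy;
        }
        double slope = sxy / sxx;  // sxx > 0 because the ranks are distinct
        // For a least-squares line with intercept, 1 - SSres/SStot = sxy^2 / (sxx * syy).
        // Not meaningful when every frequency is equal (syy == 0).
        double rSquared = (sxy * sxy) / (sxx * syy);
        return new double[] {slope, rSquared};
    }
}
```

Frequencies exactly proportional to 1/rank, such as `{120, 60, 40, 30, 24}`, give a slope of -1 and an R² of 1, up to floating-point rounding. Compare the sketch's output with `ZipfStatistics` on the same array before drawing conclusions from either.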

Download a few books from Project Gutenberg and see if their word distributions follow Zipf's law.
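Before a book can be analyzed, its text has to become an array of word frequencies. The helper below is a hypothetical sketch; the class name `WordFrequencies` and its tokenizing rule are assumptions, not part of the assignment. It works in three steps:

1. It lower-cases the text and treats every run of non-letter characters as a separator.
2. It counts each word.
3. It returns the counts sorted from most to least frequent, so index 0 holds the rank-1 word.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper, not part of the assignment's required API.
 * Converts raw text into word frequencies sorted from most to least
 * frequent, so index 0 holds the count of the rank-1 word.
 */
final class WordFrequencies {

    private WordFrequencies() {}

    static double[] sortedFrequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Assumption: a "word" is a maximal run of letters, compared case-insensitively;
        // every run of non-letters (spaces, digits, punctuation) acts as a separator.
        for (String token : text.toLowerCase(Locale.ROOT).split("\\P{L}+")) {
            if (!token.isEmpty()) {          // split() can yield a leading empty string
                counts.merge(token, 1, Integer::sum);
            }
        }

        List<Integer> values = new ArrayList<>(counts.values());
        values.sort(Comparator.reverseOrder());   // highest count first

        double[] freqs = new double[values.size()];
        for (int i = 0; i < freqs.length; i++) {
            freqs[i] = values.get(i);
        }
        return freqs;
    }
}
```

To process a downloaded book, read the file first. For example, `WordFrequencies.sortedFrequencies(Files.readString(Path.of("book.txt")))` works on Java 11 or later; `book.txt` is a placeholder name.

Project Gutenberg files include a license header and footer, so you may want to strip them first to keep them from skewing the counts. This tokenizer also splits contractions such as "don't" into two words; adjust the pattern if that matters for your experiment.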

As a control, try loading `TextAnalyzer` with random words. (*Hint:* Use `Math.random()` to generate such words.)
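Here is one way to follow the hint, sketched under assumptions. The class name `RandomText`, the lowercase a–z alphabet, and the uniform word length between 1 and `maxLength` are arbitrary choices; the assignment prescribes none of them. `Math.random()` returns a `double` in [0.0, 1.0), so scaling it and truncating to `int` gives a uniform integer index.

```java
/**
 * Hypothetical control-text generator following the hint to use Math.random().
 * The lowercase a-z alphabet and the uniform word length (1..maxLength)
 * are arbitrary choices made for this sketch.
 */
final class RandomText {

    private RandomText() {}

    /** One random word of 1..maxLength letters, each letter chosen uniformly from a-z. */
    static String randomWord(int maxLength) {
        if (maxLength < 1) {
            throw new IllegalArgumentException("maxLength must be at least 1");
        }
        // Math.random() is in [0.0, 1.0), so the cast yields 0..maxLength-1.
        int length = 1 + (int) (Math.random() * maxLength);
        StringBuilder word = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            word.append((char) ('a' + (int) (Math.random() * 26)));
        }
        return word.toString();
    }

    /** wordCount random words separated by single spaces. */
    static String randomText(int wordCount, int maxLength) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < wordCount; i++) {
            if (i > 0) {
                text.append(' ');
            }
            text.append(randomWord(maxLength));
        }
        return text.toString();
    }
}
```

The resulting string can then be loaded into `TextAnalyzer` just like the text of a book. The alphabet and word-length settings shape the frequency distribution you get, so it is worth trying more than one setting as a control.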

In your `README.TXT` file, include the names and URLs of the books you tried, their slopes and R² values, and the slope(s) and R² values of the random text(s). Also state your conclusion.

In the submitted `.jar` file, also include the system driver(s) you developed to explore this question.

`ZipfStatistics.class` is made available under a Creative Commons License. It was developed by Chris Wagner, Charles McCormick, and Bill Manaris.

Source: http://www.cs.cofc.edu/~manaris/?n=Fall2005.CSCI221Homework4Bonus