Bill Manaris : Fall 2005 / CSCI 221 Homework 4 Bonus

Last modified on February 19, 2007, at 09:30 AM


For bonus points, have ADT TextAnalyzer calculate the Zipf distribution of the words. This is done through the following operations:

These methods should use ZipfStatistics.class. Download it and save it in the same directory as your source file. It contains the following methods:


Download a few books from Project Gutenberg and see if their word distributions follow Zipf's law.
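To make "follows Zipf's law" concrete: rank the distinct words from most to least frequent, plot log(count) against log(rank), and fit a straight line. A slope near -1 with an R² value near 1 is the usual sign of a Zipfian distribution. The sketch below shows one way that calculation could be done with an ordinary least-squares fit. It is only an illustration. The class name ZipfSketch and the method slopeAndR2 are made up for this example and are not part of ZipfStatistics.class, whose actual methods may compute these values differently. Your submission should still use ZipfStatistics.class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; NOT the ZipfStatistics API.
class ZipfSketch {

    // Returns {slope, rSquared} of the least-squares line through the points
    // (log10(rank), log10(count)), where rank 1 is the most frequent word.
    // Needs at least two distinct words with differing counts; otherwise the
    // divisions below produce NaN.
    static double[] slopeAndR2(List<String> words) {
        // Count how often each distinct word occurs.
        Map<String, Integer> counts = new HashMap<String, Integer>();
        for (String w : words) {
            Integer c = counts.get(w);
            counts.put(w, c == null ? 1 : c + 1);
        }

        // Sort the counts from largest to smallest; position + 1 is the rank.
        List<Integer> freqs = new ArrayList<Integer>(counts.values());
        Collections.sort(freqs, Collections.reverseOrder());

        int n = freqs.size();
        double sumX = 0, sumY = 0, sumXY = 0, sumXX = 0, sumYY = 0;
        for (int i = 0; i < n; i++) {
            double x = Math.log10(i + 1);          // log of rank
            double y = Math.log10(freqs.get(i));   // log of count
            sumX += x;
            sumY += y;
            sumXY += x * y;
            sumXX += x * x;
            sumYY += y * y;
        }

        // Standard least-squares formulas for slope and squared correlation.
        double sxy = n * sumXY - sumX * sumY;
        double sxx = n * sumXX - sumX * sumX;
        double syy = n * sumYY - sumY * sumY;
        double slope = sxy / sxx;
        double rSquared = (sxy * sxy) / (sxx * syy);
        return new double[] { slope, rSquared };
    }
}
```

For example, a text in which four words occur 12, 6, 4, and 3 times has counts exactly proportional to 1/rank, so the fitted slope is -1 and R² is 1 (up to floating-point rounding).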

As a control, try loading TextAnalyzer with random words. (Hint: Use Math.random() to generate such words.)
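One possible way to follow the hint is sketched below. The class name RandomWords, the method names, and the choice of lowercase letters with a uniformly chosen word length are all assumptions made for this example. The assignment does not prescribe how the random words should be built, and how you feed them into TextAnalyzer depends on your own implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class RandomWords {

    // Builds one word of random lowercase letters.
    // The length is chosen uniformly from minLen..maxLen (inclusive).
    static String randomWord(int minLen, int maxLen) {
        // Math.random() returns a double in [0.0, 1.0), so the cast yields
        // an integer offset in 0..(maxLen - minLen).
        int length = minLen + (int) (Math.random() * (maxLen - minLen + 1));
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            // Offset in 0..25 added to 'a' gives a letter in 'a'..'z'.
            sb.append((char) ('a' + (int) (Math.random() * 26)));
        }
        return sb.toString();
    }

    // Builds a list of count random words.
    static List<String> randomWords(int count, int minLen, int maxLen) {
        List<String> words = new ArrayList<String>();
        for (int i = 0; i < count; i++) {
            words.add(randomWord(minLen, maxLen));
        }
        return words;
    }

    public static void main(String[] args) {
        // Print a few sample words; output differs on every run.
        for (String w : randomWords(10, 1, 8)) {
            System.out.println(w);
        }
    }
}
```

Note that the way random words are generated (alphabet size, length distribution) can itself affect the resulting slope and R² value, so it is worth describing your generator in the README alongside the results.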


In your README.TXT file, include the names and URLs of the books you tried, their slopes and R² values, and the slope(s) and R² value(s) of the random text(s). Also state your conclusion.

In the submitted .jar file, also include the system driver(s) you developed to explore this question.


ZipfStatistics.class is made available under a Creative Commons License. It was developed by Chris Wagner, Charles McCormick, and Bill Manaris.