Last modified on February 19, 2007, at 09:30 AM (see updates)
For bonus points, have the TextAnalyzer ADT calculate the Zipf distribution of the words. This is done through the following operations:
which returns the slope of the Zipf distribution of the word frequencies.
which returns the R² value of the Zipf distribution of the word frequencies. (Note: The R² value indicates how close the data points are to the trendline, overall. A value of 1 indicates that the data points coincide with the trendline, whereas a value of 0 indicates that the data points are scattered randomly.)
These methods should use ZipfStatistics.class. Download it and save it in the same directory as your source file. It contains the following methods:
double slope(double wordFrequencies)
which returns the slope of the Zipf distribution of the provided word frequencies.
double rSquared(double wordFrequencies)
which returns the R² of the Zipf distribution of the provided word frequencies.
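To get a feel for what these two numbers mean, here is a minimal sketch of one common way to compute them. Zipf's law says that a word's frequency is roughly inversely proportional to its rank, so a plot of log(frequency) against log(rank) should be close to a straight line with a slope near -1. The class ZipfSketch and its method fit are hypothetical names made up for this illustration. This is not the code inside ZipfStatistics.class, which may compute its values differently. The sketch also assumes the frequencies arrive as an array of positive counts.

```java
import java.util.Arrays;

// Hypothetical stand-in for illustration only; NOT the provided ZipfStatistics class.
class ZipfSketch {

    // Fits log(frequency) = a + b * log(rank) by least squares.
    // Returns a two-element array: {slope b, R squared}.
    // Assumes at least two frequencies, all positive, and not all equal
    // (R squared is undefined when every frequency is the same).
    static double[] fit(double[] wordFrequencies) {
        double[] sorted = wordFrequencies.clone();
        Arrays.sort(sorted); // ascending; read from the end so rank 1 is the most frequent word
        int n = sorted.length;

        double sumX = 0, sumY = 0, sumXX = 0, sumYY = 0, sumXY = 0;
        for (int rank = 1; rank <= n; rank++) {
            double x = Math.log(rank);
            double y = Math.log(sorted[n - rank]);
            sumX += x;
            sumY += y;
            sumXX += x * x;
            sumYY += y * y;
            sumXY += x * y;
        }

        // Centered sums of squares and cross products.
        double sxx = sumXX - sumX * sumX / n;
        double syy = sumYY - sumY * sumY / n;
        double sxy = sumXY - sumX * sumY / n;

        double slope = sxy / sxx;
        double rSquared = (sxy * sxy) / (sxx * syy);
        return new double[] { slope, rSquared };
    }

    public static void main(String[] args) {
        // An ideal Zipf sample: frequency proportional to 1 / rank.
        double[] ideal = { 120, 60, 40, 30 };
        double[] result = fit(ideal);
        System.out.println("slope = " + result[0] + ", R squared = " + result[1]);
    }
}
```

For the ideal sample in main, the slope should come out very close to -1 and R² very close to 1. Real books will deviate from both.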
Download a few books from Project Gutenberg and see if their word distributions follow Zipf's law.
As a control, try loading TextAnalyzer with random words. (Hint: Use Math.random() to generate such words.)
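The hint above can be followed in many ways. One possible sketch is shown here. The class RandomWords, its methods, and the choices of lowercase letters and a uniformly random word length are all assumptions made for this illustration, not requirements of the assignment.

```java
// Hypothetical helper for illustration; names and design choices are assumptions.
class RandomWords {

    // Returns a word of the given length made of uniformly random lowercase letters.
    static String randomWord(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            // Math.random() is in [0, 1), so the offset is an int in 0..25.
            sb.append((char) ('a' + (int) (Math.random() * 26)));
        }
        return sb.toString();
    }

    // Builds a space-separated text of `count` random words,
    // each between 1 and maxLength letters long.
    static String randomText(int count, int maxLength) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            int length = 1 + (int) (Math.random() * maxLength);
            sb.append(randomWord(length));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomText(20, 6));
    }
}
```

The generated text can then be passed to TextAnalyzer in whatever form your implementation accepts input. Short maximum lengths produce more repeated words, which gives the frequency counts something to measure.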
In the README.TXT file, include the names and URLs of the books you tried, their slopes and R² values, and the slope(s) and R² values of the random text(s). Also state your conclusion.
In the submitted .jar file, also include the system driver(s) you developed to explore this question.