CSCI 220
HOMEWORK ASSIGNMENT #3
Assigned Date: Tuesday, February 24, 2004
Due Date: Monday, March 1, 2004
Due Time: Noon

 

Files to be submitted:  TextStatistics.java, TextStatistics.txt

Skills Developed: Selection and Iteration structures.

Documentation and submission:  See instructions in the first homework assignment.

Background:

The availability of computers with text manipulation capabilities has resulted in some rather interesting approaches to analyzing the writings of great authors. Much attention has been focused on whether William Shakespeare ever lived. Some scholars believe there is substantial evidence indicating that Christopher Marlowe actually penned the masterpieces attributed to Shakespeare. Researchers have used computers to find similarities in the writings of these two authors, as well as other authors.

Assignment: 

Your assignment is to write a program that reads several lines of text and prints a table indicating the number of occurrences (histogram) of each letter of the alphabet in the text. For example, the phrase:

      "To be, or not to be: that is the question:"

contains one a, two b's, no c's, ..., seven t's (counting the capital T in "To"), etc. Uppercase and lowercase forms of a letter are counted together.

In addition to the above, your program must output the relative frequency of each letter. The relative frequency of a letter is calculated by dividing the number of occurrences of this letter in the text by the total number of letters in the text.
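As an illustration of the calculation only, the sample phrase above has 30 letters, 7 of which are t's, so the relative frequency of t is 7/30, or roughly 0.233. The sketch below shows one way to compute this in Java; the class and method names are hypothetical, not part of the required solution. Note the cast to double: dividing one int by another would truncate the result to 0.

```java
public class RelativeFrequencySketch {
    // Hypothetical helper: the fraction of all letters that are one given letter.
    static double relativeFrequency(int letterCount, int totalLetters) {
        if (totalLetters == 0) {
            return 0.0; // no letters were read, so avoid dividing by zero
        }
        // Cast before dividing; int / int would discard the fractional part.
        return (double) letterCount / totalLetters;
    }

    public static void main(String[] args) {
        // Counts taken from the sample phrase: 7 t's out of 30 letters.
        System.out.println("t: " + relativeFrequency(7, 30));
    }
}
```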

Create a text file called TextStatistics.txt. In one paragraph, discuss your conclusions regarding your program's potential in authorship attribution: is it possible to tell apart works written by two different authors based only on your program's output? In a second paragraph, discuss how the program could be improved.

Notes:

  1. Since Chapman's StdIn class cannot be used easily to read a single character at a time, we will use the standard Java input stream System.in and the read() method.  We will discuss the use of this stream and method in class.
  2. You may not use arrays (a Chapter 5 topic) in this assignment.
  3. To have your program accept input from a file, rather than default to the keyboard, you can use redirection. For example:
     
       java TextStatistics < someFile.txt
     
    will read from a file in the current directory called someFile.txt rather than waiting for the user to type input at the keyboard.
  4. Test your program with different inputs to ensure that it works properly.
  5. Once you are satisfied that your program works, run your program against the files
    · Shakespeare-King-Lear.txt
    · Shakespeare-Macbeth.txt
    · Shakespeare-Othello.txt
    · Shakespeare-Romeo-and-Juliet.txt
     
    Then run your program against the files
    · Mark-Twain-A-Tramp-Abroad.txt
    · Mark-Twain-The-Tragedy-of-Puddnhead-Wilson.txt
    · Mark-Twain-Tom-Sawyer-Abroad.txt
    · Mark-Twain-Tom-Sawyer-Detective.txt
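
As a rough sketch of the technique in Note 1, the class below reads input one character at a time with read() and counts a single letter without using arrays. The class and method names are hypothetical, and the code assumes plain ASCII text, since read() returns one byte at a time. It is a starting point for experimenting, not a complete solution.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadSketch {
    // Hypothetical helper: counts occurrences of one lowercase letter,
    // treating uppercase and lowercase input characters as the same letter.
    static int countLetter(InputStream in, char target) {
        int count = 0;
        try {
            int next; // read() returns an int so that -1 can signal end of input
            while ((next = in.read()) != -1) {
                char c = Character.toLowerCase((char) next);
                if (c == target) {
                    count++;
                }
            }
        } catch (IOException e) {
            // Rethrow as an unchecked exception to keep the sketch short.
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Reads from the keyboard, or from a file when input is redirected.
        System.out.println("t: " + countLetter(System.in, 't'));
    }
}
```

With redirection as described in Note 3, the sketch could be run as java ReadSketch < someFile.txt.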

 

 

Credits

Adapted from Deitel and Deitel (1994), “C – How to Program”, 2nd ed., p. 359.