Assigned Date: Tuesday, Mar. 17, 2009
Due Date: Friday, Mar. 27
Due Time: 11:55pm
Last modified on April 01, 2009, at 05:03 PM (see updates)
This assignment focuses on:
This is a pair-programming assignment (i.e., you may work with a partner). You may discuss the assignment only with your partner or the instructor.
Write an intelligent agent that searches a website (e.g., the NY Times, or the Washington Post, etc.) to harvest information of interest. The agent will be given an amount of time to search for webpages (target pages) within this website which contain specific keywords. When the time alloted is over (or when the complete website is searched, whichever comes first), the agent will output a histogram of words found on the target pages (if any).
The agent should prompt for the following (on separate lines):
Upon completion of the allotted time (or having exhaustively searched the website), the agent outputs:
The agent should not revisit a page (i.e., should skip already visited URLs). The agent should contain its search within the site (i.e., should skip external links). Keywords are conjunctive (never "OR", always "AND").
See sample Python code for processing webpages, MagnatuneDownloader.py.
For information on how to time your code, see function
time() in Python module time.
Experiment with at least three different uninformed search algorithms (e.g., depth-first, breadth-first, iterative deepening, etc.) to see which one performs better.
When constructing the histogram ignore common (stop) words.
All programs that you complete in your career as a student and as a professional developer should be fully documented. Obviously, you should comment any variable, obscure statement, block of code, method, and class you create. Your comments should express why something is being done, as opposed to how – the how is shown by the code. Also, include opening comments as specified in previous assignment.
Your grade will be based on how well you followed the above instructions, and the depth/quality of your work.