Assigned Date: Thursday, Mar. 4, 2010
Due Date: Friday, Mar. 26
Due Time: 11:55pm
Last modified on March 25, 2010, at 01:47 PM (see updates)
This assignment focuses on:
This is a pair-programming assignment (i.e., you may work with a partner). You may discuss the assignment only with your partner or the instructor.
The agent will be given an amount of time to search this website for webpages (target pages) containing specific keywords. When the time allotted is over (or when the complete website is searched, whichever comes first), the agent will make available a histogram of words found on the target pages (if any).
The attached code, agentsModel.py, is adapted from code provided with Russell and Norvig, "Artificial Intelligence: A Modern Approach". It implements Agents and Environments (Chapter 2). Read it and understand how it models various AI agents and environments. (This code uses utils.py. Save them both in the same directory.)
Adapt the provided code to the given problem. Remove ALL unnecessary code.
An Environment subclass, called WebEnvironment, suited to the problem.
Agent subclasses (webcrawlers) suited to the problem, each with a different search strategy. Each agent should remember:
The environment, when instantiated, should prompt the user for the following (on separate lines):
Then, the environment will instantiate three agents (see above) and have them perform the assigned task, each in their own way. Each agent is allotted the complete available time. Each agent keeps track of the time it has used. When the time elapses, the agent returns False via the function is_alive() (when asked).
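The time-budget idea above can be sketched as follows. This is a minimal illustration, not part of the provided agentsModel.py; the class and attribute names (TimedAgent, time_allotted, start_time) are hypothetical.

```python
import time

class TimedAgent:
    """Sketch of an agent that tracks its own elapsed time.

    All names here are illustrative, not taken from agentsModel.py.
    """
    def __init__(self, time_allotted):
        self.time_allotted = time_allotted   # seconds available for the search
        self.start_time = time.time()        # the clock starts at creation

    def is_alive(self):
        # The agent reports itself dead once its time budget is spent.
        return (time.time() - self.start_time) < self.time_allotted

agent = TimedAgent(time_allotted=60.0)
print(agent.is_alive())  # True immediately after creation
```

The environment can then simply poll is_alive() before handing each agent its next percept.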
An agent's percept is constructed by the environment. It contains the HTML contents of a page (initially, the target website's first page).
An agent's action is a URL (meaning, "this is the webpage I want to visit next"). The environment opens this URL and constructs the agent's next percept.
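The percept/action cycle described above can be sketched like this. All names are illustrative (fetch_html is a stand-in for the environment's real page-opening code, which would use urllib); only the percept-is-HTML, action-is-URL convention comes from the assignment.

```python
# Minimal sketch of the percept/action cycle: the environment opens the
# agent's chosen URL and hands back the page's HTML as the next percept.
def fetch_html(url):
    # Stand-in for real fetching code (e.g., urllib); a toy one-page site.
    pages = {"http://example.edu/": "<html><a href='/a.html'>A</a></html>"}
    return pages.get(url, "")

class WebEnvironment:
    def __init__(self, start_url):
        self.next_url = start_url   # initially, the site's first page

    def percept(self, agent):
        # The percept is the HTML contents of the page the agent chose.
        return fetch_html(self.next_url)

    def execute_action(self, agent, action):
        # The action is a URL: "this is the webpage I want to visit next."
        self.next_url = action

env = WebEnvironment("http://example.edu/")
html = env.percept(None)
print("a.html" in html)  # True
```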
Each agent should update its performance measure based on how many target pages it has discovered (+1 point per page).
Upon completion of the task, the environment should output for each agent:
Agents should not revisit a page (i.e., they should skip already visited URLs). Agents should confine their search to the site (i.e., they should skip external links).
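Both filters can be implemented with a visited set and a hostname check. A minimal sketch, assuming a hypothetical target site (the site name, function, and variable names are illustrative):

```python
from urllib.parse import urlparse

# Sketch of the two filters: skip already visited URLs, skip external links.
# In a real agent, site_netloc and visited would be instance variables.
site_netloc = "www.example.edu"   # hypothetical target site
visited = set()

def should_visit(url):
    """Return True only for unvisited URLs on the target site."""
    if urlparse(url).netloc != site_netloc:
        return False          # external link: outside the site
    if url in visited:
        return False          # already crawled this page
    visited.add(url)
    return True

print(should_visit("http://www.example.edu/page1.html"))  # True
print(should_visit("http://www.example.edu/page1.html"))  # False (revisit)
print(should_visit("http://www.other.org/index.html"))    # False (external)
```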
Keywords are conjunctive (never "OR", always "AND"). They should be given to an agent when it is instantiated; that is, an agent is "hardwired" to look only for those keywords.
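The conjunctive test reduces to requiring every keyword to appear on the page. A small sketch (the function name is illustrative; case-insensitive matching is a design choice, not stated in the assignment):

```python
# Conjunctive matching: a page counts as a target page only if it
# contains EVERY keyword (AND, never OR).
def is_target_page(text, keywords):
    text = text.lower()
    return all(kw.lower() in text for kw in keywords)

print(is_target_page("Search agents crawl the web.", ["agents", "web"]))   # True
print(is_target_page("Search agents crawl the web.", ["agents", "robot"])) # False
```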
See sample Python code for processing webpages, MagnatuneDownloader.py.
Another possibility is BeautifulSoup - a simple, yet powerful Python HTML/XML parser designed for quick turnaround projects.
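If BeautifulSoup is not available, the standard library can also extract the links an agent needs. A minimal sketch using Python 3's html.parser module (the class name LinkExtractor is illustrative):

```python
from html.parser import HTMLParser

# Minimal link extraction with the standard library: collect the href
# attribute of every anchor tag encountered while parsing.
class LinkExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

parser = LinkExtractor()
parser.feed('<html><a href="/page1.html">One</a><a href="/page2.html">Two</a></html>')
print(parser.links)  # ['/page1.html', '/page2.html']
```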
For information on how to time your code, see the function time() in the Python module time.
Experiment with at least three different uninformed search algorithms (e.g., depth-first, breadth-first, iterative deepening, etc.) to see which one performs better.
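Depth-first and breadth-first crawling differ only in how the frontier is consumed: pop from the same end it was filled (a stack) for depth-first, from the opposite end (a queue) for breadth-first. A sketch on a toy link graph (the graph and names are illustrative):

```python
from collections import deque

# One frontier structure, two uninformed searches: pop from the right
# for depth-first, pop from the left for breadth-first.
links = {"/": ["/a", "/b"], "/a": ["/c"], "/b": [], "/c": []}

def crawl_order(start, depth_first):
    frontier, visited, order = deque([start]), set(), []
    while frontier:
        url = frontier.pop() if depth_first else frontier.popleft()
        if url in visited:
            continue
        visited.add(url)
        order.append(url)
        frontier.extend(links.get(url, []))
    return order

print(crawl_order("/", depth_first=True))   # ['/', '/b', '/a', '/c']
print(crawl_order("/", depth_first=False))  # ['/', '/a', '/b', '/c']
```

Iterative deepening can be built on the depth-first version by adding a depth cutoff and restarting with larger cutoffs.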
When updating the histogram, ignore common (stop) words.
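The histogram update can be sketched with collections.Counter. The stop-word list below is a tiny illustrative sample, not a standard list, and the tokenizing regex is a simplification:

```python
import re
from collections import Counter

# Sketch of updating a word histogram while ignoring common (stop) words.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is"}

def update_histogram(histogram, text):
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in STOP_WORDS:
            histogram[word] += 1
    return histogram

hist = update_histogram(Counter(), "The agent searches the web and the web answers.")
print(hist.most_common(2))  # [('web', 2), ('agent', 1)]
```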
Avoid global variables. Avoid side-effects. (Of course, this excludes access to an object's own instance and class variables.)
See previous assignment.
Your grade will be based on the elegance of your design, the quality of your documentation, and how well you followed the above instructions. In particular, there are many ways to achieve the above functionality. Aim for simplicity and elegance. Design, design, design... before you implement.