PHP Classes

File: readme.txt

Role: Documentation
Content type: text/plain
Description: help file
Class: Link Searcher
Crawl Web pages to search for given text
Author: Nadir Latif
Last change: Initial commit
Date: 1 year ago
Size: 2,348 bytes


Made by: Nadir Latif

Dependencies: None.

This script is a web crawler that lets users search for text inside web pages using regular expressions. The crawler starts from a given page and performs a breadth-first search of the links it finds on that page. The user specifies the depth to which the crawler runs and the regular expression to search for. Optionally, the user can also specify the text of the links that should be followed, e.g. links called "Next". For example, a search performed on a web site can return many pages of results. A user who wants to know which of those pages contain a certain text can use this application instead of manually clicking the "Next" link on each page. The script can easily be extended to process pages in other ways.

1) Usage:

-Copy the files to a directory served by a web server and run index.php in a browser. Enter the following:

   - In the Enter URL field, enter the URL of the page from which the search should begin.
   - In the Enter Regex field, enter the regular expression that should match. The regular expression can simply be a text string that should occur in a page (e.g. Iran).
   - In the Enter Link to Search field, enter the text of the hyperlink that should be followed to reach the next page. This is the text between the <a> and </a> tags (e.g. Next &gt;).
   - In the Enter Search Depth field, enter the depth (number of link levels) to which the search should be carried out.
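
The roles of the Regex and Link to Search fields can be illustrated with a short sketch. It is written in Python using only the standard library and is not taken from link_searcher.php. The names LinkCollector, extract_links and page_matches are hypothetical, and the exact comparison of link text is an assumption about how the filter might work. Note that Python's HTML parser decodes entities, so this sketch compares against the decoded link text.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, link text) pairs from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (href, text) pairs
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, link_text=""):
    """Return hrefs whose link text equals link_text (all hrefs if it is empty)."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href, text in collector.links
            if not link_text or text == link_text]


def page_matches(html, pattern):
    """Return True if the regular expression occurs anywhere in the page."""
    return re.search(pattern, html) is not None


sample = '<p>News about Iran</p><a href="/page2">Next</a> <a href="/about">About</a>'
print(extract_links(sample, "Next"))   # only the "Next" link
print(extract_links(sample))           # no link text given: every link
print(page_matches(sample, "Iran"))
```

Leaving the link text empty returns every link, which mirrors the behaviour described for an empty Enter Link to Search field.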

2) What does this script do?

The script first retrieves the specified page and parses out all hyperlinks on it. Links that have the text specified by the user (in the "Enter Link to Search" text box) are placed in a FIFO queue. If no link text is specified, all links are placed in the queue. The first link in the queue is then retrieved: the page is downloaded and its content is matched against the regular expression entered by the user. If there is a match, a link to that page is displayed in the browser. All links in the downloaded page (or only those that match the link text) are then placed in the queue. This process is repeated until the specified depth is reached.
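
The process described above can be sketched as a breadth-first search over a FIFO queue. The following is a minimal Python illustration, not the code in link_searcher.php or queue.php. The function name crawl and the fetch and get_links callables are hypothetical, and skipping URLs that were already queued is an added assumption that the description does not mention.

```python
import re
from collections import deque


def crawl(start_url, pattern, max_depth, fetch, get_links):
    """Breadth-first search from start_url; returns URLs of matching pages.

    fetch(url) returns the page text and get_links(page) returns the URLs
    to follow; both are hypothetical callables supplied by the caller.
    """
    regex = re.compile(pattern)
    matches = []
    seen = {start_url}                # assumption: never queue a URL twice
    queue = deque([(start_url, 0)])   # FIFO queue of (url, depth) pairs

    while queue:
        url, depth = queue.popleft()  # oldest entry first gives breadth-first order
        page = fetch(url)
        # The start page (depth 0) is only used as a source of links.
        if depth > 0 and regex.search(page):
            matches.append(url)
        # Links are queued only while the depth limit has not been reached.
        if depth < max_depth:
            for link in get_links(page):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return matches


# Tiny in-memory "web" so the sketch runs without network access.
pages = {
    "a": "start page ->b ->c",
    "b": "nothing relevant ->d",
    "c": "news about Iran ->a",
    "d": "Iran again",
}
found = crawl("a", "Iran", 2, pages.__getitem__,
              lambda page: re.findall(r"->(\w+)", page))
print(found)
```

Passing fetch and get_links as parameters keeps the queue logic separate from downloading and parsing, so either part can be replaced when the script is extended.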

3) List of files:

a) index.php (initial file)
b) link_searcher.php (main program file)
c) queue.php (used to store the links in a page)
d) readme.txt (help file)

-Feel free to contact me for any assistance regarding this script.