PHP Classes

File: docs.html

Classes of Radu Topala   Spider Engine   docs.html
Role: Documentation
Content type: text/plain
Description: documentation
Class: Spider Engine
Retrieve and process remote HTML pages
Author: Radu Topala
Last change: -
Date: 17 years ago
Size: 1,417 bytes
 

Contents

/**
 * About author:
 * Radu T.
 * email: eagle[not]bv[not]ro[[not][isat][not]]yahoo[[not][isdot][not]]com
 *
 * About class:
 * SpiderEngine v.1.1.0 class for spidering any HTML page
 * -fetchData() - for reading the content of an HTML page
 * -processData() - for doing whatever you want with the results
 *
 * -url - URL to read from, e.g. http://www.home.com/page_no_<range[0]>.html
 * -range - array of ranges to apply to the URL, e.g. array(0=>array("start"=>1,"end"=>10,"step"=>1)), which means: for(i=1;i<=10;i+=1)
 * -pattern - the HTML text containing the pattern_definition and text
 * -pattern_definition - array of definition names, e.g. array("dummy","cat","subcat")
 * -start - where the spider starts reading the content of the page
 * -end - array of "to_process" and "not_to_process" content: if a text from the "to_process" array is found in the page content, the data is spidered and processData() is called; if a text from the "not_to_process" array is found in the page content, only a message is shown
 *
 * -pattern definition examples: {p[abc]}, {p[1]}, {p[#]}, {p[no.1]}, etc.
 * -a pattern can be found multiple times in the same page
 */

Because I saw MineTheWeb (which appears to be the most used commercial spider) and didn't like it at all, I started working on a general spider for HTML that uses the logic of MineTheWeb, but with improvements. And it is FREE to use!
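
The url and range options described in the class comment can be illustrated with a small standalone sketch. It is not taken from the SpiderEngine source: the function name expandUrlRange() is hypothetical, and it assumes that each <range[N]> placeholder in the URL is replaced by the counter of range entry N, looping exactly as the comment states (for(i=start;i<=end;i+=step)). Handling of more than one range entry (every combination is generated) is also an assumption.

```php
<?php
// Hypothetical helper, not part of SpiderEngine: expands the <range[N]>
// placeholders of a URL template into the list of concrete URLs.
function expandUrlRange(string $url, array $range): array
{
    $urls = array($url);
    foreach ($range as $index => $r) {
        if ($r["step"] < 1) {
            // Guard against a loop that would never terminate.
            throw new InvalidArgumentException("step must be at least 1");
        }
        $placeholder = "<range[" . $index . "]>";
        $expanded = array();
        foreach ($urls as $partial) {
            // Same loop as in the class comment: for(i=start;i<=end;i+=step)
            for ($i = $r["start"]; $i <= $r["end"]; $i += $r["step"]) {
                $expanded[] = str_replace($placeholder, (string) $i, $partial);
            }
        }
        $urls = $expanded;
    }
    return $urls;
}

// The example values from the class comment.
$urls = expandUrlRange(
    "http://www.home.com/page_no_<range[0]>.html",
    array(0 => array("start" => 1, "end" => 10, "step" => 1))
);
```

With the values from the comment, $urls holds ten entries, from http://www.home.com/page_no_1.html to http://www.home.com/page_no_10.html. Each of them is one page the spider would read.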