File: documentation.txt

Role: Documentation
Content type: text/plain
Class: Web scraper (extract information from Web site pages)
Author: Jacek Lukasiewicz
Date: 3 years ago
Size: 1,178 bytes

This class allows you to get data from any site.
The data are taken from defined locations in the DOM structure.
Data points are defined using phpQuery notation, which is similar to the selectors used in the jQuery library.
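
To illustrate what "a defined location in the DOM structure" means, here is a minimal sketch that uses only PHP's built-in DOMDocument and DOMXPath classes. It is not how this class works internally (the class uses phpQuery selectors); the sample markup, the $targets array, and the XPath translations of the selectors are assumptions made for illustration.

```php
<?php
// Hypothetical sample markup, not taken from any real site.
$html = '<html><body>'
      . '<div class="headline"><h1>Sample item</h1></div>'
      . '<span id="buyerpricegross">19.99</span>'
      . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hand-written XPath equivalents of the CSS-style selectors
// '.headline h1' and '#buyerpricegross' (assumed for this sketch).
$targets = array(
    'name'  => '//*[contains(concat(" ", normalize-space(@class), " "), " headline ")]//h1',
    'price' => '//*[@id="buyerpricegross"]',
);

// Read the text of the first node matched by each query.
$row = array();
foreach ($targets as $key => $query) {
    $nodes = $xpath->query($query);
    $row[$key] = ($nodes !== false && $nodes->length > 0)
        ? trim($nodes->item(0)->textContent)
        : null;
}

print_r($row);
```

Each named target maps to one location in the document, which is the same idea as pairing a name with a selector in this class.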

This class can fetch data in three different modes:
* scanning a single page
* scanning a "from->to" range of pages matching a defined URL scheme
* scanning a list of URLs retrieved from a PHP array
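
The "from->to" range mode can be pictured as expanding one URL template into a list of URLs. The sketch below is a plain-PHP illustration under stated assumptions: expandRange() is a hypothetical helper, not a method of this class, and it assumes that both ends of the range are included, which the documentation does not state.

```php
<?php
// Hypothetical helper, not part of the Scraper class.
// Replaces $token in $baseUrl with every id from $from to $to.
// Assumption: the range is inclusive at both ends.
function expandRange($baseUrl, $from, $to, $token)
{
    $urls = array();
    for ($id = $from; $id <= $to; $id++) {
        $urls[] = str_replace($token, (string) $id, $baseUrl);
    }
    return $urls;
}

$urls = expandRange(
    'http://your.site.ccm/path/to/details.html?id=##TOKEN##',
    151598039,
    151598042,
    '##TOKEN##'
);

print_r($urls);
```

Under the inclusive assumption this yields four URLs, one per id. The third mode (a list of URLs from a PHP array) would skip the expansion step and use such a list directly.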
  
EXAMPLE  

	$scrap = new Scraper();
	
	//Set the base URL, which contains a token named ##TOKEN##
	$scrap->setBaseUrl('http://your.site.ccm/path/to/details.html?id=##TOKEN##');
	
	//Set the scan range for the token:
	//##TOKEN## will be replaced by each id from the given range
	$scrap->addRangeScanRule(151598039, 151598042, '##TOKEN##');
	
	//Define the points (selectors) where the data are located
	$scrap->addDataTarget('name', '.headline .margin h1');
	$scrap->addDataTarget('price', '#buyerpricegross');
	$scrap->addDataTarget('image', '#imageWrapper #thumbnailoverlay a');
	
	$data = $scrap->process();
	//$data has the following array structure (one inner array per scanned page):
	//	array(
	//		array(
	//			'name'  => ....,
	//			'price' => ....,
	//			'image' => ....,
	//		),
	//		....
	//	)