PHP Classes

Class: Robots_txt


Name: Robots_txt
Base name: robots_txt
Description: Test whether a URL may be crawled according to the site's robots.txt
Version: 1.1
Required PHP version: 5.0
License: GNU General Public License (GPL)
All time users: 867 users
All time rank: 2720
Week users: 6 users
Week rank: 1317
 

Author

Name: Andy Pieters
Published packages: 1
Country: United Kingdom
Home page: ???
Age: 32
All time rank: 1623
Week rank: 955

Innovation Award

PHP Programming Innovation award nominee
January 2008
Number 8
robots.txt is a file that sites place in the Web root of their domain to tell search engine crawlers and Web robots in general which pages should not be crawled.

This class can parse a robots.txt file of a domain to determine whether a given page should be crawled or not.

It is useful for implementing a well-behaved crawler that respects the wishes of site owners who do not want certain pages crawled by Web robot programs.

Manuel Lemos

Groups

PHP 5: Classes using PHP 5 specific features
Searching: Search engines, crawling and indexing

Detailed description

This class can be used to check whether a page may be crawled by looking at the robots.txt file of its site.

It takes the URL of a page and retrieves the robots.txt file of the same site.
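
As an illustration of this first step, here is a minimal sketch of deriving the robots.txt location from a page URL. The function name `robotsTxtUrl` is hypothetical and is not taken from the Robots_txt class. The sketch also assumes PHP 7.1 or later for the type declarations, whereas the class itself targets PHP 5.0.

```php
<?php
// Hypothetical helper for illustration; not part of the Robots_txt class.

/**
 * Return the robots.txt URL for the site that serves $pageUrl,
 * or null when $pageUrl is not an absolute URL.
 */
function robotsTxtUrl(string $pageUrl): ?string
{
    $parts = parse_url($pageUrl);
    if ($parts === false || empty($parts['scheme']) || empty($parts['host'])) {
        return null;
    }

    // robots.txt always lives at the root of the scheme://host[:port] origin.
    $url = strtolower($parts['scheme']) . '://' . strtolower($parts['host']);
    if (isset($parts['port'])) {
        $url .= ':' . $parts['port'];
    }

    return $url . '/robots.txt';
}
```

The path and query string of the page are discarded on purpose, since only the scheme, host and port decide which robots.txt file applies.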

The class parses the robots.txt file and looks up the rules defined in it to see whether the site allows crawling the intended page.
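
The parsing and lookup described above might be sketched roughly as follows. The function names `parseRobotsTxt` and `mayCrawl` are hypothetical, and several simplifications are assumptions of this sketch rather than documented behavior of the class: user agents are matched by exact name with a fallback to `*`, paths are matched as plain prefixes without wildcards, and the longest matching rule wins.

```php
<?php
// Hypothetical functions for illustration; not the Robots_txt class API.

/**
 * Parse robots.txt text into a map of
 * lowercased user-agent => list of [isAllow, pathPrefix] rules.
 */
function parseRobotsTxt(string $text): array
{
    $rules = [];
    $agents = [];      // user-agents of the group currently being read
    $inRules = false;  // true once the current group has seen a rule line

    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($field, $value) = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            if ($inRules) {        // a User-agent line after rules starts a new group
                $agents = [];
                $inRules = false;
            }
            $agent = strtolower($value);
            $agents[] = $agent;
            if (!isset($rules[$agent])) {
                $rules[$agent] = [];
            }
        } elseif ($field === 'allow' || $field === 'disallow') {
            $inRules = true;
            if ($value === '') {
                continue;          // an empty value adds no restriction
            }
            foreach ($agents as $agent) {
                $rules[$agent][] = [$field === 'allow', $value];
            }
        }
    }

    return $rules;
}

/** Decide whether $agent may fetch $path under the parsed $rules. */
function mayCrawl(array $rules, string $agent, string $path): bool
{
    $group = $rules[strtolower($agent)] ?? $rules['*'] ?? [];
    $allowed = true;   // no matching rule means the path may be crawled
    $bestLength = -1;

    foreach ($group as $rule) {
        list($isAllow, $prefix) = $rule;
        $length = strlen($prefix);
        if (strncmp($path, $prefix, $length) !== 0) {
            continue;  // rule does not apply to this path
        }
        // Longest prefix wins; on a tie, Allow is preferred.
        if ($length > $bestLength || ($length === $bestLength && $isAllow)) {
            $bestLength = $length;
            $allowed = $isAllow;
        }
    }

    return $allowed;
}
```

A crawler would call `parseRobotsTxt()` once per site and then `mayCrawl()` for each candidate path, passing its own user-agent name.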

The class also stores the time at which each page is crawled, so that the next time a page of the same site is requested it can check whether the crawler is honoring the intended crawl delay and request rate limits.
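
One way such bookkeeping could work is sketched below. The class `CrawlDelayTracker` and its methods are hypothetical and do not reflect how Robots_txt stores its timestamps. The sketch assumes the delay in seconds has already been read from the site's robots.txt (for example from a `Crawl-delay` line, a non-standard extension), and it takes the current time as a parameter so the logic is easy to test.

```php
<?php
// Hypothetical class for illustration; not the Robots_txt class API.
class CrawlDelayTracker
{
    /** @var array host => Unix time of the last request to that host */
    private $lastCrawl = [];

    /** Record that a page on $host was fetched at time $now. */
    public function recordCrawl(string $host, float $now): void
    {
        $this->lastCrawl[strtolower($host)] = $now;
    }

    /**
     * Seconds the crawler should still wait before requesting another
     * page from $host, given the site's delay; 0.0 if it may proceed.
     */
    public function secondsToWait(string $host, float $delay, float $now): float
    {
        $host = strtolower($host);
        if (!isset($this->lastCrawl[$host])) {
            return 0.0; // this host has not been crawled yet
        }
        $elapsed = $now - $this->lastCrawl[$host];

        return max(0.0, $delay - $elapsed);
    }
}
```

In a real crawler the `$now` argument would typically be `microtime(true)`, and the crawler would sleep for the returned number of seconds before fetching the next page from the same host.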

User ratings

All time:
Utility: Sufficient (66.7%)
Consistency: Sufficient (66.7%)
Documentation: Not sure (50.0%)
Examples: -
Tests: -
Videos: -
Overall: Not sure (45.0%)
Rank: 1555

Month: Not yet rated by the users

Applications that use this class

No application links were specified for this class.

Files

File                  Role   Description
Robots.txt.class.php  Class  Core file
README.txt            Doc.   Usage Examples (accessible without login)

Download all files: robots_txt.tar.gz, robots_txt.zip

For more information send a message to :
info at phpclasses dot org.
Copyright (c) Icontem 1999-2009 PHP Classes - PHP Class Scripts