Robots_txt: Test if a URL may be crawled looking at robots.txt

Last Updated: 2008-03-04 (6 years ago)
Ratings: 36%
Unique User Downloads: Total: 1,293; This week: 1
Download Rankings: All time: 2,770; This week: 1,398
Version: robots_txt 1.1
License: GNU General Public Lice...
PHP version: 5.0
Categories: PHP 5, Searching
Description

This class can be used to check whether a page may be crawled by looking at the robots.txt file of its site.

It takes the URL of a page and retrieves the robots.txt file of the same site.
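The lookup described above can be illustrated with a minimal sketch. It assumes that robots.txt always sits at the root of the page's scheme and host. The function name robotsTxtUrl is hypothetical and is not taken from the class. It avoids type declarations so that it also runs on PHP 5.

```php
<?php
// Hypothetical helper, not part of the robots_txt class:
// builds the robots.txt URL for the site that hosts $pageUrl.
function robotsTxtUrl($pageUrl)
{
    $parts = parse_url($pageUrl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null; // not an absolute URL, so no site can be derived
    }
    $url = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $url .= ':' . $parts['port']; // keep a non-default port
    }
    return $url . '/robots.txt';
}

// Prints http://www.example.com/robots.txt
echo robotsTxtUrl('http://www.example.com/docs/page.html?x=1'), "\n";
```

The path and query string of the page are discarded on purpose, because one robots.txt file covers every page of the same host.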

The class parses the robots.txt file and looks up the rules defined in it to see whether the site allows crawling the intended page.
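To make the parsing and rule lookup concrete, here is a rough sketch of one way it could work. It is not the class's own code: the function names parseRobotsTxt and isAllowed are hypothetical, user-agent names are compared by exact match, wildcards are ignored, and the "longest matching prefix wins" policy is an assumption that the class may not share.

```php
<?php
// Illustrative sketch only; names are hypothetical and wildcards
// (* and $ inside paths) are not handled.

// Parse robots.txt text into a map: lowercase user-agent => list of rules.
// Each rule is array(bool $isAllow, string $pathPrefix).
function parseRobotsTxt($text)
{
    $groups = array();
    $agents = array();   // user-agents of the group being read
    $inRules = false;    // true once the current group has rule lines
    foreach (preg_split('/\r\n|\r|\n/', $text) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // strip comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($field, $value) = explode(':', $line, 2);
        $field = strtolower(trim($field));
        $value = trim($value);
        if ($field === 'user-agent') {
            if ($inRules) {          // a User-agent after rules starts a new group
                $agents = array();
                $inRules = false;
            }
            $agents[] = strtolower($value);
        } elseif ($field === 'allow' || $field === 'disallow') {
            $inRules = true;
            foreach ($agents as $agent) {
                $groups[$agent][] = array($field === 'allow', $value);
            }
        }
    }
    return $groups;
}

// Decide whether $agent may fetch $path. The longest matching prefix wins,
// Allow wins a tie, and a path with no matching rule is allowed.
function isAllowed($groups, $agent, $path)
{
    $agent = strtolower($agent);
    if (isset($groups[$agent])) {
        $rules = $groups[$agent];
    } elseif (isset($groups['*'])) {
        $rules = $groups['*'];
    } else {
        return true;
    }
    $allowed = true;
    $bestLength = -1;
    foreach ($rules as $rule) {
        list($isAllow, $prefix) = $rule;
        if ($prefix === '' || strpos($path, $prefix) !== 0) {
            continue; // empty rule or prefix does not match
        }
        $length = strlen($prefix);
        if ($length > $bestLength || ($length === $bestLength && $isAllow)) {
            $bestLength = $length;
            $allowed = $isAllow;
        }
    }
    return $allowed;
}

$robots = "User-agent: *\nDisallow: /private/\nAllow: /private/public.html\n";
$groups = parseRobotsTxt($robots);
var_dump(isAllowed($groups, 'MyBot', '/index.html'));          // bool(true)
var_dump(isAllowed($groups, 'MyBot', '/private/data.html'));   // bool(false)
var_dump(isAllowed($groups, 'MyBot', '/private/public.html')); // bool(true)
```

Keeping parsing separate from matching means the file only has to be parsed once per site, after which any number of paths can be checked against the same rule map.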

The class also records the time at which each page is crawled, so that the next request to the same site can be checked against the crawl delay and request rate limits that the site specifies.
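The timing check can be pictured with a small sketch. The CrawlTimer class and its methods are hypothetical and do not reflect the package's real interface. The sketch assumes the delay is given in seconds and keeps timestamps in memory only, while a real crawler would likely persist them between runs. The current time is passed in explicitly so the behavior is easy to follow; in practice time() would supply it.

```php
<?php
// Hypothetical sketch: remembers the last fetch time per host and reports
// how long a crawler should still wait to honor a crawl delay.
class CrawlTimer
{
    private $lastFetch = array(); // host => Unix timestamp of the last fetch

    // Seconds still to wait before $host may be fetched again at time $now.
    public function secondsToWait($host, $delay, $now)
    {
        if (!isset($this->lastFetch[$host])) {
            return 0; // host not fetched yet, no need to wait
        }
        return max(0, $this->lastFetch[$host] + $delay - $now);
    }

    // Record that $host was fetched at time $now.
    public function recordFetch($host, $now)
    {
        $this->lastFetch[$host] = $now;
    }
}

$timer = new CrawlTimer();
$timer->recordFetch('www.example.com', 1000);
echo $timer->secondsToWait('www.example.com', 10, 1004), "\n";   // 6
echo $timer->secondsToWait('www.example.com', 10, 1020), "\n";   // 0
echo $timer->secondsToWait('other.example.org', 10, 1004), "\n"; // 0
```

A crawler would call secondsToWait() before each request, sleep for the returned number of seconds, and then call recordFetch() once the page has been retrieved.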

Innovation Award  
PHP Programming Innovation award nominee
January 2008
Number 8
robots.txt is a file that sites place in their domain's Web root to tell search engine crawlers, and Web robots in general, which pages should not be crawled.

This class can parse the robots.txt file of a domain to determine whether a given page may be crawled.

It is useful for implementing a friendly crawler that respects the wishes of site owners who do not want certain pages crawled by Web robot programs.

Manuel Lemos

Author
Name: Andy Pieters <contact>
Classes: 1 package
Country: United Kingdom
Age: 37
All time rank: 184579 in United Kingdom
Week rank: 1078 (up), 48 in United Kingdom (up)
Innovation award nominee: 1x

Files
File                  Role   Description
Robots.txt.class.php  Class  Core file
README.txt            Doc.   Usage Examples

User Ratings (all time)
Utility: 50%
Consistency: 62%
Documentation: 50%
Examples: -
Tests: -
Videos: -
Overall: 36%
Rank: 2275

User Comments (1)
Says not allowed also if it is: http://www.
3 years ago (Ivan Spadacenta)
Rating: 10%