This class can be used to generate sitemaps and notify search engines of updates.
It can build a sitemap file from a list of URLs. Each URL may carry a last modification date, a change frequency and a priority. The sitemap file may be saved in compressed form.
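As a rough illustration of the sitemap-building step, here is a minimal Python sketch. It is not the class's own code or API: the names build_sitemap and save_sitemap and the dictionary keys are assumptions made for this example, and gzip is assumed as the compressed format. The element names (urlset, url, loc, lastmod, changefreq, priority) and the namespace come from the sitemaps.org protocol.

```python
import gzip
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML as bytes from a list of URL entry dicts.

    Each entry needs a "loc" key; "lastmod", "changefreq" and
    "priority" are optional (hypothetical key names for this sketch).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        # Optional fields are written only when present.
        for field in ("lastmod", "changefreq", "priority"):
            if field in entry:
                ET.SubElement(url, field).text = str(entry[field])
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


def save_sitemap(xml_bytes, path, compress=False):
    """Write the sitemap to path, gzip-compressed if requested."""
    opener = gzip.open if compress else open
    with opener(path, "wb") as fh:
        fh.write(xml_bytes)
```

A call such as save_sitemap(build_sitemap(entries), "sitemap.xml.gz", compress=True) would then write the compressed variant.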
The class may also update the site's robots.txt file with the sitemap address.
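The robots.txt update can be pictured as appending a Sitemap directive if it is not already there. The following Python sketch works under that assumption; add_sitemap_line is a hypothetical helper name, not a method of the class, and it operates on the file's text rather than on the file itself.

```python
def add_sitemap_line(robots_text, sitemap_url):
    """Return robots.txt content that declares sitemap_url.

    If a matching "Sitemap:" line already exists, the text is
    returned unchanged, so repeated calls do not add duplicates.
    """
    lines = robots_text.splitlines()
    for line in lines:
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip() == sitemap_url:
            return robots_text  # already declared, nothing to do
    lines.append(f"Sitemap: {sitemap_url}")
    return "\n".join(lines) + "\n"
```

Keeping the function pure (text in, text out) leaves reading and writing the actual file to the caller.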
When the sitemap is updated, the class can also notify search engines like Google, Bing, Yahoo and Ask.
A simple yet fully extensible MySQL database management class. Can be easily used to extend other classes requiring database access. Example usage is included. See the file readme_class.DB.txt for information on:
Test whether a URL may be crawled by looking at robots.txt
This class can be used to check whether a page may be crawled by looking at the robots.txt file of its site.
It takes the URL of a page and retrieves the robots.txt file of the same site.
The class parses the robots.txt file and looks up the rules defined in it to see whether the site allows crawling the intended page.
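The lookup described above can be sketched in Python with the standard library's urllib.robotparser module. This is an illustration of the technique, not the class's own implementation; robots_url_for and may_crawl are hypothetical names, and the robots.txt content is passed in as text so the example does not depend on network access.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url):
    """Return the robots.txt address of the site that hosts page_url."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def may_crawl(page_url, robots_text, user_agent="*"):
    """Check page_url against the rules in the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, page_url)
```

In a real crawler the text would first be fetched from the address returned by robots_url_for, for example with RobotFileParser.set_url() followed by read().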
The class also stores the time at which each page is crawled. The next time a page of the same site is about to be crawled, it can check whether the request honors the site's intended crawl delay and request rate limits.
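One way to picture this bookkeeping is a small per-host record of the last crawl time. The Python sketch below assumes a single crawl delay in seconds per site; CrawlPacer and its method names are invented for this example and do not reflect how the class stores its data. Request rate limits are left out to keep the sketch short.

```python
import time
from urllib.parse import urlsplit


class CrawlPacer:
    """Remember when each host was last crawled (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the timing can be tested
        self._last_crawl = {}    # host -> time of the last recorded crawl

    def seconds_to_wait(self, page_url, crawl_delay):
        """Return how long to wait before crawling page_url."""
        host = urlsplit(page_url).netloc
        last = self._last_crawl.get(host)
        if last is None:
            return 0.0  # first request to this host
        return max(0.0, crawl_delay - (self._clock() - last))

    def record(self, page_url):
        """Store the current time as the last crawl of the page's host."""
        self._last_crawl[urlsplit(page_url).netloc] = self._clock()
```

The delay itself could be taken from the site's robots.txt, for instance through RobotFileParser.crawl_delay() in Python's standard library, which returns None when the file sets no delay for the given user agent.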