matt - 2006-09-21 01:31:16
I'm trying to get the spiderClass class to work, but I'm having a few issues. When I spider a site, some of the returned values have the root slash (the one right after the domain) removed:
site.domain.compage/page2/
Also, when I start from a URL that already includes a directory, like:
site.domain.com/page1/
pages within that directory are returned without the original directory. For example, site.domain.com/page1/page.htm and site.domain.com/page1/page2.htm are returned as:
site.domain.com/page.htm
site.domain.com/page2.htm
My regular expression matches the site: "/http\:\/\/site\.domain\.com\/page\//"
Lastly, when I looked into the second problem, I found that only the term "page.htm" is being passed to the parse_url() function. Should it be receiving the full page URL instead?
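To illustrate the second problem, here is a minimal sketch of what I suspect is happening. It is not the actual spiderClass code; the variable names and the way the URL is rebuilt are my own assumptions, used only to show how parse_url() behaves when it is given a bare relative link:

```php
<?php
// Hypothetical values for illustration, not taken from spiderClass itself.
$base = 'http://site.domain.com/page1/';
$link = 'page.htm';

// parse_url() on a bare relative link yields only a 'path' component,
// so there is no scheme or host to build on.
$parts = parse_url($link);
$hasHost = isset($parts['host']); // false

// Rebuilding from the base host alone drops the /page1/ directory.
$baseParts = parse_url($base);
$wrong = $baseParts['scheme'] . '://' . $baseParts['host'] . '/' . $link;

// Keeping the base path up to its last slash preserves the directory.
$dir = substr($baseParts['path'], 0, strrpos($baseParts['path'], '/') + 1);
$right = $baseParts['scheme'] . '://' . $baseParts['host'] . $dir . $link;

echo $wrong, "\n", $right, "\n";
```

If the class does something like the first rebuild, that would explain why the directory goes missing. The second rebuild only handles simple relative links; it does not cover links that start with "/" or contain "../".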
thanks,