PHP Classes

several issues

Subject: several issues
Summary: returned links are malformed
Messages: 1
Author: matt
Date: 2006-09-21 01:31:16
 

I'm trying to get the spiderClass class to work, but I'm having a few issues. When I spider a site, the returned values have the root slash removed:

site.domain.compage/page2/

Also, when I start with a URL that includes a directory, like:

site.domain.com/page1/

pages within that directory are returned without the original directory:

site.domain.com/page1/page.htm and site.domain.com/page1/page2.htm are returned as:

site.domain.com/page.htm
site.domain.com/page2.htm
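
To make the expected behaviour concrete, here is a minimal sketch of how a link could be resolved against the page it was found on. The function resolve_link() is a hypothetical helper written for illustration and is not part of spiderClass. It assumes PHP 7 or later and ignores edge cases such as "../" segments and query strings.

```php
<?php
// Hypothetical helper, NOT part of spiderClass: resolve a link found on a
// page against the URL of that page.
function resolve_link(string $base, string $link): string
{
    // Links that already carry a scheme are returned unchanged.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $link)) {
        return $link;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $path   = $parts['path'] ?? '/';
    $root   = $scheme . '://' . $host;

    // Root-relative link: keep the leading slash after the host.
    if ($link !== '' && $link[0] === '/') {
        return $root . $link;
    }

    // Relative link: keep the base directory (everything up to the last slash).
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $link;
}

echo resolve_link('http://site.domain.com/page1/', 'page.htm'), "\n";
echo resolve_link('http://site.domain.com/', '/page/page2/'), "\n";
```

With this approach the first call keeps the /page1/ directory and the second keeps the slash between the host and the path, which are the two things missing from the output above.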

My regular expression matches the site: "/http\:\/\/site\.domain\.com\/page\//"

Lastly, when I looked into the second problem, I found that only the relative link "page.htm" is being passed to the parse_url() function. Should the full URL be passed instead?
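
For comparison, this short snippet shows what PHP's built-in parse_url() returns for a bare relative link versus a full URL. It only demonstrates the function's standard behaviour and makes no claim about how spiderClass calls it internally. The full URL here assumes the http scheme from my regular expression.

```php
<?php
// A bare relative link has no scheme or host, so parse_url() reports
// only a "path" component.
$relative = parse_url('page.htm');
var_dump(isset($relative['host']));   // bool(false)
echo $relative['path'], "\n";         // page.htm

// A full URL yields the host and the complete path, directory included.
$absolute = parse_url('http://site.domain.com/page1/page.htm');
echo $absolute['host'], "\n";         // site.domain.com
echo $absolute['path'], "\n";         // /page1/page.htm
```

If only "page.htm" is passed in, the directory information is not there for parse_url() to recover, which would be consistent with /page1/ disappearing from the results.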

Thanks,