Re: Adding files from external site - suggestions?

From: Rob de Santos AFANA <rdesantos(at)not-real.afana.com>
Date: Sun Mar 07 2004 - 13:48:07 GMT
Bill Moseley wrote: 
> spider.pl just fetches web pages, indexes the content and extracts out
> the links into a queue of other URLs to index.  Extracted links pointing
> to other sites are just ignored, unless they are set up as "same_hosts"
> -- although that's more for mapping www.foo.com and foo.com to the same
> host name.
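
The fetch/extract/queue loop described above can be sketched roughly as follows. This is a Python illustration of the idea, not spider.pl's actual code. `filter_links`, `crawl_order`, and the `link_map` dict (which stands in for fetching real pages) are hypothetical names. In this sketch, aliasing works on whole host names only, not on paths.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def filter_links(page_url, hrefs, base_host, same_hosts):
    """Resolve hrefs against page_url and keep only on-site http(s) links.

    Hosts listed in same_hosts are treated as aliases of base_host and are
    rewritten to it, so http://foo.com/x and http://www.foo.com/x dedupe.
    """
    kept = []
    for href in hrefs:
        parsed = urlparse(urljoin(page_url, href))
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        host = (parsed.hostname or "").lower()
        if host in same_hosts:
            host = base_host  # alias -> canonical host name
        if host != base_host:
            continue  # link to another site: ignored
        # Rebuild the URL on the canonical host (drops any port or fragment)
        kept.append(parsed._replace(netloc=base_host, fragment="").geturl())
    return kept


def crawl_order(start_url, link_map, base_host, same_hosts):
    """Breadth-first crawl; link_map (hypothetical) stands in for fetching."""
    queue = deque([start_url])
    seen = {start_url}
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)  # a real spider would fetch and index the page here
        for link in filter_links(url, link_map.get(url, []), base_host, same_hosts):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order
```

Under these assumptions, a link to the alias foo.com is folded into www.foo.com and queued. A link to other.com never enters the queue.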

OK, understood.  Any reason why I couldn't map www.othersite.com/video/
to my host?  Particularly if I set up redirection in .htaccess on my
site so that www.afana.com/video/ sent users to the other site's pages?

> If what you want to do is insert the content of another page 
> into the page being indexed then I'd probably use 
> filter_content to scan for the links to the other site, fetch 
> that page or pages and extract the content and add it into 
> the current page being indexed.

No, not really what I had in mind, though it *might* work.  I'm waiting
to hear from the other site's web guru to see how his pages are
structured.  If they are "dynamic", e.g. regenerated when needed, that
might complicate this.
 
> The extracted links are not available to the filter so you 
> would have to extract them yourself.
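
The filter_content approach described above can be sketched roughly as follows. spider.pl's real filter_content is a Perl callback, so this Python sketch does not show its interface. As noted, the extracted links are not passed to the filter, so the sketch parses the page itself to find links to the other host. It then fetches each linked page through a caller-supplied `fetch` function and appends the linked pages' visible text to the current page. `merge_external_content`, `fetch`, `LinkTextParser`, and the appended `<div>` wrapper are all hypothetical.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkTextParser(HTMLParser):
    """Collect <a href> values and visible text (script/style skipped)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.text.append(data.strip())


def parse(content):
    parser = LinkTextParser()
    parser.feed(content)
    parser.close()
    return parser


def merge_external_content(page_url, content, external_host, fetch):
    """Append the text of linked pages on external_host to content.

    fetch(url) -> str is a hypothetical callable supplied by the caller.
    """
    extra = []
    seen = set()
    for href in parse(content).links:
        url = urljoin(page_url, href)
        if urlparse(url).hostname != external_host or url in seen:
            continue  # only follow links to the other site, once each
        seen.add(url)
        extra.append(" ".join(parse(fetch(url)).text))
    if not extra:
        return content
    # Escape the extracted text so it cannot inject markup into the page
    return content + "\n<div>" + escape(" ".join(extra)) + "</div>"
```

If the other site's pages are generated on demand, this sketch still works as long as `fetch` returns the rendered HTML. It does fetch every matching link each time a page is indexed.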

Shouldn't be that hard, if needed.  Redirection seems simpler, though.
I'm satisfied if I can simply include the appropriate subset of pages
from the other site in my index at this stage.

Regards, 

-Rob
http://www.afana.com
Received on Sun Mar 7 05:48:17 2004