Bill Moseley wrote:
> spider.pl just fetches web pages, indexes the content and extracts
> the links into a queue of other URLs to index. Extracted links
> to other sites are just ignored, unless they are set up as "same_hosts"
> -- although that's more for mapping www.foo.com and foo.com to the
> host name.
OK, understood. Is there any reason why I couldn't map
www.othersite.com/video/ to my host, particularly if I set up redirection
in .htaccess on my site so that www.afana.com/video/ sends users to the
other site's pages?
> If what you want to do is insert the content of another page
> into the page being indexed, then I'd probably use
> filter_content to scan for the links to the other site, fetch
> that page or pages, extract the content, and add it to
> the current page being indexed.
No, not really what I had in mind, though it *might* work. I'm waiting
to hear from the other site's web guru to see how his pages are
structured. If they are "dynamic", i.e. regenerated when needed, that
might complicate this.
> The extracted links are not available to the filter so you
> would have to extract them yourself.
Shouldn't be that hard, if needed. Redirection seems simpler though.
I'm satisfied if I can simply include the appropriate subset of pages
from the other site in my index at this stage.
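
For what it's worth, extracting the links by hand could look roughly like
the sketch below. It is written in Python rather than spider.pl's Perl,
purely as an illustration: LinkCollector, links_under and the sample HTML
are made-up names, and nothing here is part of spider.pl or its
filter_content callback. It only shows the idea of pulling out the links
that point at the other site's /video/ pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag seen (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_under(html, page_url, host, path_prefix):
    """Return absolute links in html that point at host under path_prefix."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    wanted = []
    for href in parser.hrefs:
        # Resolve relative links against the URL of the page being scanned.
        absolute = urljoin(page_url, href)
        parts = urlparse(absolute)
        if parts.hostname == host and parts.path.startswith(path_prefix):
            if absolute not in wanted:  # skip duplicates, keep first-seen order
                wanted.append(absolute)
    return wanted


# Made-up sample page, standing in for a page on my own site.
sample = """
<a href="http://www.othersite.com/video/clip1.html">Clip 1</a>
<a href="/about.html">About</a>
<a href="http://www.othersite.com/news/">News</a>
<a href="http://www.othersite.com/video/clip1.html">Clip 1 again</a>
"""
print(links_under(sample, "http://www.afana.com/index.html",
                  "www.othersite.com", "/video/"))
```

The resulting list would then be the set of pages to fetch from the other
site; how they get fed into the index is a separate question.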
Received on Sun Mar 7 05:48:17 2004