Re: Adding files from external site - suggestions?

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Sun Mar 07 2004 - 06:14:57 GMT
On Sat, Mar 06, 2004 at 10:20:08AM -0800, Rob de Santos AFANA wrote:

> At the site we sell videos/DVDs thru an affiliation with another web
> site.  Clicking on links from our site takes you to their site, e.g. the
> link might be:
> 
> http://www.othersite.com/cgi-bin/at.pl?a=123456 
> 
> which allows the customer to order and for us to get credit for the
> sale.  However, if you are at our site and search for a DVD of a game
> between Club A and Club B you won't find anything since the details are
> external to our site.  Ideally, I would like to spider the relevant
> portions of the other site

spider.pl just fetches web pages, indexes the content, and extracts the
links into a queue of other URLs to index.  Extracted links pointing to
other sites are simply ignored unless those sites are set up as
"same_hosts", though that setting is meant more for mapping www.foo.com
and foo.com to the same host name.

If what you want is to insert the content of another page into the
page being indexed, then I'd probably use filter_content to scan for
links to the other site, fetch that page or pages, extract the content,
and add it to the current page being indexed.

The extracted links are not available to the filter, so you would have
to extract them yourself.
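
As a rough illustration of that approach, here is a minimal sketch of
the logic in Python.  It is not spider.pl code (spider.pl and its
filter_content callback are Perl); it only shows the three steps
described above: pull the links out of the page yourself, keep the ones
pointing at the other site, then fetch each one and append its text to
the page before it is indexed.  The names affiliate_links, page_text,
merge_affiliate_content and the fetch parameter are all made up for
this example, and www.example.com stands in for your own site.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


class TextCollector(HTMLParser):
    """Collect text content, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def affiliate_links(html, base_url, affiliate_host):
    """Return unique absolute links in html that point at affiliate_host."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)
        if urlparse(absolute).hostname == affiliate_host and absolute not in links:
            links.append(absolute)
    return links


def page_text(html):
    """Reduce an HTML page to its text, joined with single spaces."""
    parser = TextCollector()
    parser.feed(html)
    return " ".join(parser.chunks)


def merge_affiliate_content(html, base_url, affiliate_host, fetch):
    """Append the text of each linked affiliate page to html.

    fetch is a hypothetical caller-supplied function: URL in, HTML out.
    """
    texts = [page_text(fetch(url))
             for url in affiliate_links(html, base_url, affiliate_host)]
    texts = [text for text in texts if text]
    if not texts:
        return html
    # Simplification: the extra text is appended to the end of the document.
    return html + "\n<div>" + escape(" ".join(texts)) + "</div>\n"
```

For a real run, fetch could wrap urllib.request.urlopen and decode the
response; passing it in as a parameter keeps the sketch testable
without network access.  In spider.pl itself the same steps would live
inside the filter_content callback.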


-- 
Bill Moseley
moseley@hank.org
Received on Sat Mar 6 22:15:02 2004