On Sat, Mar 06, 2004 at 10:20:08AM -0800, Rob de Santos AFANA wrote:
> At the site we sell videos/DVDs thru an affiliation with another web
> site. Clicking on links from our site takes you to their site, e.g. the
> link might be:
> which allows the customer to order and for us to get credit for the
> sale. However, if you are at our site and search for a DVD of a game
> between Club A and Club B you won't find anything since the details are
> external to our site. Ideally, I would like to spider the relevant
> portions of the other site
spider.pl just fetches web pages, indexes the content, and extracts the
links into a queue of other URLs to index. Extracted links pointing
to other sites are simply ignored unless they are set up as "same_hosts"
-- although that's more for mapping www.foo.com and foo.com to the same
host.
If what you want is to insert the content of another page into the
page being indexed, then I'd probably use filter_content to scan for
links to the other site, fetch that page (or pages), extract the
content, and add it to the current page being indexed.
The extracted links are not available to the filter, so you would have to
extract them yourself.
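
As a rough illustration of that idea (scan the page for links to the
other site, fetch them, pull out the text, and append it to the page
being indexed), here is a minimal sketch. It is written in Python only
to show the logic; it is not spider.pl code, and a real version would
have to be ported into a Perl filter_content callback. Every name in it
(merge_partner_content, partner_host, fetch, and so on) is hypothetical
and made up for the example.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkAndTextParser(HTMLParser):
    """Collects href values from <a> tags and the visible text of a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


def parse(html):
    """Return (list of hrefs, visible text) for an HTML string."""
    parser = LinkAndTextParser()
    parser.feed(html)
    parser.close()
    return parser.links, " ".join(parser.text)


def fetch(url):
    """Default fetcher (an assumption): download and decode leniently."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def merge_partner_content(page_url, html, partner_host, fetch=fetch):
    """Append the text of pages linked on partner_host to html."""
    links, _ = parse(html)  # the filter has to extract the links itself
    extra, seen = [], set()
    for href in links:
        target = urljoin(page_url, href)  # resolve relative links
        if urlparse(target).hostname != partner_host or target in seen:
            continue
        seen.add(target)
        try:
            _, text = parse(fetch(target))
        except OSError:
            continue  # skip partner pages that cannot be fetched
        if text:
            extra.append(text)
    if not extra:
        return html
    return html + "\n<div>" + escape(" ".join(extra)) + "</div>\n"
```

The fetcher is passed in as a parameter so the merging step can be tried
without touching the network. Appending the text in a trailing <div> is
just one simple choice; the point is only that the indexer then sees the
partner site's words as part of your own page.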
Received on Sat Mar 6 22:15:02 2004