SUMMARY - external links

From: Mark Greenaway <mark(at)not-real.cres20.anu.edu.au>
Date: Mon Apr 12 2004 - 22:45:46 GMT
To get swish to index external links, modify spider.pl (v 2.4.2) as follows:

   881      # Here we make sure we are looking at a link pointing to the correct (or equivalent) host
   882
   883  #    unless ( $server->{scheme} eq $u->scheme && $server->{same_host_lookup}{$u->canonical->authority||''} ) {
   884  #
   885  #        print STDERR qq[ ?? <$tag $attribute="$u"> skipped because different host\n] if $server->{debug} & DEBUG_LINKS;
   886  #        $server->{counts}{'Off-site links'}++;
   887  #        validate_link( $server, $u, $base ) if $server->{validate_links};
   888  #        return;
   889  #    }
   890
   891  #    $u->host_port( $server->{authority} );  # Force all the same host name
   892
   893      # Allow rejection of this URL by user function

That is, comment out lines 883-891.

The spider still obeys max_depth, which is extremely important:
otherwise you could spider the world.  If you have max_depth set to
more than 1, then you had better know what you are doing.
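
For anyone who wants a tighter leash than max_depth alone, here is a minimal sketch of a spider config that keeps max_depth at 1 and also uses the test_url callback to accept only hosts you name. This is untested against the patched 2.4.2 and rests on some assumptions: that the "user function" mentioned at line 893 is the test_url callback, that it is called with the URI object as its first argument and that a false return skips the link, and that the config file sets @servers as in the example configs shipped with spider.pl. The host names, the email address and %allowed_hosts are made up for illustration.

```perl
# Hypothetical whitelist of hosts the spider may visit (not from the original post).
my %allowed_hosts = map { $_ => 1 } qw( www.example.org docs.example.net );

@servers = (
    {
        base_url  => 'http://www.example.org/',
        email     => 'admin@example.org',    # placeholder address
        max_depth => 1,                       # follow links one hop from base_url only

        # Assumed callback signature: ( $uri, $server ).
        # A false return value tells the spider to skip this URL.
        test_url  => sub {
            my ( $uri, $server ) = @_;
            return 0 unless $uri->can('host');    # e.g. mailto: URIs have no host
            return $allowed_hosts{ lc( $uri->host || '' ) } ? 1 : 0;
        },
    },
);

1;
```

With the host check commented out, a callback like this would be the only thing besides max_depth standing between the spider and off-site links, so keep the whitelist short.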

Thanks to Bill Moseley (the only one who seems active), but I had worked
it out myself beforehand, then went on my Easter break.
Received on Mon Apr 12 15:45:47 2004