Re: external spider

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Apr 08 2004 - 10:48:27 GMT
On Wed, Apr 07, 2004 at 10:28:07PM -0700, Mark Greenaway wrote:
> OK, I am not that familiar with Perl.
> Does anyone have a modified copy of spider.pl or swishspider that allows
> swish-e to index off-site or external links as well as local ones?

Well, this looks like the code that checks for a matching host name:

    # Here we make sure we are looking at a link pointing to the correct (or equivalent) host

    unless ( $server->{scheme} eq $u->scheme && $server->{same_host_lookup}{$u->canonical->authority||''} ) {

        print STDERR qq[ ?? <$tag $attribute="$u"> skipped because different host\n] if $server->{debug} & DEBUG_LINKS;
        $server->{counts}{'Off-site links'}++;
        validate_link( $server, $u, $base ) if $server->{validate_links};
        return;
    }

    $u->host_port( $server->{authority} );  # Force all the same host name

so you could try removing that code from a copy of spider.pl.
With the host check gone, nothing in this block limits where the spider
goes, so then hope max_depth works right.
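
As an alternative to deleting the block outright, here is a minimal sketch of the same test pulled into a helper, so off-site links can be switched on and off. It is not code from spider.pl: link_allowed() and the follow_offsite key are hypothetical names made up for illustration, and the $server hash only mimics the fields used in the excerpt above. It assumes the URI module, which spider.pl already uses.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper, NOT part of spider.pl.
# Returns 1 if the link should be followed, 0 otherwise.
# {follow_offsite} is an invented flag, not a real spider.pl option.
sub link_allowed {
    my ( $server, $u ) = @_;

    # Same test as the excerpt: scheme must match and the canonical
    # authority (lower-cased host, default port dropped) must be known.
    my $same_host =
           $server->{scheme} eq ( $u->scheme || '' )
        && $server->{same_host_lookup}{ $u->canonical->authority || '' };

    return 1 if $same_host;

    # Off-site link: follow it only when the invented flag is set.
    return $server->{follow_offsite} ? 1 : 0;
}

# Stand-in for the spider's per-server config (assumed shape).
my $server = {
    scheme           => 'http',
    authority        => 'www.example.com',
    same_host_lookup => { 'www.example.com' => 1 },
    follow_offsite   => 0,
};

my $local  = URI->new('http://WWW.Example.com:80/docs/');
my $remote = URI->new('http://other.example.org/page.html');

for my $u ( $local, $remote ) {
    printf "%s => %s\n", $u, link_allowed( $server, $u ) ? 'follow' : 'skip';
}
```

Note that the last line of the excerpt, $u->host_port( $server->{authority} ), rewrites the host name of every link that gets past the check. If off-site links are let through, that line would presumably need to run only for same-host links, otherwise external URLs would end up pointing at the local server.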



-- 
Bill Moseley
moseley@hank.org
Received on Thu Apr 8 03:48:28 2004