Re: external spider

From: Bill Moseley <moseley(at)>
Date: Thu Apr 08 2004 - 10:48:27 GMT
On Wed, Apr 07, 2004 at 10:28:07PM -0700, Mark Greenaway wrote:
> OK, I am not that familiar with Perl.
> Does anyone have a modified copy of or swishspider that allows
> swish-e to index off-site or external links as well as local ones?

Well, this looks like the code that checks for a matching host name:

    # Here we make sure we are looking at a link pointing to the correct (or equivalent) host

    unless ( $server->{scheme} eq $u->scheme && $server->{same_host_lookup}{$u->canonical->authority||''} ) {

        print STDERR qq[ ?? <$tag $attribute="$u"> skipped because different host\n] if $server->{debug} & DEBUG_LINKS;
        $server->{counts}{'Off-site links'}++;
        validate_link( $server, $u, $base ) if $server->{validate_links};
        return;
    }

    $u->host_port( $server->{authority} );  # Force all the same host name

so you could try removing that code from a copy of  
Then hope max_depth works right.
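
A rough, standalone sketch of the idea, not a patch to the spider itself: rather than deleting the check outright, the same test could be kept and made optional. The helper name should_follow and the allow_offsite key are invented here for illustration; only scheme, same_host_lookup and counts come from the excerpt above. Note that in the real code the host_port() line would presumably also need to be skipped for off-site links, since it rewrites every URL to the local host name.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the spider. $server mirrors the hash
# keys seen in the excerpt above; 'allow_offsite' is an invented option.
sub should_follow {
    my ( $server, $scheme, $authority ) = @_;

    # Same test as the excerpt: scheme matches and the host is a known alias.
    my $same_site = $server->{scheme} eq $scheme
        && $server->{same_host_lookup}{ $authority || '' };

    return 1 if $same_site;

    # Off-site link: count it, then follow only if the invented option is set.
    $server->{counts}{'Off-site links'}++;
    return $server->{allow_offsite} ? 1 : 0;
}

# Made-up server settings for the demonstration.
my %server = (
    scheme           => 'http',
    same_host_lookup => { 'example.com' => 1, 'www.example.com' => 1 },
    allow_offsite    => 1,
    counts           => {},
);

for my $link ( [ 'http', 'www.example.com' ], [ 'http', 'other.example.org' ] ) {
    my ( $scheme, $authority ) = @$link;
    printf "%s://%s => %s\n", $scheme, $authority,
        should_follow( \%server, $scheme, $authority ) ? 'follow' : 'skip';
}
```

With allow_offsite set, both links are followed, so the only thing left to keep the crawl from wandering across the whole web is the depth limit.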

Bill Moseley
Received on Thu Apr 8 03:48:28 2004