If someone is interested...

From: Antonio Cisternino <cisterni(at)not-real.Di.Unipi.IT>
Date: Wed May 26 1999 - 17:51:26 GMT
At the moment there is no way to include or exclude URLs when swish indexes
a site over HTTP. So I've added the following lines to swishspider; they use
Perl's pattern matching to allow a simple form of URL selection.

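# URL selection patterns, read from the environment (either may be unset).
my $posPattern = $ENV{SWISH_POS};
my $negPattern = $ENV{SWISH_NEG};

# $ARGV[1] is the URL to fetch. Skip it if it does not match SWISH_POS or
# if it matches SWISH_NEG: write a 404 code to $ARGV[0].response so swish
# treats the page as not found, and exit without fetching anything.
if (($posPattern && $ARGV[1] !~ /$posPattern/) ||
    ($negPattern && $ARGV[1] =~ /$negPattern/)) {
    open(my $out, '>', "$ARGV[0].response")
        or die "swishspider: cannot write $ARGV[0].response: $!";
    print $out "404";
    close($out);
    exit(0);
}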

This is only a workaround, not a real solution. I believe the final solution
should work like the one adopted for filesystem indexing. So, unlike what I
did in my last mail about the language in swishspider, I don't propose
adding these lines to the standard swishspider.
But anyone who has the same problem can use these lines: set the SWISH_POS
environment variable to a pattern that the URL must match, and SWISH_NEG to
a pattern that the URL must *not* match.
For example, if the variables are set as follows

SWISH_POS -> http://medialab.di.unipi.it/
SWISH_NEG -> http://medialab.di.unipi.it/.*?/.+

the spider retrieves only the files in the root directory of medialab's
site.
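To check a pair of patterns before starting a crawl, the same test can be run on its own. The sketch below restates the condition from the patch as a function; the name url_allowed and the sample URLs are hypothetical, chosen only for illustration, and the two patterns are the ones from the example above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $url would be fetched under the patch's rule,
# i.e. it matches $pos (when set) and does not match $neg (when set).
sub url_allowed {
    my ($url, $pos, $neg) = @_;
    return 0 if $pos && $url !~ /$pos/;
    return 0 if $neg && $url =~ /$neg/;
    return 1;
}

# Patterns from the example: stay on medialab, skip anything below the root.
my $pos = 'http://medialab.di.unipi.it/';
my $neg = 'http://medialab.di.unipi.it/.*?/.+';

# Illustrative URLs only.
for my $url (
    'http://medialab.di.unipi.it/index.html',
    'http://medialab.di.unipi.it/people/cisterni.html',
    'http://www.unipi.it/',
) {
    printf "%-50s %s\n", $url, url_allowed($url, $pos, $neg) ? 'fetch' : 'skip';
}
```

Because the patterns are Perl regular expressions, the `.` in the host name matches any character; write `\.` where a literal dot is required.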

-- Antonio
Received on Wed May 26 10:48:24 1999