On Tue, 15 Apr 2003, Jody Cleveland wrote:
> My question is, how do I get the spider to only look at a specific folder,
> and nothing else? I looked through the swish-e message archive, and came
> across this, which I added to my SwishSpiderConfig.pl:
>
>
> But, that still indexes all of www.oshkoshpubliclibrary.org. All I want is
> the citydirs directory.
You can try setting
debug => DEBUG_SKIPPED|DEBUG_INFO,
If that's not enough, simply add some print statements to your test_url
function:
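    test_url => sub {
        my ($uri, $server) = @_;
        # Show each path being tested when DEBUG_INFO is enabled
        print STDERR "checking path: ", $uri->path, "\n"
            if $server->{debug} & DEBUG_INFO;
        # Skip images
        return if $uri->path =~ /\.(gif|jpeg)$/;
        # Only follow URLs under /citydirs/
        return $uri->path =~ m[^/citydirs/];
    },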
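
If you want to check the matching rules without running the spider at all,
something like the following might help. It is only a sketch: path_allowed
is a hypothetical helper (not part of spider.pl) that applies the same two
tests to a plain path string, and the sample paths are made up.

```perl
use strict;
use warnings;

# Hypothetical helper: same accept/reject rules as the test_url callback,
# but taking a plain path string so it can run outside the spider.
sub path_allowed {
    my ($path) = @_;
    return 0 if $path =~ /\.(gif|jpeg)$/;       # skip images
    return $path =~ m[^/citydirs/] ? 1 : 0;     # only the citydirs tree
}

# Made-up sample paths, for illustration only.
for my $path ('/citydirs/index.html', '/citydirs/map.gif',
              '/citydirs', '/about/hours.html') {
    printf "%-25s %s\n", $path, path_allowed($path) ? 'index' : 'skip';
}
```

Note that the pattern requires the trailing slash, so a path of exactly
/citydirs does not match.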
Another way to do all this is to index the entire site in one go and use
Swish-e's ExtractPath to set a metaname. Then when searching you can
limit the search to particular areas of the index. See the
"select_by_meta" example in the swish.cgi file.
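
To get a feel for what such a path-derived metaname value would look like,
here is a rough Perl sketch of the kind of pattern involved. This is not
Swish-e configuration syntax (take the actual ExtractPath directive form
from the Swish-e config docs); top_level_dir is a hypothetical helper, and
using the first directory of the URL as the value is my assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first directory in a URL's path,
# or undef when the document sits at the top level of the site.
sub top_level_dir {
    my ($url) = @_;
    return $url =~ m{^https?://[^/]+/([^/]+)/} ? $1 : undef;
}

# The file names below are made up for illustration.
for my $url ('http://www.oshkoshpubliclibrary.org/citydirs/1900.html',
             'http://www.oshkoshpubliclibrary.org/index.html') {
    my $dir = top_level_dir($url);
    print "$url => ", (defined $dir ? $dir : '(none)'), "\n";
}
```

With a value like that stored as a metaname, a search could then be limited
to the citydirs documents only.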
BTW -- are you using keep_alive => 1 when spidering?
--
Bill Moseley moseley@hank.org
Received on Tue Apr 15 13:31:15 2003