Wonderful, thanks -- I'll give it a try!
Dave V.
-----Original Message-----
From: swish-e@sunsite3.berkeley.edu
[mailto:swish-e@sunsite3.berkeley.edu]On Behalf Of David Wood
Sent: Wednesday, June 23, 2004 10:05 AM
To: Multiple recipients of list
Subject: [SWISH-E] Re: Spider, but not index?
In your spider config file, put something like this:
@servers = (
    {
        ...
        test_response => \&test_response,
        ...
    }
);
sub test_response {
    # The spider passes the URI object first and the server hash second.
    my ( $uri, $server ) = @_;

    # These URLs should be spidered, but not indexed, as they're too generic.
    my @SNUBBED_URLS = (
        "/index/index.htm",
        "/mainpage.new/pwebbrief.html",
        "/mainpage.new/pweb_faq.htm",
        "/products/products_nojs.htm",
        "/sitemap/map.htm",
        "/toolkit/salesmkt_toolkit.htm",
    );

    foreach my $url (@SNUBBED_URLS) {
        # \Q...\E makes "." match a literal dot rather than any character.
        $server->{no_index} = 1 if $uri->path =~ /\Q$url\E$/;
    }

    # A true return value tells the spider to go ahead and fetch the page.
    return 1;
}
Cheers,
David
At 15:40 Wednesday 23-6-2004, David VanHook wrote:
>Is there a relatively easy way to get SWISH-E to spider a page (i.e., to
>follow all of the links on it), but to not index the contents of that same
>page? I've tried using FileRules title in the config file, but am having no
>luck -- I get a Bad Directive error, even when I paste in the code directly
>from the online docs.
>
>Thanks!
>
>Dave VanHook
>dvanhook@mshanken.com
Received on Wed Jun 23 14:39:33 2004