I posted this same question a few weeks ago, and I'm about ready to give
up and just live with the fact that it's indexing more than I'd like. My
site is at winnefox.org/wals/new. The problem is that the spider indexes
everything in winnefox.org (which is an entirely different department) as
well as everything in the wals folder. I've spent hours reading the help
section at the bottom of spider.pl, looking at all the references to
call-back and test_url, and I'm just not getting it. Are there any examples
anywhere that explain this more plainly, step by step? Kind of like a
dummies guide?
Thank you in advance for any help you can provide.
> -----Original Message-----
> From: Bill Moseley [mailto:firstname.lastname@example.org]
> At 10:51 PM 06/29/02 -0700, Sutherland, Paul wrote:
> >I want to only index certain directories on a server,
> >e.g. http://something.com/foo & http://something.com/bar, but not the
> >rest of the site.
> This is reasonably easy with 2.1-dev and the spider.pl file if you know a
> little Perl.
> When using the spider.pl program you can define a call-back function that
> gets called for every URL extracted from the page. If the call-back
> function returns false then that URL will not be added to the list of URLs
> to spider.
> From the top level directory of the swish-e distribution:
> perldoc prog-bin/spider.pl
> and search for "test_url".
Received on Wed Jul 3 18:23:43 2002