I also find the "callback" functionality to be particularly useful in
the spider.pl script. I use it to specifically ignore certain links on
the remote server and only download what I want. It is really quite
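
For anyone who has not tried it, here is a rough sketch of what such a callback can look like in a spider.pl configuration file. It assumes the `test_url` callback described in the spider.pl documentation, which receives a URI object for each link and returns true to follow it. The host name, email address, and path patterns below are placeholders of mine, not anything from a real setup:

```perl
# Hypothetical spider.pl config file: host, email, and patterns are placeholders.
@servers = (
    {
        base_url => 'http://www.example.com/',
        email    => 'webmaster@example.com',

        # Called for each link the spider finds.
        # Return true to fetch the URL, false to skip it.
        test_url => sub {
            my ($uri, $server) = @_;
            return 0 if $uri->path =~ m{^/private/};            # skip a whole subtree
            return 0 if $uri->path =~ /\.(?:gif|jpe?g|png)$/i;  # skip images
            return 1;                                           # follow everything else
        },
    },
);

1;
```

Skipping links in `test_url` means the unwanted URLs are never requested at all, which keeps load off the remote server.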
[mailto:email@example.com]On Behalf Of Greg Fenton
Sent: Monday, February 16, 2004 2:06 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: advantages and disadvantages of indexing via the
--- Eric Lease Morgan <firstname.lastname@example.org> wrote:
> What are the advantages and disadvantages of indexing via the
Since you are talking about a "remote site", as you said, you
either have to use spider.pl or some other crawler to get the pages.
Ignoring the features of one crawler over another, the upside of
spider.pl is the lower disk requirements and the guarantee of "fresh"
data. The downside is, in the event of needing to rebuild the
database, indexing will be slower than indexing a pre-crawled local
We use spider.pl for our *local* site because we have dynamic content
(e.g. Server Side Includes), so filesystem crawls wouldn't be accurate
or would involve more coding on our part. Since we have an internal
staging server, we don't impact the production site should we need to
rebuild the database a few times a day.
Hope this helps,
Received on Mon Feb 16 11:21:03 2004