Re: advantages and disadvantages of indexing via the spider

From: Aaron Bazar <aaronb(at)not-real.spamcop.net>
Date: Mon Feb 16 2004 - 19:21:03 GMT
I also find the "callback" functionality to be particularly useful in
the spider.pl script. I use it to specifically ignore certain links on
the remote server and only download what I want. It is really quite
versatile.
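
For readers who have not used a callback this way, here is a minimal
sketch of the idea in Python. It is not spider.pl's actual interface
(spider.pl is configured in Perl); the names make_link_filter,
select_links, and the example.org URLs are hypothetical and only
illustrate how a crawler can ask a user-supplied function whether to
follow each link.

```python
from urllib.parse import urlparse


def make_link_filter(allowed_host,
                     skip_suffixes=(".gif", ".jpg", ".png", ".pdf"),
                     skip_prefixes=("/private/",)):
    """Build a callback that decides whether a URL should be fetched.

    All parameter names and defaults are illustrative assumptions.
    """
    def test_url(url):
        parts = urlparse(url)
        # Stay on the one host we were asked to index.
        if parts.hostname != allowed_host:
            return False
        path = parts.path.lower()
        # Ignore file types we do not want to download.
        if path.endswith(tuple(skip_suffixes)):
            return False
        # Ignore whole sections of the site.
        return not any(path.startswith(p) for p in skip_prefixes)
    return test_url


def select_links(links, test_url):
    """Return only the links the callback accepts, in original order."""
    return [link for link in links if test_url(link)]
```

A crawler would call something like select_links on the links it
extracts from each page, so the filtering rules live in one small
function that is easy to change per site.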

Aaron Bazar
http://www.worldwidewebfind.com


-----Original Message-----
From: swish-e@sunsite.berkeley.edu
[mailto:swish-e@sunsite.berkeley.edu]On Behalf Of Greg Fenton
Sent: Monday, February 16, 2004 2:06 PM
To: Multiple recipients of list
Subject: [SWISH-E] Re: advantages and disadvantages of indexing via the
spider


--- Eric Lease Morgan <emorgan@nd.edu> wrote:
>
> What are the advantages and disadvantages of indexing via the
> spider?
>

Since you are talking about a "remote site", you either have to use
spider.pl or some other crawler to get the pages, as you said.

Ignoring the features of one crawler over another, the upside of
spider.pl is the lower disk requirements and the guarantee of "fresh"
data.  The downside is that, if you need to rebuild the database,
indexing through the spider will be slower than indexing a
pre-crawled local disk cache.
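
To make the trade-off concrete, here is a rough Python sketch of the
"pre-crawled local disk cache" alternative. It is not part of swish-e
or spider.pl; cached_fetch and its fetch parameter are hypothetical
names. The first request for a URL goes to the network and is saved to
disk (more disk space, possibly stale data), and later rebuilds read
the saved copy (faster, no load on the remote site).

```python
import hashlib
from pathlib import Path


def cached_fetch(url, cache_dir, fetch):
    """Return the body for url, using a local disk cache when possible.

    fetch is any callable taking a URL and returning bytes; it is an
    assumption of this sketch, not a real crawler API.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so it is safe to use as a file name.
    path = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if path.exists():
        # Cache hit: no network access, but the copy may be out of date.
        return path.read_bytes()
    # Cache miss: fetch once and keep a copy for later rebuilds.
    body = fetch(url)
    path.write_bytes(body)
    return body
```

Indexing straight through the spider corresponds to calling fetch
every time, which is why it always sees current pages but pays the
network cost on every rebuild.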

We use spider.pl for our *local* site because we have dynamic content
(e.g. Server Side Includes), so filesystem crawls wouldn't be accurate
or would involve more coding on our part.  Since we have an internal
staging server, we don't impact the production site should we need to
rebuild the database a few times a day.

Hope this helps,
greg_fenton.

=====
Greg Fenton
greg_fenton@yahoo.com

Received on Mon Feb 16 11:21:03 2004