Re: advantages and disadvantages of indexing via the spider

From: Eric Lease Morgan <emorgan(at)not-real.nd.edu>
Date: Tue Feb 17 2004 - 15:25:05 GMT

On Feb 17, 2004, at 8:54 AM, Bill Moseley wrote:

> Hey, that's the unix way -- specific tools for doing specific tasks.
> I think creating a mirror with wget is a fine idea.  IIRC, wget can
> modify paths if any of the URLs contain query parameters.  So that
> would be a problem.  But if it's just static content then it should
> work fine.  Wget will only update modified files if you use
> time-stamping -- assuming the source provides the dates.
>
>> What are the advantages and disadvantages of either approach? If I use
>> the spider, then I don't need nearly as much local disk space. If I do
>> the mirroring thing, then I have local copies and I save on network
>> bandwidth.
>
> Ah, what's a little disk space?  What you save is indexing time.  Run
> the spider in the background or from cron separately to keep the
> mirror up to date and then index locally from another cron job.

Thank you one and all for the feedback. Y'all confirmed my assumptions.
If I use spider.pl, then I can save on disk space. If I use mirroring,
then I can save on indexing time and network bandwidth. I think I'll go
with the mirroring approach. More later.
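
The mirror-then-index arrangement described above could be sketched as two crontab entries, roughly as follows. This is only an illustration under stated assumptions: the URL, the directories, the index file name, and the schedule are all hypothetical placeholders, and the swish-e invocation assumes plain file-system indexing with default settings (a real setup would probably also pass a config file with -c).

```
# m  h  dom mon dow  command

# Refresh the local mirror nightly at 01:00. --mirror turns on recursion
# and time-stamping, so unchanged files are not fetched again, provided
# the server reports modification dates.
# (hypothetical URL and mirror directory)
0 1 * * * wget --mirror --no-parent --quiet --directory-prefix=/var/mirror http://www.example.org/

# Re-index the local copy at 04:00, leaving time for the mirror to finish.
# (hypothetical index file path)
0 4 * * * swish-e -i /var/mirror -f /var/index/site.index
```

Keeping the two jobs separate means a slow or failed download never blocks indexing; the indexer simply works from whatever the mirror held at the time. As noted in the quoted message, URLs with query parameters may be saved under altered paths, so this fits static content best.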

-- 
Eric Lease Morgan
University Libraries of Notre Dame

(574) 631-8604
Received on Tue Feb 17 07:25:10 2004