On Feb 17, 2004, at 8:54 AM, Bill Moseley wrote:
> Hey, that's the unix way -- specific tools for doing specific tasks.
> I think creating a mirror with wget is a fine idea. IIRC, wget can
> modify paths if any of the URLs contain query parameters. So that would
> be a problem. But if it's just static content then it should work fine.
> Wget will only update modified files if you use time-stamping --
> assuming the source provides the dates.
>
>> What are the advantages and disadvantages of either approach? If I use
>> the spider, then I don't need nearly as much local disk space. If I do
>> the mirroring thing, then I have local copies and I save on network
>> bandwidth.
>
> Ah, what's a little disk space? What you save is indexing time. Run the
> spider in the background or from cron separately to keep the mirror up
> to date and then index locally from another cron job.
Thank you one and all for the feedback. Y'all confirmed my assumptions.
If I use spider.pl, then I can save on disk space. If I use mirroring,
then I can save on time and network bandwidth. I think I'll go with the
mirroring approach. 'More later.
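
For the record, here is a rough sketch of what the mirroring step might
look like when driven from a small Python script run by cron. The URL,
the mirror directory, and the function name are all made up for
illustration. The wget options are standard ones: --mirror turns on
recursion and time-stamping, so files are only fetched again when the
server reports a newer date, and --no-parent keeps wget below the
starting directory.

```python
import shlex


def build_mirror_command(source_url, mirror_dir):
    """Return a wget argument list for a time-stamped mirror.

    source_url and mirror_dir are placeholders chosen by the caller;
    nothing here is specific to any particular site.
    """
    return [
        "wget",
        "--mirror",             # recursive download with time-stamping
        "--no-parent",          # never climb above the starting directory
        "--directory-prefix",   # store the mirror under mirror_dir
        mirror_dir,
        source_url,
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = build_mirror_command("http://www.example.org/docs/", "/var/mirror")
    # Print the command so it can be checked before being put into cron.
    print(shlex.join(cmd))
```

To actually run the mirror, the list can be handed to
subprocess.run(cmd, check=True). Indexing the local copy would then be a
second cron job, scheduled to start after the mirror job has had time to
finish.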
--
Eric Lease Morgan
University Libraries of Notre Dame
(574) 631-8604
Received on Tue Feb 17 07:25:10 2004