Re: [SWISH-E:46] Re: Max (reasonable) size of index?

From: Richard Johnsson <johnsson(at)>
Date: Sat Oct 25 1997 - 03:39:19 GMT
I gave up on indexing 40000 files because it was running up against the
ulimit of 128MB dataspace after 3 hours of indexing (BSDI on PPro 200). I
tried indexing subparts and merging with the same result. I switched to
Excite's EWS which manages the same task in less than 80MB and in 30
minutes if you don't enable "quality" summaries.

At 08:16 PM 10/24/97 -0700, Roy Tennant wrote:
>Yes, memory is an issue, as is time. Some of our indexes take hours to 
>complete (hey, isn't that why it gets dark at night?). I'm not sure of 
>the amount of RAM usage, but it must be considerable. One machine we are 
>using has 1 GB of RAM, so if we had a lot less I'm sure it would be 
>something to watch more closely.
>On Fri, 24 Oct 1997, WWW server manager wrote:
>> Roy Tennant wrote:
>> > On Fri, 24 Oct 1997, Michael A. Tilp wrote:
>> > > 
>> > > 	Anyone have any guesses as to the maximum size of a SWISH index? I've
>> > > seen it used on sites of 5000+ pages (the old version); I'm just wondering
>> > > how far that could go. Call me a bit afeared of beginning a large index and
>> > > then watching it choke a few months down the line ;)
>> > 
>> > Our largest index here so far is in the 15 MB range, and probably in the 
>> > neighborhood of 20,000 files. When we are through with our transition we 
>> > will be creating indexes over 20 MB in size. So far no problems. This is 
>> > on a DEC Alpha and a Sun SPARCCenter.
>>
>> One thing to watch, though, is memory use during indexing - building an 8MB
>> index from around 6500 documents here takes about 32MB of memory at its peak
>> (and 17 minutes on a SPARC 10/51, an "old" and hence slow system by current
>> standards). If you don't have enough memory to avoid it (and/or the rest of
>> the system) being slowed by paging/swapping, whether or not it is possible
>> to build large indexes may be only half the story!
>>                                 John Line
>> -- 
>> University of Cambridge WWW manager account (usually John Line)
>> Send general WWW-related enquiries to
Received on Fri Oct 24 20:47:14 1997