Re: Big Index works!

From: Bill Moseley <moseley(at)>
Date: Wed Jun 12 2002 - 14:36:36 GMT
At 02:53 AM 6/12/2002 -0700, Cristiano Corsani wrote:

>Hi all,
>I write just to tell you that swish-e works with my big DB.


>1111343 files indexed.  2026462918 total bytes.  92327081 total words.
>Elapsed time: 16:27:15 CPU time: 16:27:15

16 hours!   

Average size of a doc is about 1,823 bytes.

>on a Pentium IV with 250Mb RAM.

On my Athlon 1800+ with 1/2G of RAM I can index about 24,000 files in
a minute.  That's less than an hour for a million.  On my PIII-550 the same
24,000 files take about 4 minutes, so that's about 3 hours to do a million
files.
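
For anyone who wants to check the arithmetic, here is a rough Python sketch
using only the figures quoted in this message.  The variable and function
names are made up for illustration, and it assumes a steady indexing rate,
which will not hold once a machine starts swapping.

```python
# Figures taken from the indexing report and the timings quoted in this message.
FILES = 1_111_343
TOTAL_BYTES = 2_026_462_918
ELAPSED_SECONDS = 16 * 3600 + 27 * 60 + 15  # 16:27:15


def hours_for(files, files_per_minute):
    """Hours needed to index `files` at a steady rate (an idealization)."""
    return files / files_per_minute / 60


avg_doc_bytes = TOTAL_BYTES / FILES
observed_rate = FILES / (ELAPSED_SECONDS / 60)  # files per minute in the report

athlon_rate = 24_000       # files per minute on the Athlon 1800+
piii_rate = 24_000 / 4     # the same batch in about 4 minutes on the PIII-550

print(f"average doc size: {avg_doc_bytes:.0f} bytes")
print(f"reported run:     {observed_rate:.0f} files/minute")
print(f"Athlon 1800+:     {hours_for(1_000_000, athlon_rate):.1f} hours per million files")
print(f"PIII-550:         {hours_for(1_000_000, piii_rate):.1f} hours per million files")
```

The reported run works out to a bit over 1,100 files a minute, far below
either of the rates above, which is why memory looks like the bottleneck.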

My guess is you are running out of memory while indexing.  Did you index
with the -e switch?  It will keep your disk drive busy, but will save RAM.
Better to let swish swap than the OS.  Best to use a machine with more RAM.

How does one monitor memory usage on Windows?

So it says: 2,778,708 unique words indexed.

That's a lot of words to index.  Will people be searching all those words?
Trim that number down and you will save memory.

Make sure you are *not* indexing a unique record identifier.  No point
indexing something you can use to look up the item directly in a database.

Run swish-e -T index_words_only > word_list  and then you can look at the
words indexed.  You may see words that do not need to be indexed.
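
With a few million words, reading word_list by eye is slow.  Below is a
minimal Python sketch for getting a first impression of it.  It assumes
word_list holds one word per line, which may not match the actual output
format, and the categories and the 25-character cutoff are arbitrary choices
for illustration, not anything swish-e defines.

```python
import re
from collections import Counter


def classify(word):
    """Return a rough label for a word that may not be worth indexing, else None."""
    if word.isdigit():
        return "numeric"                    # e.g. record identifiers, counters
    if len(word) > 25:
        return "very long"                  # arbitrary cutoff, adjust to taste
    if re.search(r"\d", word) and re.search(r"[A-Za-z]", word):
        return "mixed letters and digits"   # e.g. part numbers, codes
    return None


def summarize(lines):
    """Count suspicious words per label in an iterable of lines."""
    counts = Counter()
    for line in lines:
        word = line.strip()
        if not word:
            continue                        # skip blank lines
        label = classify(word)
        if label:
            counts[label] += 1
    return counts


# Hypothetical usage, assuming the file produced by the command above:
#   with open("word_list", errors="replace") as fh:
#       print(summarize(fh))
```

If one category dominates the count, that is a hint about what to exclude
from indexing.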

Also, try a Subject Only search of the archive for "multi millions words",
and search for BIGHASHSIZE to see what tuning might be possible.

Hope this helps.

Bill Moseley
Received on Wed Jun 12 14:40:14 2002