swish-e on a large scale

From: Aaron Levitt <alevitt(at)not-real.apple.com>
Date: Thu Sep 30 2004 - 15:53:54 GMT

Hello folks-

I am working on a project to index all of our mailing list archives.  
Currently we have over 600,000 documents to index so that they can be 
searched.  I have swish-e 2.4.2 installed and running.  I am indexing 
with the included spider.pl program so that it doesn't affect resources 
on the production machine as much.

I began the indexing approximately 72 hours ago, and it hasn't finished 
yet.  It is running on a 450 MHz G3 machine with 576 MB of RAM.  I can 
see swish-e hitting my webserver, and the .temp database continues to 
grow.  I ran the indexer with the following command:

    ./bin/swish-e -S prog -c swish.conf

So, I have the following questions:

1. I expect to have over 1,000,000 documents in our archives as things 
progress.  Is this pushing the limits of swish-e?

2. I have seen the indexer request my robots.txt multiple times.  Is 
there a way to check on the progress to see if/when it will finish 
indexing?

3. What should I do regarding the current index process?  I'm afraid to 
stop it, because I don't want to have to start the indexing all over 
again.

4. Do you have any recommendations on what I can do to improve this 
process?

Any help would be greatly appreciated.

-=Aaron
Administrator, lists.apple.com
Received on Thu Sep 30 08:54:10 2004