Hello folks-
I am working on a project to index all of our mailing list archives.
Currently we have over 600,000 documents to be indexed so they can be
searched. I got swish-e 2.4.2 installed and running. I am indexing
with the included spider.pl program so that the indexing doesn't affect
resources on the production machine as much.
I began the indexing approximately 72 hours ago, and it hasn't finished
yet. It is running on a 450 MHz G3 machine with 576 MB of RAM. I can
see swish-e hitting my webserver, and the .temp database continues to
grow. I ran the indexer with the following command:
    ./bin/swish-e -S prog -c swish.conf
So, I have the following questions:
1. I expect to have over 1,000,000 documents in our archives as things
progress. Is this pushing the limits of swish-e?
2. I have seen the indexer hit my robots.txt multiple times. Is there a
way to check on the progress to see if/when it will finish indexing?
3. What should I do regarding the current index process? I'm afraid to
stop it, because I don't want to have to start the indexing all over
again.
4. Do you have any recommendations on what I can do to improve this
process?
Any help would be greatly appreciated.
-=Aaron
Administrator, lists.apple.com
Received on Thu Sep 30 08:54:10 2004