Indexing takes forever

From: Nick <newsgroups(at)>
Date: Fri May 06 2005 - 19:55:21 GMT
I currently have swish-e 2.4.3 up and working.  It appears to be working
fine (with a small set of files) but indexing all my files is taking a
really long time.

It has been almost a day since I started indexing, and it has not finished
yet.  The files themselves are on a Windows 2003 file server, which I have
mounted on my Gentoo Linux box via CIFS.  The swish-e Linux box is a
1.7GHz P4 with 512MB RAM.  swish-e doesn't use much CPU (about 1%), but it
is using about 80% of the RAM.  Also, it doesn't look like swish-e is
creating any temp files.  The network connection on the Linux box is only
using about 1.5Mb/s of a 100Mb/s connection.  The Windows 2003 box does
not have any resource problems either, so it should not be the cause of
the slowdown.

Here is the breakdown of how many files I have of each type:

11,322	.doc
137	.txt
8,536	.xls
2,026	.ppt
1,575	.pdf
1,129	.htm
25	.html
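
For anyone who wants to produce the same kind of breakdown for their own
tree, here is a minimal sketch.  The function name count_by_extension is
made up for illustration, and the script assumes the share is mounted at
/home/shared (the IndexDir used in the config further down).  It only
counts files by extension; it does not look at file contents or sizes.

```python
import os
from collections import Counter


def count_by_extension(root):
    """Walk root recursively and count files per lowercased extension."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # splitext returns ("name", ".ext"); files without an
            # extension are counted under the empty string.
            ext = os.path.splitext(name)[1].lower()
            counts[ext] += 1
    return counts


if __name__ == "__main__":
    # Assumed mount point, taken from the IndexDir setting.
    counts = count_by_extension("/home/shared")
    for ext, n in counts.most_common():
        print(f"{n:,}\t{ext or '(none)'}")
```

Extensions are lowercased so that files named .DOC and .doc land in the
same bucket, which matters on a share that came from a Windows server.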

I am somewhat confused about the best (fastest) way to set up indexing.  I
have read through all the docs (or at least I think I did), and I am still
not sure how the filters should be configured.  In some places the docs
seem to say I don't need to configure anything specifically to get the
extra MS Word/Excel/PowerPoint functionality, and in others I get the
impression I am supposed to actively configure something for each file
type.  From what I read, I have installed all the programs needed for PDF,
Word, Excel, and PowerPoint.

I have tested searching in MS Word and PowerPoint docs, and it works.

Here is my /etc/swish.conf config file:

IndexDir "/home/shared"
IndexOnly .htm .html .pdf .txt .doc .xls .ppt
TmpDir /var/tmp
IndexFile /var/swish/site.index
ReplaceRules remove /home/shared
MetaNames swishdocpath swishtitle

When I index, I am just using this command line:

swish-e -c /etc/swish.conf

I want indexing to go as fast as possible.  I realize I have a fair number
of files, but I would like to re-index every night, and if it takes over a
day to index then that will not be possible.  This test setup is really
just to prove it will work; if it does work well, then I can probably get a
new server built for it with a lot more CPU/RAM.
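
To put rough numbers on the problem, here is a back-of-the-envelope
calculation using the file counts listed above.  Two values in it are
assumptions, not measurements: the elapsed time is rounded to a full 24
hours ("almost a day"), and the 8-hour nightly window is purely
hypothetical.  Since the run has not finished, the per-file figure is only
a lower bound on the real average.

```python
# File counts as listed in the breakdown above.
file_counts = {
    ".doc": 11322,
    ".txt": 137,
    ".xls": 8536,
    ".ppt": 2026,
    ".pdf": 1575,
    ".htm": 1129,
    ".html": 25,
}

total_files = sum(file_counts.values())

# Assumption: "almost a day" rounded to 24 hours.
elapsed_seconds = 24 * 3600

# The run is still going, so at most total_files have been processed;
# the true average time per file is at least this value.
min_seconds_per_file = elapsed_seconds / total_files

# Hypothetical nightly window of 8 hours (not stated in the post).
window_seconds = 8 * 3600
required_files_per_second = total_files / window_seconds

print(f"total files:               {total_files}")
print(f"seconds per file (>=):     {min_seconds_per_file:.2f}")
print(f"files/sec needed in 8 h:   {required_files_per_second:.2f}")
```

With 24,750 files in total, the current run is averaging at least about
3.5 seconds per file, while an 8-hour window would need a bit under one
file per second.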
