I currently have swish-e 2.4.3 installed and running. It seems to work
fine with a small set of files, but indexing all of my files is taking a
very long time.
It has been almost a day since I started indexing, and it still has not
finished. The files live on a Windows Server 2003 file server, which I
have mounted on my Gentoo Linux box via CIFS. The Linux box running
swish-e is a 1.7 GHz P4 with 512 MB of RAM. swish-e uses very little CPU
(about 1%) but about 80% of the RAM, and it does not appear to be
creating any temp files. The Linux box is only using about 1.5 Mb/s of
its 100 Mb/s network connection. The Windows 2003 server has no resource
problems either, so it should not be the cause of the slowdown.
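The figures above can be turned into a quick sanity check. This is a
minimal Python sketch that assumes "Mb/s" means megabits per second (the
usual unit for 100 Mb/s Ethernet). It works out how much of the link is
in use, how much data a full day at the observed rate would move, and how
much RAM 80% of 512 MB is.

```python
# Back-of-the-envelope check of the numbers quoted in the post.
# Assumption: "Mb/s" means megabits per second, not megabytes.

LINK_MBIT_S = 100.0        # nominal link speed
OBSERVED_MBIT_S = 1.5      # rate observed while indexing
RAM_MB = 512               # installed RAM
RAM_USED_FRACTION = 0.80   # share of RAM reported in use by swish-e
SECONDS_PER_DAY = 24 * 60 * 60


def link_utilization(observed_mbit_s, link_mbit_s):
    """Fraction of the link actually in use."""
    return observed_mbit_s / link_mbit_s


def gigabytes_per_day(mbit_s):
    """Decimal gigabytes transferred in 24 hours at a constant rate."""
    bytes_per_s = mbit_s * 1_000_000 / 8
    return bytes_per_s * SECONDS_PER_DAY / 1_000_000_000


if __name__ == "__main__":
    print(f"link utilization: {link_utilization(OBSERVED_MBIT_S, LINK_MBIT_S):.1%}")
    print(f"data moved per day: {gigabytes_per_day(OBSERVED_MBIT_S):.1f} GB")
    print(f"RAM in use: {RAM_MB * RAM_USED_FRACTION:.0f} MB of {RAM_MB} MB")
```

Run as-is, it reports 1.5% link utilization, about 16.2 GB per day, and
about 410 MB of RAM in use.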
Here is the breakdown of how many files I have of each type:
I am somewhat confused about the best (i.e., fastest) way to set up
indexing. I have read through all the docs (or at least I think I have),
and I am still unsure how best to set up the filters. In some places the
docs seem to say I don't need to configure anything specific to get the
extra MS Word/Excel/PowerPoint support, while in others I get the
impression I am supposed to explicitly configure something for each file
type. As far as I can tell from what I read, I have installed all the
programs required for PDF, Word, Excel, and PowerPoint.
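One way to confirm the helper programs are actually reachable is to
check that they are on the PATH of the user running swish-e. The minimal
Python sketch below does that. The program names are assumptions: they
are common text extractors (pdftotext from xpdf/poppler; catdoc, xls2csv
and catppt from the catdoc package), not a list taken from the swish-e
docs, and the filters swish-e actually uses may call different programs.

```python
import shutil

# Hypothetical mapping of extension -> helper programs to look for.
# These names are assumptions, not the set swish-e itself requires.
CANDIDATE_HELPERS = {
    ".pdf": ["pdftotext"],
    ".doc": ["catdoc"],
    ".xls": ["xls2csv"],
    ".ppt": ["catppt"],
}


def missing_helpers(helpers):
    """Return {extension: [programs not found on PATH]} for any gaps."""
    missing = {}
    for ext, programs in helpers.items():
        absent = [p for p in programs if shutil.which(p) is None]
        if absent:
            missing[ext] = absent
    return missing


if __name__ == "__main__":
    gaps = missing_helpers(CANDIDATE_HELPERS)
    if not gaps:
        print("all candidate helpers found on PATH")
    for ext, programs in sorted(gaps.items()):
        print(f"{ext}: missing {', '.join(programs)}")
```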
I have tested searching MS Word and PowerPoint documents, and it works.
Here is my /etc/swish.conf config file:
IndexOnly .htm .html .pdf .txt .doc .xls .ppt
ReplaceRules remove /home/shared
MetaNames swishdocpath swishtitle
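A per-extension count of the files the IndexOnly line covers shows where
the bulk of the work is. The minimal Python sketch below walks a
directory tree and counts matching files. It assumes /home/shared (the
string removed by ReplaceRules) is also the mount point of the share, and
it lower-cases suffixes so that .DOC counts as .doc; that is a choice made
here and may differ from how swish-e matches IndexOnly. The `stored_path`
helper only approximates the effect of `ReplaceRules remove` with a plain
string replacement.

```python
import os
from collections import Counter

# Copied from the IndexOnly line of /etc/swish.conf.
INDEX_ONLY = (".htm", ".html", ".pdf", ".txt", ".doc", ".xls", ".ppt")
# Copied from "ReplaceRules remove /home/shared"; assumed to be the mount point.
SHARE_ROOT = "/home/shared"


def count_by_extension(root, extensions=INDEX_ONLY):
    """Count files under root whose lower-cased suffix is in extensions."""
    counts = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext in extensions:
                counts[ext] += 1
    return counts


def stored_path(path, remove=SHARE_ROOT):
    """Approximate the path stored after 'ReplaceRules remove'."""
    return path.replace(remove, "")


if __name__ == "__main__":
    counts = count_by_extension(SHARE_ROOT)
    for ext in INDEX_ONLY:
        print(f"{ext:6} {counts[ext]}")
    print(f"total  {sum(counts.values())}")
```

For example, `stored_path("/home/shared/docs/a.doc")` returns
`"/docs/a.doc"`.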
To index, I just run:
swish-e -c /etc/swish.conf
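Since the goal is a nightly re-index, it helps to record how long a full
run really takes. This minimal Python sketch wraps the command above and
reports the exit status and wall-clock time. It passes only the `-c` flag
already shown, and adds nothing to swish-e itself.

```python
import subprocess
import sys
import time

# Same command as above; no flags beyond -c are assumed.
INDEX_COMMAND = ["swish-e", "-c", "/etc/swish.conf"]


def timed_run(command):
    """Run command to completion; return (exit status, elapsed seconds)."""
    start = time.monotonic()
    completed = subprocess.run(command)
    return completed.returncode, time.monotonic() - start


if __name__ == "__main__":
    status, seconds = timed_run(INDEX_COMMAND)
    print(f"exit status {status} after {seconds / 3600:.2f} hours")
    sys.exit(status)
```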
I want indexing to go as fast as possible. I realize I have a fair number
of files, but I would like to re-index every night, and if indexing takes
more than a day, that won't be possible. This test setup is really just a
proof of concept; if it works well, I can probably get a new server built
for it with much more CPU and RAM.
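The nightly-window concern reduces to simple arithmetic: the files per
second needed to finish in the window, and how many times faster a run
must be to fit. In this minimal Python sketch the file count (100,000)
and the 8-hour window are hypothetical placeholders, since neither figure
is given here; the 24-hour run reflects the "almost a day" so far.

```python
SECONDS_PER_HOUR = 3600


def required_rate(total_files, window_hours):
    """Files per second needed to index total_files within window_hours."""
    return total_files / (window_hours * SECONDS_PER_HOUR)


def required_speedup(run_hours, window_hours):
    """Factor by which a run of run_hours must shrink to fit window_hours."""
    return run_hours / window_hours


if __name__ == "__main__":
    # Hypothetical inputs: the real file count and nightly window are not given.
    files, window = 100_000, 8
    print(f"{required_rate(files, window):.1f} files/s needed")
    print(f"{required_speedup(24, window):.1f}x faster than a 24 h run")
```

With these placeholder inputs it prints about 3.5 files/s and 3.0x.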
Received on Fri May 6 12:55:23 2005