As far as docs go, I would like to see a few different sample swish.conf
files (and possibly related command line options like you showed below)
for different applications. Generally when I am setting up something I
like to see example setups/configs and play around with it before trying
to fine-tune it. If there were more example configs then a user could
just pick one that is close to what they are looking for to get it going,
then work from that.
On the same note, what should I put in the config file if I use the command
swish-e -c /etc/swish.conf -S prog -i DirTree.pl
as you suggested below? I need to be able to search MS Word, Excel,
PowerPoint, PDF, HTML, and text files.
The doc files especially change very often so I probably wouldn't want to
cache those, but since I have mostly doc files I probably won't bother
caching anything at this point.
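
If caching ever does become worthwhile, the freshness check itself is small. Here is a rough shell sketch, assuming one cached text file per source document. The function name needs_refresh and the paths in the comments are made up for illustration; none of this is part of swish-e or DirTree.pl.

```shell
# Hypothetical helper, not part of swish-e: succeed (exit 0) when the cached
# text for a document is missing or older than the document itself.
needs_refresh() {
    src="$1"
    cache="$2"
    [ ! -e "$cache" ] || [ "$src" -nt "$cache" ]
}

# Example use with made-up paths: only re-run the converter when needed.
# if needs_refresh /data/report.doc /var/cache/swish/report.doc.txt; then
#     catdoc /data/report.doc > /var/cache/swish/report.doc.txt
# fi
```

With files that change as often as mine do, this check would report "stale" most of the time, which is why I am skipping it for now.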
I have xpdf, catdoc, ppthtml, the excel perl stuff installed on the Linux machine.
I am guessing that it was just using the default html filter to find text
in the doc and ppt files that I searched then? I know that it could find
text in these binary files using my existing config, which is why I thought
it was somehow finding the extra progs I had installed to filter the files.
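
One quick way to see whether the converters are even visible to a process started from my shell is sketched below. It assumes the programs are called pdftotext (which ships with xpdf), catdoc, and ppthtml. The function name missing_tools is just something I made up, and this only shows what is on the PATH, not what swish-e actually ran.

```shell
# Print the name of every command given as an argument that is NOT on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Empty output means all three converters were found.
missing_tools pdftotext catdoc ppthtml
```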
> Nick scribbled on 5/6/05 2:54 PM:
>> I currently have swish-e 2.4.3 up and working. It appears to be working
>> fine (with a small set of files) but indexing all my files is taking a
>> really long time.
> you're right. should not be taking that long.
>> I am somewhat confused at the best (for speed) way to setup indexing. I
>> have read through all the docs (or at least I think I did), and I am
>> somewhat confused at the best way to setup the filters.
> as luck has it, I spent the morning working on the docs. So at least I
> have it fresh in my head (which may not mean much).
> swish-e does not know about non-text files like .pdf, .doc, .xls and .ppt. You
> need some 3rd party programs to convert those to text so that swish-e can index
> them. For the windows distrib of swish-e, some of those 3rd party apps are
> bundled in: xpdf and catdoc (see the note here:
> http://swish-e.org/download/index.html). Since you're using Linux and
> accessing the windows volume remotely, you need to install the 3rd party apps
> for Linux. I think the filters/README file talks about that (I haven't gotten
> to that revision yet...).
> You're also calling swish-e with the default -S fs method (since you don't
> specify one explicitly). You probably want -S prog, in order to get your
> files filtered with the 3rd party apps.
> A few things I would try:
> 1. make sure the SWISH::Filter class is in your Perl include path:
> % export PERL5LIB=/usr/local/lib/swish-e # bash, bourne shells
> % setenv PERL5LIB /usr/local/lib/swish-e # csh, tcsh
> 2. index with this command instead:
> swish-e -c /etc/swish.conf -S prog -i DirTree.pl
> 3. if you're going to index every night, but the binary docs (pdf, .doc, etc.)
> don't change that often, consider caching the filtered output. The filtering
> causes the most overhead: a new forked process for each doc.
> you can cache output with the DirTree.pl script, or roll your own.
> 4. like I mentioned, I'm working on the docs even now, so if there are
> ways you think that they could be improved, post back to the list.
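
A note on step 1: the export line as written replaces any PERL5LIB that is already set. A small sketch that prepends instead, assuming the same /usr/local/lib/swish-e directory from the example above (the function name perl5lib_with is hypothetical):

```shell
# Hypothetical helper: print a PERL5LIB value with "$1" in front,
# keeping whatever was already set.
perl5lib_with() {
    if [ -n "${PERL5LIB:-}" ]; then
        printf '%s:%s\n' "$1" "$PERL5LIB"
    else
        printf '%s\n' "$1"
    fi
}

PERL5LIB=$(perl5lib_with /usr/local/lib/swish-e)
export PERL5LIB

# Optional check, assuming perl is installed: exits 0 only if the module loads.
# perl -MSWISH::Filter -e 1 && echo "SWISH::Filter found"
```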
> Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Fri May 6 13:28:56 2005