swish-e -c /etc/swish.conf -S prog -i DirTree.pl
I tried that but I got this:
Indexing Data Source: "External-Program"
External Program found: /usr/lib/swish-e/DirTree.pl
Must supply at least one directory
DirTree.pl [options] directory <directory...> | swish-e -S prog -i stdin
-verbose Display processing info
-debug Enable debugging (including SWISH::Filter debugging)
-man Display documentation
-path Display location lib path set at installation
-no_skip Process documents even if filtering fails
-symlinks Follow symbolic links. Default is to NOT follow
Removing very common words...
no words removed.
Writing main index...
err: No unique words indexed!
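The "Must supply at least one directory" message above suggests DirTree.pl was started without a directory argument, so nothing was fed to the indexer. A minimal sketch of one way to pass it one, assuming the documents live under a placeholder /path/to/docs and using the SwishProgParameters config directive, which hands extra arguments to the program run by -S prog:

```
# swish.conf fragment (sketch; /path/to/docs is a placeholder, not a real path)
# Arguments listed here are passed to the external program named with -i
SwishProgParameters /path/to/docs
```

With that line in /etc/swish.conf the original command can stay unchanged. The pipe form printed in the usage text (DirTree.pl /path/to/docs | swish-e -c /etc/swish.conf -S prog -i stdin) should be an alternative way to get the same result.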
Is there any performance reason to use SWISH::Filter, or is it just
supposed to be easier? To me, doing something like this in the config file
makes more sense, since I understand what it is doing when I tell it about
each type of file:
IndexContents TXT* .txt
IndexContents HTML* .htm
IndexContents HTML* .html
FileFilter .pdf pdftotext "'%p' -"
IndexContents TXT* .pdf
FileFilter .doc catdoc
IndexContents TXT* .doc
FileFilter .ppt ppthtml
IndexContents TXT* .ppt
But of course I have something wrong in there, since I am getting lots of
errors from catdoc. Also, I don't know how to put the Excel one in
there, since I think it is a Perl script.
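One thing that may be worth trying (untested, and not confirmed anywhere in this thread): ppthtml writes HTML rather than plain text, so the .ppt files arguably belong under the HTML parser. Also, catdoc and ppthtml can be given the file path explicitly with '%p', the same way the pdftotext line does. Whether this cures the catdoc errors is an assumption. A sketch of just those four lines:

```
# Sketch only: untested variants of the .doc and .ppt lines
# '%p' is replaced by the path of the file being filtered
FileFilter .doc catdoc "'%p'"
IndexContents TXT* .doc
FileFilter .ppt ppthtml "'%p'"
IndexContents HTML* .ppt
```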
> Nick scribbled on 5/6/05 3:28 PM:
>> As far as docs go, I would like to see a few different sample swish.conf
>> files (and possibly related command line options like you showed below)
>> for different applications. Generally when I am setting up something I
>> like to see example setups/configs and play around with it before trying
>> to fine-tune it. If there were more example configs then a user could
>> just pick one that is close to what they are looking for to get it
>> then work from that.
> there should be example config docs installed by default in
> check /usr/local/share/doc/swish-e/examples/conf/ if you installed in
>> On the same note what should I put in the config file if I use the:
>> swish-e -c /etc/swish.conf -S prog -i DirTree.pl
> that command should work with your existing config file (I think).
> will try and load SWISH::Filter for file formats it recognizes.
>> I am guessing that it was just using the default html filter to find
>> in the doc and ppt files that I searched then? I know that it could
>> text in these binary files using my existing config, that is why I
>> it was somehow finding the extra progs I had installed to filter the
> yes, I have been misled that way too. swish-e does its best to get
> whatever text it finds, and since Word .doc (especially) files have real
> text mixed in with all the proprietary formatting instructions, swish-e
> probably got lots of text. but a proper filter will ensure you get all
> of it, as the author intended it.
> Peter Karman . http://peknet.com/ . peter(at)not-real.peknet.com
Received on Fri May 6 13:50:01 2005