Re: Indexing takes forever

From: Peter Karman <peter(at)>
Date: Fri May 06 2005 - 21:05:58 GMT

Nick scribbled on 5/6/05 3:49 PM:
> swish-e -c /etc/swish.conf -S prog -i
> I tried that but I got this:
> Indexing Data Source: "External-Program"
> Indexing ""
> External Program found: /usr/lib/swish-e/
> Must supply at least one directory
> Usage:
> [options] directory <directory...> | swish-e -S prog -i stdin
>       Options:
>         -verbose        Display processing info
>         -debug          Enable debugging (including SWISH::Filter debugging)
>         -man            Display documentation
>         -path           Display location lib path set at installation
>         -no_skip        Process documents even if filtering fails
>         -symlinks       Follow symbolic links.  Default is to NOT follow symlinks
> Removing very common words...
> no words removed.
> Writing main index...
> err: No unique words indexed!

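The external program is complaining that it was not given a directory to walk.
Try adding this line to your existing config, so that the directory is passed
to it as an argument: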

SwishProgParameters /home/shared

and comment out this line:

# IndexDir "/home/shared"

> Is there any reason to use SWISH::Filter for performance, or is it just
> supposed to be easier?  To me doing something like this in the config file
> makes more sense, as I understand what it is doing when I tell it about
> each type of file:

I think you're right, in principle. You must be a sysadmin-type: we tend not to 
like the black box approach. ;)

SWISH::Filter lets you drop in new filters and, in theory, not change your 
config. But doing it longhand like you have it should work too. Unless it doesn't...

> IndexContents TXT* .txt
> IndexContents HTML* .htm
> IndexContents HTML* .html
> FileFilter .pdf pdftotext "'%p' -"
> IndexContents TXT* .pdf
> FileFilter .doc catdoc
> IndexContents TXT* .doc
> FileFilter .ppt ppthtml
> IndexContents TXT* .ppt
> But of course I have something wrong in there since I am getting lots of
> errors from catdoc, and also I don't know how to put the excel one in
> there since I think it is a perl script.
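
For the catdoc errors, one thing that may be worth trying is to spell out the
filter arguments the same way your pdftotext line does, rather than relying on
whatever swish-e passes when no options are given. This is an untested sketch,
not a confirmed fix: it assumes catdoc takes the file name as its argument and
writes plain text to stdout, and that '%p' expands for catdoc exactly as it
does in your pdftotext line.

```
# Assumption: catdoc reads the named file and prints its text to stdout.
# '%p' is the same placeholder already used in the pdftotext line.
FileFilter .doc catdoc "'%p'"
IndexContents TXT* .doc
```

If the errors persist, running catdoc by hand on one of the failing .doc files
should show whether the problem is the file itself or the way the filter is
being called.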

Peter Karman  .  .  peter(at)
Received on Fri May 6 14:05:59 2005