

Re: Indexing takes forever

From: Nick <newsgroups(at)>
Date: Fri May 06 2005 - 21:13:45 GMT
I tried like you said and now I am getting some of these:

22865 Warning - /home/shared/Accounting/Capital/Update Capital 7-7-04.xls:
Character in 'c' format wrapped in pack at
/usr/lib/perl5/vendor_perl/5.8.6/Spreadsheet/ line 1790.
Error: Bad annotation action
Failed to set content type for document
Examiner Playground Article 12-13-02.mht'
Bad BBD entry!
Broken OLE file. Try using -b switch
Failed to set content type for

Do those matter?

Also, does the default SWISH::Filter install know about PowerPoint files
too?  I looked in /usr/lib/swish-e/perl/SWISH/Filters but I only see files
that seem to reference MS Word, MS Excel, PDF, and MP3.  I see that MS
PowerPoint is advertised on your web page as being supported, but there
doesn't seem to be much mention of it.
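A minimal sketch of the longhand approach discussed in the quoted reply, assuming catdoc and ppthtml are installed and on the PATH, that each takes the document's filename as its argument and writes to stdout, and that ppthtml emits HTML (as its name suggests), which is why .ppt is mapped to HTML* rather than TXT*. Passing "'%p'" explicitly mirrors the pdftotext line quoted below; this is an untested fragment, not a confirmed fix for the catdoc errors:

    # Pass only the file path (%p) to each filter, as in the pdftotext line
    FileFilter .pdf pdftotext "'%p' -"
    IndexContents TXT* .pdf
    FileFilter .doc catdoc "'%p'"
    IndexContents TXT* .doc
    # Assumption: ppthtml writes HTML, so use the HTML parser for its output
    FileFilter .ppt ppthtml "'%p'"
    IndexContents HTML* .ppt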

> Nick scribbled on 5/6/05 3:49 PM:
>> swish-e -c /etc/swish.conf -S prog -i
>> I tried that but I got this:
>> Indexing Data Source: "External-Program"
>> Indexing ""
>> External Program found: /usr/lib/swish-e/
>> Must supply at least one directory
>> Usage:
>> [options] directory <directory...> | swish-e -S prog -i stdin
>>       Options:
>>         -verbose        Display processing info
>>         -debug          Enable debugging (including SWISH::Filter debugging)
>>         -man            Display documentation
>>         -path           Display location lib path set at installation
>>         -no_skip        Process documents even if filtering fails
>>         -symlinks       Follow symbolic links.  Default is to NOT follow symlinks
>> Removing very common words...
>> no words removed.
>> Writing main index...
>> err: No unique words indexed!
> try adding this line to your existing config:
> SwishProgParameters /home/shared
> and comment out this line:
> # IndexDir "/home/shared"
>> Is there any reason to use SWISH::Filter for performance, or is it just
>> supposed to be easier?  To me doing something like this in the config
>> file makes more sense, as I understand what it is doing when I tell it
>> about each type of file:
> I think you're right, in principle. You must be a sysadmin-type: we tend
> not to like the black box approach. ;)
> SWISH::Filter lets you drop in new filters and, in theory, not change your
> config. But doing it longhand like you have it should work too. Unless it
> doesn't...
>> IndexContents TXT* .txt
>> IndexContents HTML* .htm
>> IndexContents HTML* .html
>> FileFilter .pdf pdftotext "'%p' -"
>> IndexContents TXT* .pdf
>> FileFilter .doc catdoc
>> IndexContents TXT* .doc
>> FileFilter .ppt ppthtml
>> IndexContents TXT* .ppt
>> But of course I have something wrong in there since I am getting lots of
>> errors from catdoc, and also I don't know how to put the excel one in
>> there since I think it is a perl script.
> --
> Peter Karman  .  .  peter(at)
Received on Fri May 6 14:13:45 2005