Re: Parsing doc, xls and excel files with swish-e and libxml2

From: David L Norris <dave(at)>
Date: Tue Jun 28 2005 - 06:21:54 GMT
On Tue, 2005-06-28 at 11:18 +0530, Animesh Bansriyar wrote:
> I cannot ask all users to have perl on their systems as well.

I'm not sure why you think you need Perl.  Perl is not required for

> What are the chances of adding in a native parser for all document formats 
> onto swish-e itself? I would love to contribute if this is possible and feasible.

"Native" filters are installed by the Swish-e Windows installer for Word
(catdoc) and PDF (pdftotext) documents.  You can use catdoc, wvware,
xpdf, or any other program that converts a document to Text, HTML, or
XML with a FileFilter directive during indexing.

For Word documents your config file might look like this:
  FileFilter .doc catdoc.exe '-s8859-1 -d8859-1 "%p"'
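
A PDF entry would probably follow the same pattern.  This is only a
sketch: I am assuming the quoting carries over from the catdoc line
above, that the trailing "-" makes pdftotext write to standard output,
and that you want the filtered output indexed as plain text:
  FileFilter .pdf pdftotext.exe '"%p" -'
  IndexContents TXT* .pdf .doc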

Catdoc and pdftotext are included with the Swish-e for Windows builds.
You can place additional filter programs into the lib\swish-e directory
of your installation.  So if you install to c:\swish-e then you would
place additional filters in the c:\swish-e\lib\swish-e\ directory.

 David Norris
  ICQ - 412039
Received on Mon Jun 27 23:21:55 2005