Indexing non-HTML files... (PDF, DOC, ...)

From: Rainer Scherg <Rainer.Scherg(at)>
Date: Fri May 07 1999 - 17:34:02 GMT

In August last year I wrote to this mailing list
that I've made some enhancements which enable swish (1.1) to index
non-HTML files such as PDF or other document types (filter option).

Since then I have occasionally received requests asking how to do this
and where to get the source. Because of these requests I'm adapting the
small enhancements to swish-e 1.3.2.

If there is public interest, I will try to get some web space
to provide the source, instead of sending it via email on each request.

To describe the changes to swish in short:
New config directives:
     FilterDir   <path-to-filter-progs>
     FileFilter  <file-ext> <filterprog>

     FilterDir   /usr/local/etc/httpd/sbin/filters
     FileFilter  .pdf
     FileFilter  .doc
     FileFilter  .ps
     FileFilter  .gz
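
The lookup these directives imply can be sketched roughly in shell. This is only an illustration under assumptions: the actual change is inside swish itself, the filter program names (pdf-filter.sh, gz-filter.sh) are made up for the example, and the fallback to reading the file unchanged is a guess at what happens when no filter matches.

```shell
#!/bin/sh
# Hypothetical sketch of the FilterDir/FileFilter dispatch; not the patch itself.

# Corresponds to FilterDir; can be overridden from the environment.
FILTER_DIR="${FILTER_DIR:-/usr/local/etc/httpd/sbin/filters}"

# Corresponds to the FileFilter table: map a file extension to a filter
# program. The program names are illustrative assumptions.
filter_for() {
    case "$1" in
        *.pdf) echo "pdf-filter.sh" ;;
        *.gz)  echo "gz-filter.sh" ;;
        *)     echo "" ;;
    esac
}

# Print the text to be indexed for one file on stdout.
extract_text() {
    file="$1"
    prog="$(filter_for "$file")"
    if [ -n "$prog" ] && [ -x "$FILTER_DIR/$prog" ]; then
        # A filter matches: run it with the file path as its first argument.
        "$FILTER_DIR/$prog" "$file"
    else
        # No filter (assumed behaviour): pass the file through unchanged.
        cat "$file"
    fi
}
```

With these functions loaded, `extract_text some/file.pdf` would print whatever the matching filter writes to stdout.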

Example filter script:
#!/bin/sh
# Convert the file given as arg1 to plain text on stdout
/usr/local/bin/pdftotext "$1" - 2>/dev/null
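
A filter for another extension would follow the same pattern: take the file name as arg1 and write text to stdout. As a minimal sketch for .gz files, assuming the compressed content is itself plain text or HTML (the function name gz_filter is only illustrative):

```shell
#!/bin/sh
# Hypothetical .gz filter: decompress the file in arg1 to stdout.
# Assumes the compressed data is plain text or HTML.
gz_filter() {
    gzip -dc "$1" 2>/dev/null
}

# When called as a script with a file argument, filter that file.
if [ "$#" -gt 0 ]; then
    gz_filter "$1"
fi
```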

Regards Rainer
