Re: PDF Indexation

From: Bill Moseley <moseley(at)>
Date: Thu Nov 14 2002 - 00:23:39 GMT
At 11:10 AM 11/13/02 -0800, David THOMAS wrote:
>When I try to index a PDF :
>Skipping http://localhost/pdf/test.pdf:  Wrong content type: 
>although I have the FileFilter directive configured:
>FileFilter .pdf      c:\path\xpdf\pdftotext   "'%p' -"
>and this is a real PDF file.

Use either the -S prog method of spidering, or use the SWISH::Filter
module with -S http.  Just to confuse things more, with -S prog you can
use either SWISH::Filter or the pdf2html module to convert the PDF
files.  Both are given as examples in the prog-bin/ file.
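
Whichever route you pick, the conversion step itself comes down to
running pdftotext with "-" as the output file, so the extracted text
goes to stdout (that is what the "'%p' -" in your FileFilter line asks
for).  A rough Python sketch of just that step, assuming pdftotext from
xpdf is on the PATH.  The function names are made up for illustration
and are not part of swish-e or SWISH::Filter:

```python
import subprocess


def pdftotext_command(pdf_path, program="pdftotext"):
    """Build the argument list for converting one PDF to text on stdout."""
    # Passing "-" as the output file makes pdftotext write to stdout.
    return [program, pdf_path, "-"]


def pdf_to_text(pdf_path, program="pdftotext"):
    """Run the converter and return the extracted text as bytes.

    Assumes the program is installed; raises CalledProcessError if the
    conversion fails.
    """
    result = subprocess.run(
        pdftotext_command(pdf_path, program),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout
```

On Windows you would pass the full path to the binary as the program
argument, the same way the FileFilter line does.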

FileFilter should work with -S http.  I'll see if I can't get a patch.

Bill Moseley
