Re: PDF Indexation

From: Bill Moseley <moseley(at)not-real.hank.org>
Date: Thu Nov 14 2002 - 00:23:39 GMT
At 11:10 AM 11/13/02 -0800, David THOMAS wrote:
>When I try to index a PDF :
>Skipping http://localhost/pdf/test.pdf:  Wrong content type: 
>application/pdf.
>
>although I have the FileFilter directive configured:
>FileFilter .pdf      c:\path\xpdf\pdftotext   "'%p' -"
>and this is a real PDF file.

Either spider with the -S prog method and spider.pl, or use the
SWISH::Filter module with -S http.  Just to confuse things more, with -S
prog and spider.pl you can convert the PDF files with either
SWISH::Filter or the pdf2html module.  Both are shown as examples in the
prog-bin/SwishSpiderConfig.pl file.
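
As a rough sketch of the -S prog route, a swish-e config along these
lines is what I have in mind.  The file name swish.conf and the relative
paths are only assumptions for illustration, so adjust them to wherever
spider.pl and SwishSpiderConfig.pl live on your system, and pick the
SWISH::Filter or pdf2html example inside SwishSpiderConfig.pl to handle
the PDF conversion:

```
# swish.conf (hypothetical name) -- spider with the -S prog method

# Run the spider.pl program shipped with swish-e; the path is an assumption.
IndexDir ./spider.pl

# Hand spider.pl its config file, which is where the base URL and the
# PDF conversion (SWISH::Filter or pdf2html) are set up.
SwishProgParameters SwishSpiderConfig.pl

# Then index with:
#   swish-e -S prog -c swish.conf
```

With this method the FileFilter line is not what converts the PDF;
the conversion happens inside the spider before swish-e sees the
document.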

FileFilter ought to work with -S http, but it looks like it currently
does not.  I'll see if I can put together a patch.


-- 
Bill Moseley
mailto:moseley@hank.org
Received on Thu Nov 14 00:24:08 2002