Re: [swish-e] Swish-e not indexing doc or PDF files

From: Peter Karman <peter(at)>
Date: Thu Feb 28 2008 - 14:53:53 GMT
On 02/27/2008 11:10 PM, Liam Buchanan wrote:
> I am using xpdf but I don't think that is the issue.
> I tried a test of a html page with a link (direct file link) to the pdf
> document and the spider was able to index it correctly and write to a
> html document.

that tells us pdftotext (the part of xpdf that SWISH::Filter uses) works fine with your
pdf, so the filtering step itself is not the problem.

> When I attempt to specify a url as the link to the pdf, the message is
> 'can't open file' even though the file is accessible through a browser
> via the same url.
> One thing I did notice is when instead of including the domain in the
> url link, I included the IP address - the mime type reference in the cmd
> output stated '???' Instead of the usual 'application/pdf'.

then I expect it's your config that's wrong.

is your domain a vhost on the server? if so, the IP likely isn't what you want, since a
request made by IP may be answered by a different (default) site than the one your domain
name maps to. And unless you configure the spider to recognize the IP as being the same
base url as the domain name, you will likely hit snags there too.
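
for what it's worth, here is a minimal sketch of the kind of spider.pl config entry I mean.
The hostname, IP and email address below are placeholders (assumptions, not taken from
your setup), and this shows only the keys relevant here: base_url names the site to spider,
and same_hosts lists other names that should be treated as the same site.

```perl
# Sketch of a spider.pl config file (e.g. SwishSpiderConfig.pl).
# All values are placeholders; substitute your own.
@servers = (
    {
        # The site to spider, addressed by its domain name.
        base_url   => 'http://www.example.com/',

        # Other host names (here a placeholder IP) that should be
        # treated as the same site as base_url.
        same_hosts => [ '192.0.2.10' ],

        # Contact address sent along with the spider's requests.
        email      => 'admin@example.com',
    },
);

1;    # the config file must return a true value
```

if the domain really is a name-based vhost, listing the IP this way may still not help,
because the server can hand back a different site for requests made by IP.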

Peter Karman  .  peter(at)  .

Received on Thu Feb 28 09:53:53 2008